Posted in Books: Tuesday, January 17, 2017
A First Course in Information Theory, by Raymond W. Yeung
Out of the sixteen chapters in this book, the first thirteen chapters are basic topics, while the last three chapters are advanced topics for the more enthusiastic reader. A brief rundown of the chapters will give a better idea of what is in this book.
Chapter 1 is a very high-level introduction to the nature of information theory and to the main results of Shannon’s original 1948 paper, which founded the field. There are also pointers to biographies of Shannon and to his collected works.
Chapter 2 introduces Shannon’s information measures and their basic properties. Useful identities and inequalities in information theory are derived and explained. Extra care is taken in handling joint distributions with zero probability masses. The chapter ends with a section on the entropy rate of a stationary information source.
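As a quick illustration of these measures, here is a minimal Python sketch (my own example, not from the book) that computes entropies and the mutual information for a small joint distribution containing a zero probability mass, using the convention 0 log 0 = 0:

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero masses are skipped,
    following the convention 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Joint distribution of (X, Y) over {0,1} x {0,1}, with one zero mass.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.25, (1, 1): 0.25}

# Marginal distributions of X and Y.
px = [sum(p for (x, _), p in joint.items() if x == xv) for xv in (0, 1)]
py = [sum(p for (_, y), p in joint.items() if y == yv) for yv in (0, 1)]

h_joint = entropy(joint.values())
h_x, h_y = entropy(px), entropy(py)
mutual_info = h_x + h_y - h_joint  # I(X;Y) = H(X) + H(Y) - H(X,Y)
print(h_x, h_y, h_joint, mutual_info)
```

Here H(X) = 1 bit, H(X,Y) = 1.5 bits, and I(X;Y) comes out nonnegative, as the basic inequalities in this chapter guarantee.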
Chapter 3 is a discussion of zero-error data compression by uniquely decodable codes, with prefix codes as a special case. A proof of the entropy bound for prefix codes which involves neither the Kraft inequality nor the fundamental inequality is given. This proof facilitates the discussion of the redundancy of prefix codes.
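To make the entropy bound concrete, here is a small numerical check (my own example, not the book’s proof) for the binary prefix code {0, 10, 110, 111}: the Kraft sum equals 1, and because the source probabilities are dyadic, the expected codeword length meets the entropy bound exactly:

```python
import math

def kraft_sum(lengths, D=2):
    """Kraft sum; <= 1 is necessary for unique decodability."""
    return sum(D ** -l for l in lengths)

def expected_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Prefix code 0, 10, 110, 111 for a 4-symbol source with dyadic probabilities.
probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

print(kraft_sum(lengths))               # 1.0: the code is complete
print(entropy(probs))                   # entropy bound H(X) = 1.75 bits
print(expected_length(probs, lengths))  # 1.75: bound achieved (dyadic case)
```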
Chapter 4 is a thorough treatment of weak typicality. The weak asymptotic equipartition property and the source coding theorem are discussed. An explanation of the fact that a good data compression scheme produces almost i.i.d. bits is given. There is also a brief discussion of the Shannon-McMillan-Breiman theorem.
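The weak AEP says that for a long i.i.d. sequence, the empirical quantity -(1/n) log p(X^n) is close to the entropy H(X) with high probability. A small simulation (illustrative only, not from the book) for a Bernoulli(0.2) source:

```python
import math
import random

random.seed(0)
p = 0.2  # Bernoulli(p) source
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # H(X), about 0.72 bits

n = 100_000
x = [1 if random.random() < p else 0 for _ in range(n)]
# -(1/n) log2 p(x^n) for the i.i.d. sequence
neg_log_prob = -sum(math.log2(p if xi else 1 - p) for xi in x) / n
print(h, neg_log_prob)  # the two values are close, as the weak AEP predicts
```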
Chapter 5 introduces a new definition of strong typicality which does not involve the cardinalities of the alphabet sets. The treatment of strong typicality here is more detailed than that of Berger but less abstract than that of Csiszár and Körner. A new exponential convergence result is proved in Theorem 5.3.
Chapter 6 is an introduction to the theory of the I-Measure, which establishes a one-to-one correspondence between Shannon’s information measures and set theory. A number of examples are given to show how the use of information diagrams can simplify the proofs of many results in information theory. Most of these examples are previously unpublished. In particular, Example 6.15 is a generalization of Shannon’s perfect secrecy theorem.
Chapter 7 explores the structure of the I-Measure for Markov structures. Set-theoretic characterizations of full conditional independence and Markov random fields are discussed. The treatment of Markov random fields here is perhaps too specialized for the average reader, but the structure of the I-Measure and the simplicity of the information diagram for a Markov chain are best explained as special cases of a Markov random field.
Chapter 8 consists of a new treatment of the channel coding theorem. Specifically, a graphical model approach is employed to explain the conditional independence of random variables. Great care is taken in discussing feedback.
Chapter 9 is an introduction to rate distortion theory. The results in this chapter are stronger than those in a standard treatment of the subject although essentially the same techniques are used in the derivations.
In Chapter 10, the Blahut-Arimoto algorithms for computing channel capacity and the rate distortion function are discussed, and a simplified proof for convergence is given. Great care is taken in handling distributions with zero probability masses.
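For the channel-capacity case, the alternating updates of the Blahut-Arimoto algorithm can be sketched as follows (a standard textbook form of the algorithm, not the book’s exact presentation or its simplified convergence proof); note the explicit guards for zero probability masses:

```python
import math

def blahut_arimoto(P, tol=1e-9, max_iter=10_000):
    """Capacity (in bits) of a discrete memoryless channel with
    transition matrix P[x][y], by the Blahut-Arimoto algorithm."""
    nx, ny = len(P), len(P[0])
    r = [1.0 / nx] * nx  # input distribution, start uniform
    for _ in range(max_iter):
        # q(x|y) proportional to r(x) P(y|x); skip outputs with zero mass
        q = [[0.0] * nx for _ in range(ny)]
        for y in range(ny):
            z = sum(r[x] * P[x][y] for x in range(nx))
            if z > 0:
                for x in range(nx):
                    q[y][x] = r[x] * P[x][y] / z
        # r(x) proportional to exp( sum_y P(y|x) log q(x|y) )
        new_r = [math.exp(sum(P[x][y] * math.log(q[y][x])
                              for y in range(ny) if P[x][y] > 0))
                 for x in range(nx)]
        z = sum(new_r)
        new_r = [v / z for v in new_r]
        done = max(abs(a - b) for a, b in zip(r, new_r)) < tol
        r = new_r
        if done:
            break
    # Mutual information I(r; P) in bits at the final input distribution
    cap = 0.0
    for x in range(nx):
        for y in range(ny):
            if P[x][y] > 0 and r[x] > 0:
                z = sum(r[xp] * P[xp][y] for xp in range(nx))
                cap += r[x] * P[x][y] * math.log2(P[x][y] / z)
    return cap

# Binary symmetric channel with crossover 0.1: capacity = 1 - H(0.1)
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(blahut_arimoto(bsc))  # about 0.531 bits
```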
Chapter 11 is an introduction to network coding theory. The surprising fact that coding at the intermediate nodes can improve the throughput when an information source is multicast in a point-to-point network is explained. The max-flow bound for network coding with a single information source is explained in detail. Multi-source network coding will be discussed in Chapter 15 after the necessary tools are developed in the next three chapters.
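The classic illustration of this surprising fact is the butterfly network, where the bottleneck node forwards the XOR of the two source bits instead of either bit alone, and each sink decodes both bits from what it receives. A tiny sketch (my own example, with the network topology left implicit):

```python
# Butterfly network: source s multicasts bits (b1, b2) to sinks t1 and t2
# over unit-capacity edges. Routing alone cannot deliver both bits to both
# sinks, but coding (XOR) at the bottleneck node can.
b1, b2 = 1, 0

# The bottleneck edge carries b1 XOR b2 instead of forwarding one bit.
coded = b1 ^ b2

# Sink t1 receives b1 directly plus the coded bit; it recovers b2 by XOR.
t1 = (b1, b1 ^ coded)
# Sink t2 receives b2 directly plus the coded bit; it recovers b1 by XOR.
t2 = (b2 ^ coded, b2)

print(t1, t2)  # both sinks obtain (b1, b2) = (1, 0)
```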
Information inequalities are sometimes called the laws of information theory because they govern the impossibilities in information theory. In Chapter 12, the geometrical meaning of information inequalities and the relation between information inequalities and conditional independence are explained in depth. The framework for information inequalities discussed here is the basis of the next two chapters.
Chapter 13 explains how the problem of proving information inequalities can be formulated as a linear programming problem. This leads to a complete characterization of all information inequalities that can be proved by conventional techniques. These are called Shannon-type inequalities, and they can now be proved by ITIP, a software package that comes with this book. It is also shown how Shannon-type inequalities can be used to tackle the implication problem of conditional independence in probability theory.
Until recently, all known information inequalities were Shannon-type inequalities. A few non-Shannon-type inequalities have now been discovered, which means that there exist laws in information theory beyond those laid down by Shannon. These inequalities and their applications are explained in depth in Chapter 14.
Network coding theory is further developed in Chapter 15. The situation in which more than one information source is multicast in a point-to-point network is discussed. The surprising fact that a multi-source problem is not equivalent to a collection of single-source problems, even when the information sources are mutually independent, is clearly explained. Implicit and explicit bounds on the achievable coding rate region are discussed. These characterizations of the achievable coding rate region involve almost all the tools developed earlier in the book, in particular the framework for information inequalities.
Chapter 16 explains an intriguing relation between information theory and group theory. Specifically, for every information inequality satisfied by any joint distribution, there is a corresponding group inequality satisfied by any finite group and its subgroups, and vice versa. Inequalities of the latter type govern the orders of any finite group and its subgroups. Group-theoretic proofs of Shannon-type information inequalities are given. At the end of this chapter, a group inequality is obtained from a non-Shannon-type inequality discussed in Chapter 14. The meaning and the implication of this inequality are yet to be understood.
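Under this correspondence, H(X_i) maps to log(|G| / |G_i|), so the basic inequality H(X1) + H(X2) >= H(X1, X2) becomes |G| * |G1 ∩ G2| >= |G1| * |G2| for any finite group G and subgroups G1, G2. A small numerical check (my own example, not from the book) with G = S3:

```python
from itertools import permutations

# G = S3, represented as tuples: g maps i to g[i].
G = set(permutations(range(3)))

def compose(a, b):
    """Composition of permutations: (a o b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(len(b)))

def generated(gens):
    """Subgroup generated by gens, computed by closure."""
    H = {tuple(range(3))} | set(gens)
    while True:
        new = {compose(a, b) for a in H for b in H} - H
        if not new:
            return H
        H |= new

G1 = generated([(1, 0, 2)])  # generated by a transposition: order 2
G2 = generated([(1, 2, 0)])  # generated by a 3-cycle: order 3
inter = G1 & G2              # trivial intersection: order 1

# Group analogue of H(X1) + H(X2) >= H(X1, X2):
# |G| * |G1 ∩ G2| >= |G1| * |G2|, here 6 * 1 >= 2 * 3.
print(len(G), len(G1), len(G2), len(inter))
```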