Repository logo

Entropy in Data Compression, Additive Combinatorics and Probability



Change log


Gavalakis, Lampros 


This thesis has four parts:

In the first part, the problem of lossless data compression with side information available to both the encoder and the decoder is considered. The finite-blocklength fundamental limits of the best achievable performance are defined, in two different versions of the problem: Reference-based compression, when a single side information string is used repeatedly in compressing different source messages, and pair-based compression, where a different side information string is used for each source message. General achievability and converse theorems are established for arbitrary source-side information pairs. Nonasymptotic normal approximation expansions are proved for the optimal rate in both the reference-based and pair-based settings, for memoryless sources. These are stated in terms of explicit, finite-blocklength bounds, that are tight up to third-order terms. Extensions that go significantly beyond the class of memoryless sources are obtained. The relevant source dispersion is identified and its relationship with the conditional varentropy rate is established. Interestingly, the dispersion is different in reference-based and pair-based compression, and it is proved that the reference-based dispersion is in general smaller.

Secondly, a discrete, asymptotic analogue of the entropy power inequality due to Tao is discussed. The nonasymptotic version of the result is obtained. Applications in information theory and connections with the entropic central limit theorem are discussed.

In the third part, we recall some of the history of the information-theoretic approach to deriving core results in probability theory and give three new such proofs. A strengthened version of the central limit theorem for discrete random variables is established, relying only on information-theoretic tools and elementary arguments. It is shown that the relative entropy between the standardised sum of n independent and identically distributed lattice random variables and an appropriately discretised Gaussian, vanishes as n. Then we give two new information-theoretic proofs of finite versions of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first k in a sequence of n exchangeable random variables, and an appropriate mixture over product distributions. The first result is stated for binary random variables and the proof is based on estimating the dependence of the random variables under an appropriate conditioning. The second result is stated for random variables in a general finite alphabet. Its proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the the method of types. The mixing measure is characterised in both cases as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary.

In the final part, we provide a collection of inequalities for the differential entropy of an arbitrary random variable convolved with a Gaussian. In terms of techniques, as well as of the results themselves, these inequalities lie in the interface between sumset inequalities for entropy and the central limit theorem.





Godsill, Simon


Entropy, Data Compression, Additive Combinatorics, Probability, Entropy Power Inequality, Central Limit Theorem, De Finetti's Theorem, Exchangeability


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
EPSRC (2107993)
EPSRC, grant number RG94782 and Cambridge Trust.