One of the research objectives of our NSF STC is to study information embodied in structures. Traditionally, information theory deals with "conventional" data, be it text, images, or video.
In recent years, however, databases of various sorts have emerged for storing "unconventional" data, including biological data, social data, web data, topographical maps, and medical data.
In compressing such data, one must consider two types of information:
the information conveyed by the structure itself, and the information conveyed by the data labels embedded in the structure.
We address the former problem by studying the information content of graphical structures (i.e., unlabeled graphs). In the first part of this talk, I provide some information-theoretic background leading to a proof of Shannon's first theorem on the fundamental limits of lossless compression.
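As a small illustration of Shannon's first theorem (a sketch of ours, not part of the talk): for an i.i.d. Bernoulli(p) source, the expected length of an optimal prefix code per source symbol approaches the binary entropy h(p) as we encode longer and longer blocks. The sketch below builds a Huffman code over blocks of k bits and reports the per-symbol code length.

```python
import heapq
import itertools
from math import log2

def huffman_lengths(probs):
    """Build a Huffman code and return the codeword length of each symbol."""
    # Heap entries: (probability, unique tiebreaker, set of symbol indices).
    heap = [(p, i, {i}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = itertools.count(len(probs))
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 | s2:
            lengths[i] += 1  # each merge adds one bit to every merged symbol
        heapq.heappush(heap, (p1 + p2, next(counter), s1 | s2))
    return lengths

def expected_length_per_symbol(p, k):
    """Expected Huffman codeword length per source bit, for blocks of k
    i.i.d. Bernoulli(p) bits treated as one super-symbol."""
    blocks = list(itertools.product([0, 1], repeat=k))
    probs = [p ** sum(b) * (1 - p) ** (k - sum(b)) for b in blocks]
    lens = huffman_lengths(probs)
    return sum(q * l for q, l in zip(probs, lens)) / k

p = 0.1
h = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy, ~0.469 bits
print(f"entropy h({p}) = {h:.4f}")
for k in (1, 2, 4, 8):
    print(f"k = {k}: {expected_length_per_symbol(p, k):.4f} bits/symbol")
```

The per-symbol rate decreases with the block length k and is always bounded below by h(p), matching the theorem's statement that entropy is the fundamental limit of lossless compression.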
We then present a fundamental lower bound for structural (graph) compression and propose a two-stage compression algorithm that asymptotically achieves the structural entropy up to the first two leading terms.
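For context, and as an assumption on our part (the abstract does not fix a graph model), in the Erdős–Rényi model $\mathcal{G}(n,p)$ the structural entropy of an unlabeled graph relates to the entropy of its labeled version roughly as

```latex
H_S \;=\; \binom{n}{2} h(p) \;-\; \log_2 n! \;+\; o(1),
\qquad
h(p) \;=\; -p\log_2 p - (1-p)\log_2(1-p),
```

where $\binom{n}{2} h(p)$ is the entropy of the labeled graph (one Bernoulli($p$) bit per vertex pair) and the $\log_2 n!$ term discounts the information carried by the vertex labels, valid when almost all such graphs are asymmetric.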