Posted in Posters: Thursday, December 1, 2011
It has long been known that only a limited number of amino acid sequences and tertiary protein structures are realized in nature, and that their frequency distributions are highly skewed. We investigated the relationship between sequence and structure through the lens of information theory to seek a possible explanation for this phenomenon. We hypothesized that the dense mapping from sequence to structure is the result of an efficient channel coding process. By exhaustively enumerating all 16-node compact structures and sequences, we constructed a table of structure probabilities conditioned on sequence, which we treated as a noisy channel. We computed the capacity of this channel and compared the capacity-achieving sequence distribution with the observed distributions of sequences and structures in nature. These distributions were found to be highly skewed and similar in shape to one another. These results lend support to the hypothesis that the dense sequence-structure mapping arises as an efficient encoding of proteins necessary to sustain life.
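The capacity computation described above can be illustrated with a small sketch. The abstract does not say which algorithm was used; the Blahut-Arimoto iteration shown here is a standard choice for discrete memoryless channels and is an assumption on our part. The function name `channel_capacity`, the table layout `W[x][y] = P(structure y | sequence x)`, and the three-sequence toy table are all hypothetical and are not the poster's data.

```python
import math


def channel_capacity(W, tol=1e-9, max_iter=10000):
    """Blahut-Arimoto iteration for a discrete memoryless channel.

    W[x][y] is P(output y | input x); every row must sum to 1.
    Returns (capacity in bits, capacity-achieving input distribution).
    Assumption: this is an illustrative method, not the one confirmed
    by the source abstract.
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output marginal q(y) = sum_x p(x) W[x][y]
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = KL divergence between row W[x] and q, in nats
        d = [
            sum(W[x][y] * math.log(W[x][y] / q[y])
                for y in range(n_out) if W[x][y] > 0)
            for x in range(n_in)
        ]
        # Standard bounds: log(sum_x p(x) e^{d[x]}) <= C <= max_x d[x]
        z = sum(p[x] * math.exp(d[x]) for x in range(n_in))
        lower, upper = math.log(z), max(d)
        # Multiplicative update of the input distribution
        p = [p[x] * math.exp(d[x]) / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


if __name__ == "__main__":
    # Hypothetical toy table: sequences 0 and 1 both fold to structure 0,
    # sequence 2 folds to structure 1 (a many-to-one mapping).
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    capacity, p_opt = channel_capacity(toy)
    print(capacity, p_opt)
```

In this toy table the capacity is 1 bit, and the iteration settles on an input distribution that splits probability evenly between the two structures rather than evenly among the three sequences. A real table built from enumerated 16-node structures would be far larger, and a vectorized implementation would likely be preferable at that scale.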
Part of NSF Site Visit December 1, 2011