I will talk about compression when the goal is to concisely represent sequences in a way that allows to determine whether they are similar to a given query sequence. The problem arises in an increasing variety of contexts from genomics to forensics to internet search. We characterize achievable tradeoffs between conciseness of the representation, similarity level, and reliability of the answers to the queries, with particular focus on the practically motivated case where false negatives (misdetections) are not allowed. Guided by the theoretical findings, we construct and experiment with practical compressors for queries. The performance of these schemes is seen to approach the theoretical limits on simulated and real data. Connections to existing literature and future directions will be discussed.
Based on collaborations with Thomas Courtade, Amir Ingber, Idoia Ochoa and Golan Yona.