Data Compression for Similarity Queries
Overview
Searching a large database in terms of "I need all sequences x, that are similar to y" can present both efficiency and accuracy challenges. This advanced topic is an introduction to research led by Professor Tsachy Weissman and his lab in recent years at Stanford and supported by Center for Science of Information regarding an understanding of how to compress such a large database down to a more modest one, and then conduct similarity queries from this compressed data both efficiently and accurately. The video presentation provided here by Dr. Weissman was part of the "New Directions in the Science of Information" Meeting at UC Berkeley, CA in November 2013.
Related Resources
Several conference and journal papers published on this work are provided as links for additional details. Students and faculty interested in this topic are also encouraged to view the series of seminars from the Science of Information Day and Stanford Compression Forum that were presented February 18-19, 2016.
Papers:
Compression Schemes for Similarity Queries (PDF)
Efficient Similarity Queries via Lossy Compression (PDF)
Compression for Quadratic Similarity Queries: Finite Block-length and Practical Schemes (PDF)