Skip to main content

Data Compression for Similarity Queries



Searching a large database in terms of "I need all sequences x, that are similar to y" can present both efficiency and accuracy challenges. This advanced topic is an introduction to research led by Professor Tsachy Weissman and his lab in recent years at Stanford and supported by Center for Science of Information regarding an understanding of how to compress such a large database down to a more modest one, and then conduct similarity queries from this compressed data both efficiently and accurately. The video presentation provided here by Dr. Weissman was part of the "New Directions in the Science of Information" Meeting at UC Berkeley, CA in November 2013.


Related Resources

Several conference and journal papers published on this work are provided as links for additional details. Students and faculty interested in this topic are also encouraged to view the series of seminars from the Science of Information Day and Stanford Compression Forum that were presented February 18-19, 2016.

Compression Schemes for Similarity Queries (PDF)

Efficient Similarity Queries via Lossy Compression (PDF)

Compression for Quadratic Similarity Queries: Finite Block-length and Practical Schemes (PDF)


Module Author

Syllabus/Suggested Schedule

To view any lecture, just click on them and view it in the player above


Week 1 : Weissman - Compression for Similarity Inquiries

  • Michael W. Mahoney - "From Local Data to Global Knowledge with Locally-biased Graph Algorithms"
  • Ayfer Özgür - "Remotely Powered Communications "
    Recorded: February 18
  • Thomas Courtade - "Assembly Problem: Insights and Algorithms from Information"
    Recorded: February 18
  • David Tse - "From Haplotype Phasing to Community Recovery for Graphs with Locality"
    Recorded: February 18
  • Venkat Anantharam - "A Relative Entropy Characterization of the Growth Rate of Reward"
    Recorded: February 18
  • Debargha Mukherjee - "An Overview of New Video Coding Tools in VP10"
    Recorded: February 19
  • Euan Ashley - "Precision Medicine"
    Recorded: February 19
  • John Altschuler - "Silicon Valley"
    Recorded: February 19
  • Dan Boneh - "Encryption and Compression: Can They Get Along?"
    Recorded: February 19
  • Andrea Goldsmith - "Fundamental Performance Limits of Analog to Digital Compression"
    Recorded: February 19
  • Stanford Compression Forum - Panel on Big Data Compression
    Recorded: February 19