Prashant Pandey

Google Scholar
GitHub
Email: ppandey2 AT cs.cmu.edu
[Recent News] [Publications] [Talks]
[Research Statement] [CV]
[Dissertation]

Research Interests

My research interests lie at the intersection of Systems and Algorithms. I design and build theoretically well-founded data structures for big data problems in computational biology, streaming, and file systems.

My current research focuses on building efficient filter data structures for approximate membership testing and counting. I also work on building write-optimized data structures for the online event-detection problem in streaming data sets. In addition, I am a member of the team that developed BetrFS, an in-kernel file system built on write-optimized indexes.

Currently, I am a Postdoctoral Scholar at the School of Computer Science at Carnegie Mellon University. I work with Prof. Carl Kingsford on building compact data structures for large-scale sequence-search and representation of de Bruijn graphs for DNA sequencing and transcriptomic analysis.

Previously, I obtained my Ph.D. in Computer Science at Stony Brook University, where I defended my dissertation, Fast and Space-Efficient Maps: Shrinking Big Data Down to Size. At Stony Brook University, I was co-advised by Prof. Michael Bender and Prof. Rob Johnson. (Dissertation committee: Mike Ferdman, Rob Patro, Guy Blelloch.)

Recent News

  1. Finding a Needle in a Field of Haystacks: Cell Systems publishes research on Mantis, a new sequencing search tool.   [link]

  2. I recently received the Catacosinos fellowship for excellence in computer science at Stony Brook University.   [link]

  3. Our counting quotient filter paper is one of eight ACM SIGMOD 2017 Reproducible Papers.   [link]

  4. Our computational biology research was mentioned on the VMware Research blog.   [link]

  5. The counting quotient filter data structure was featured on The Morning Paper.   [link]

Publications

In reverse chronological order:

  1. Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design (SPAA 2019)
    Michael Bender, Alex Conway, Martin Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, Yang Zhan

  2. Locality Sensitive Hashing for the Edit Distance (ISMB 2019)
    Guillaume Marcais, Dan DeBlasio, Prashant Pandey, Carl Kingsford

  3. The Online Event-Detection Problem (arXiv 2019) [Submitted]
    Michael A. Bender, Jonathan W. Berry, Martin Farach-Colton, Rob Johnson, Thomas M. Kroeger, Prashant Pandey, Cynthia A. Phillips, Shikha Singh

  4. An Efficient, Scalable and Exact Representation of High-Dimensional Color Information Enabled via de Bruijn Graph Search (RECOMB 2019)
    Fatemeh Almodaresi, Prashant Pandey, Michael Ferdman, Rob Johnson, Rob Patro

  5. Buffered Count-Min Sketch on SSD: Theory and Experiments (ESA 2018)
    Mayank Goswami, Dzejla Medjedovic, Emina Mekic, Prashant Pandey

  6. Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index (RECOMB 2018) (Cell Systems 2018)
    Prashant Pandey, Fatemeh Almodaresi, Michael A. Bender, Michael Ferdman, Rob Johnson, and Rob Patro

  7. Rainbowfish: A Succinct Colored de Bruijn Graph Representation (WABI 2017) [biorxiv]
    Fatemeh Almodaresi, Prashant Pandey, and Rob Patro

  8. deBGR: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph (ISMB 2017) (Bioinformatics 2017)
    Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro

  9. Squeakr: An Exact and Approximate k-mer Counting System (Bioinformatics 2017) [biorxiv]
    Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro

  10. A General-Purpose Counting Filter: Making Every Bit Count (SIGMOD 2017)
    Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro

  11. A Fast x86 Implementation of Select (arXiv 2017)
    Prashant Pandey, Michael A. Bender, and Rob Johnson

  12. Writes Wrought Right, and Other Adventures in File System Optimization (ACM Transactions on Storage (TOS) - Special Issue USENIX FAST 2016)
    Jun Yuan, Yang Zhan, William Jannen, Prashant Pandey, Amogh Akshintala, Kanchan Chandnani, Pooja Deo, Zardosht Kasheff, Michael Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter

  13. Optimizing Every Operation in a Write-optimized File System (FAST 2016) [Awarded Best Paper]
    Jun Yuan, Yang Zhan, William Jannen, Prashant Pandey, Amogh Akshintala, Kanchan Chandnani, Pooja Deo, Zardosht Kasheff, Michael Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter

  14. BetrFS: Write-Optimization in a Kernel File System (ACM Transactions on Storage (TOS) - Special Issue USENIX FAST 2015)
    William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, Michael A. Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter

  15. BetrFS: A Right-Optimized Write-Optimized File System (FAST 2015) [Runner up for best paper]
    William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, Michael A. Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter

Talks

  1. Fast and Space-Efficient Maps: Shrinking Big Data Down to Size   [pdf]
    Venue: Dissertation defense, Stony Brook University, NY [October 2018]

  2. Buffered Count-Min Sketch on SSD: Theory and Experiments   [pdf]
    Venue: ESA 2018, Helsinki, Finland [August 2018]

  3. Fast and Space-Efficient Maps: Shrinking Big Data Down to Size   [pdf]
    Venue: Proposal defense, Stony Brook University, NY [June 2018]

  4. Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index   [pdf]
    Venue: RECOMB 2018, Paris, France [April 2018]

  5. Scheduling Problems in Write-Optimized Key-Value Stores   [pdf]
    Venue: New Challenges in Scheduling Theory 2018, Aussois, France [April 2018]

  6. Compact Representation of Annotated de Bruijn Graphs   [pdf]
    Venue: Berkeley Lab, Berkeley, CA [January 2018]

  7. deBGR: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph [Extended talk]   [pdf] [Talk]
    Venue: VMware Research, Palo Alto, CA [August 2017] and Google Research, NY [September 2017]

  8. deBGR: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph   [pdf] [Talk]
    Venue: ISMB 2017, Prague, Czech Republic [July 2017]

  9. A General-Purpose Counting Filter: Making Every Bit Count   [pdf] [Talk]
    Venue: SIGMOD 2017, Chicago, IL [May 2017]

  10. Intel Software Guard Extensions (SGX)   [pdf]
    Venue: Sandia National Laboratories, Livermore, CA [August 2015]