Spring 2025: CS 7280 Data Structures & Algorithms for Scalable Computing

Lectures: MoWe / 02:50PM-04:30PM ET at WVH 366

Instructor: Prashant Pandey

  • Email: p.pandey [at] northeastern.edu
  • Office Hours: MoWe / 01:30PM-02:30PM ET

Teaching Assistant: TBD

  • Email:
  • Office Hours:

We will use Piazza for all Q&A: TBD

Course Schedule

Course Overview

This course studies advanced data structures and algorithms for handling scalability challenges in large-scale data management and machine learning pipelines. It will cover modern hashing techniques, filters and sketching algorithms, locality-sensitive hashing, succinct data structures, string algorithms, graph algorithms, external memory algorithms, and ML-based learned indexes. This course is appropriate for both undergraduate and graduate students with intermediate data structure and algorithm skills. The course will also require intermediate programming skills in C/C++.

Prerequisites

  • For grads: CS 5800 (Grad algorithms)
  • For undergrads: CS 3000 (Algorithms and Data)

Please email me if you’re interested in the course but don’t meet the prerequisites.

PhD program: The course fulfills the PhD breath requirements for the “Artificial Intelligence and Data Science” and “Systems and Security " breath area.

Course Topics

  • Compact trees
  • Succinct data structures
  • String algorithms
  • Hashing techniques
  • Filters and sketches
  • Locality sensitive hashing (LSH)
  • (Approximate) Nearest neighbor search
  • Graph algorithms
  • External memory algorithms
  • Distributed data structures

Assignments

  • Assignments: There will be two assignments, including theoretical problems and/or programming tasks, completed individually.

Projects

  • Final project: The main component of the grade is the final group project, where students will:
    • Work in groups of 2-3 (exceptions possible),
    • Implement a project relevant to course material,
    • Ensure the project is unique and requires a significant theoretical or programming effort from all team members. The instructor will provide topic suggestions if needed.

Paper Reading

A set of assigned paper readings is included in the course. Each student is required to pick five papers and submit a one-paragraph synopsis for each. There will be five deadlines throughout the semester. Late submissions will not be accepted without prior approval.

Each review must include:

  • What is the problem and why is it hard? (Three sentences)
  • An overview of the main idea and contributions (Three sentences)
  • How the authors evaluate their solution (Two sentences)

Scribing

  • Use this template for scribing.
  • Each student may scribe 1-2 lectures depending on class size.
  • Send your choice to scribe to the TA on a first-come, first-served basis.
  • Submit scribe notes (pdf + source) to the TA.
  • Scribe notes are due by 9pm the day after the lecture.

Useful Resources

Grading

  • Assignments: 20%
  • Final Project: 40%
  • Paper Reports: 10%
  • Class participation: 20%
  • Final Exam: 10%

Late Submission Policy

  • No late submissions are allowed. Please plan accordingly.
  • For emergencies, prior permission from the instructor is required.

Collaboration and Plagiarism

  • Review the Northeastern University Policy on Academic Misconduct.
  • You may discuss assignment approaches with others but avoid sharing or copying solutions.
  • Declare any collaboration upfront.
  • Do not share your code or make it public.
  • Using tools like Github Copilot, ChatGPT, or copying code from websites constitutes cheating. Cheating will result in an E in the course and referral to the University Student Behavior Committee.

Note: Any attempt to subvert the grading process is considered cheating. If in doubt, ask.

Schedule

  • This course schedule is subject to change through the semester.

  • Lecture notes/scribe notes will be uploaded here after each lecture.

  • Lectures will not be recorded.

Date
Topic
Reading List
Scribe Notes
Deadlines
Jan 06 Course Introduction and Logistics
Jan 08 Van Emde Boas Tree Paper 1 Paper 2 Paper 3 Instructor Lecture notes Scribe:
Jan 13 X-fast tree and Y-fast tree Paper 1 Paper 2 Assignment 1 release
Jan 15 Succinct binary tries (Rank & Select) Paper 1 Paper 2 Instructor Lecture notes Paper report 1 Due
Jan 22 Tries, Suffix Tree, Suffix Array Paper 1 Paper 2 Paper 3 Paper 4 Instructor Lecture notes
Jan 27 BWT & FM-index Paper 1 Paper 2 Paper 3 Instructor Lecture notes
Jan 29 Hashing (Balls and Bins) Instructor Lecture notes Assignment 1 due
Feb 03 Hashing (Hash functions/Chaining/Perfect hashing) Paper 1 Paper 2 Paper 3 Paper 4 Instructor Lecture notes Assignment 2 release
Feb 05 Hash tables (Linear prob./Cuckoo/Two-choice/Iceberg) Paper 1 Paper 2 Paper 3 Paper 4 Paper report 2 Due
Feb 10 Filters (Bloom/Quotient/Cuckoo) Paper 1 Paper 2 Paper 3 Paper 4 Instructor Lecture notes
Feb 12 Adaptive filters Paper 1 Paper 2
Feb 19 Range filters Paper 1 Paper 2 Paper 3
Feb 24 Misra-Gries, Count Sketch, Count-Min Sketch Paper 1 Paper 2 Paper 3 Paper 4 Instructor Lecture notes
Feb 26 Locality Sensitive Hashing Paper 1 Paper 2 Paper 3
Mar 10 Nearest Neighbor Search Paper 1 Paper 2 Assignment 2 due
Mar 12 Approximate Nearest Neighbor Search Paper 1 Paper 2 Paper report 3 Due
Mar 17 Min Hashing & Similarity Search Paper 1 Paper 2 Paper 3 Paper 4 Instructor Lecture notes Final project proposal Due
Mar 24 Cardinality Estimation (HyperLogLog) Paper 1 Paper 2 Paper 3 Instructor Lecture notes
Mar 26 Graphs Representation & Computations Paper 1 Paper 2 Instructor Lecture notes Paper report 4 Due
Mar 31 Streaming Graphs & Incremental Computations Paper 1 Paper 2 Instructor Lecture notes
Apr 02 Comp Bio Graphs Paper 1 Paper 2 Paper 3 Paper 4 Instructor Lecture notes
Apr 07 High-Dimensional Vector Compression Paper 1 Lecture notes Final project mid-point report Due
Apr 09 Distributed Hash Tables Paper 1 Instructor Lecture notes
Apr 14 Consistent Hashing Theory Paper 1 Instructor Lecture notes Paper report 5 Due
Apr 16
Apr 23 Final Project Presentations
Prashant Pandey
Prashant Pandey
Assistant Professor

My research interests lie at the intersection of Systems and Algorithms. I design and build theoretically well-founded data structures for big data problems in computational biology, streaming, and file systems.