Spring 2025: CS 7280 Data Structures & Algorithms for Scalable Computing
Lectures: MoWe / 02:50PM-04:30PM ET at WVH 366
Instructor: Prashant Pandey
- Email: p.pandey [at] northeastern.edu
- Office Hours: MoWe / 01:30PM-02:30PM ET
Teaching Assistant: TBD
- Email:
- Office Hours:
We will use Piazza for all Q&A: TBD
Course Overview
This course studies advanced data structures and algorithms for handling scalability challenges in large-scale data management and machine learning pipelines. It will cover modern hashing techniques, filters and sketching algorithms, locality-sensitive hashing, succinct data structures, string algorithms, graph algorithms, external memory algorithms, and ML-based learned indexes. This course is appropriate for both undergraduate and graduate students with intermediate data structure and algorithm skills. The course will also require intermediate programming skills in C/C++.
Prerequisites
- For grads: CS 5800 (Grad algorithms)
- For undergrads: CS 3000 (Algorithms and Data)
Please email me if you’re interested in the course but don’t meet the prerequisites.
PhD program: The course fulfills the PhD breath requirements for the “Artificial Intelligence and Data Science” and “Systems and Security " breath area.
Course Topics
- Compact trees
- Succinct data structures
- String algorithms
- Hashing techniques
- Filters and sketches
- Locality sensitive hashing (LSH)
- (Approximate) Nearest neighbor search
- Graph algorithms
- External memory algorithms
- Distributed data structures
Assignments
- Assignments: There will be two assignments, including theoretical problems and/or programming tasks, completed individually.
Projects
- Final project: The main component of the grade is the final group project, where students will:
- Work in groups of 2-3 (exceptions possible),
- Implement a project relevant to course material,
- Ensure the project is unique and requires a significant theoretical or programming effort from all team members. The instructor will provide topic suggestions if needed.
Paper Reading
A set of assigned paper readings is included in the course. Each student is required to pick five papers and submit a one-paragraph synopsis for each. There will be five deadlines throughout the semester. Late submissions will not be accepted without prior approval.
Each review must include:
- What is the problem and why is it hard? (Three sentences)
- An overview of the main idea and contributions (Three sentences)
- How the authors evaluate their solution (Two sentences)
Scribing
- Use this template for scribing.
- Each student may scribe 1-2 lectures depending on class size.
- Send your choice to scribe to the TA on a first-come, first-served basis.
- Submit scribe notes (pdf + source) to the TA.
- Scribe notes are due by 9pm the day after the lecture.
Useful Resources
- The Asymptotic Cheat Sheet for theoretical analyses.
- Assignments, scribe notes, and projects must be typeset in LaTeX. If you’re unfamiliar with LaTeX, refer to this introduction and this Overleaf tutorial.
Grading
- Assignments: 20%
- Final Project: 40%
- Paper Reports: 10%
- Class participation: 20%
- Final Exam: 10%
Late Submission Policy
- No late submissions are allowed. Please plan accordingly.
- For emergencies, prior permission from the instructor is required.
Collaboration and Plagiarism
- Review the Northeastern University Policy on Academic Misconduct.
- You may discuss assignment approaches with others but avoid sharing or copying solutions.
- Declare any collaboration upfront.
- Do not share your code or make it public.
- Using tools like Github Copilot, ChatGPT, or copying code from websites constitutes cheating. Cheating will result in an E in the course and referral to the University Student Behavior Committee.
Note: Any attempt to subvert the grading process is considered cheating. If in doubt, ask.
Schedule
-
This course schedule is subject to change through the semester.
-
Lecture notes/scribe notes will be uploaded here after each lecture.
-
Lectures will not be recorded.
Date |
Topic |
Reading List |
Scribe Notes |
Deadlines |
---|---|---|---|---|
Jan 06 | Course Introduction and Logistics | |||
Jan 08 | Van Emde Boas Tree | Paper 1 Paper 2 Paper 3 | Instructor Lecture notes Scribe: | |
Jan 13 | X-fast tree and Y-fast tree | Paper 1 Paper 2 | Assignment 1 release | |
Jan 15 | Succinct binary tries (Rank & Select) | Paper 1 Paper 2 | Instructor Lecture notes | Paper report 1 Due |
Jan 22 | Tries, Suffix Tree, Suffix Array | Paper 1 Paper 2 Paper 3 Paper 4 | Instructor Lecture notes | |
Jan 27 | BWT & FM-index | Paper 1 Paper 2 Paper 3 | Instructor Lecture notes | |
Jan 29 | Hashing (Balls and Bins) | Instructor Lecture notes | Assignment 1 due | |
Feb 03 | Hashing (Hash functions/Chaining/Perfect hashing) | Paper 1 Paper 2 Paper 3 Paper 4 | Instructor Lecture notes | Assignment 2 release |
Feb 05 | Hash tables (Linear prob./Cuckoo/Two-choice/Iceberg) | Paper 1 Paper 2 Paper 3 Paper 4 | Paper report 2 Due | |
Feb 10 | Filters (Bloom/Quotient/Cuckoo) | Paper 1 Paper 2 Paper 3 Paper 4 | Instructor Lecture notes | |
Feb 12 | Adaptive filters | Paper 1 Paper 2 | ||
Feb 19 | Range filters | Paper 1 Paper 2 Paper 3 | ||
Feb 24 | Misra-Gries, Count Sketch, Count-Min Sketch | Paper 1 Paper 2 Paper 3 Paper 4 | Instructor Lecture notes | |
Feb 26 | Locality Sensitive Hashing | Paper 1 Paper 2 Paper 3 | ||
Mar 10 | Nearest Neighbor Search | Paper 1 Paper 2 | Assignment 2 due | |
Mar 12 | Approximate Nearest Neighbor Search | Paper 1 Paper 2 | Paper report 3 Due | |
Mar 17 | Min Hashing & Similarity Search | Paper 1 Paper 2 Paper 3 Paper 4 | Instructor Lecture notes | Final project proposal Due |
Mar 24 | Cardinality Estimation (HyperLogLog) | Paper 1 Paper 2 Paper 3 | Instructor Lecture notes | |
Mar 26 | Graphs Representation & Computations | Paper 1 Paper 2 | Instructor Lecture notes | Paper report 4 Due |
Mar 31 | Streaming Graphs & Incremental Computations | Paper 1 Paper 2 | Instructor Lecture notes | |
Apr 02 | Comp Bio Graphs | Paper 1 Paper 2 Paper 3 Paper 4 | Instructor Lecture notes | |
Apr 07 | High-Dimensional Vector Compression | Paper 1 | Lecture notes | Final project mid-point report Due |
Apr 09 | Distributed Hash Tables | Paper 1 | Instructor Lecture notes | |
Apr 14 | Consistent Hashing Theory | Paper 1 | Instructor Lecture notes | Paper report 5 Due |
Apr 16 | ||||
Apr 23 | Final Project Presentations |