CS451: Data-Intensive Distributed Computing

https://student.cs.uwaterloo.ca/~cs451/syllabus.html

For Steven: You’re probably looking for Latency numbers every programmer should know.

Really great teacher at first, but the lectures went downhill as the term progressed and became boring. The slides are pretty good, though.

Concepts

Chapter 2 & 3: MapReduce & Spark

Chapter 4: Dealing with Text

Chapter 5: Graphs

Data Mining Part 1: Machine Learning

Production ML pipeline: (1) offline training and evaluation (holdout, cross-validation, etc.), then (2) an A/B test against other methods.
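A minimal sketch of step (1), showing how a holdout split and k-fold cross-validation partition a dataset. This is my own illustration, not code from the course: the function names `holdout_split` and `k_fold_splits`, the 20% test fraction, and the toy data are all assumptions, and a real pipeline would probably use a library instead.

```python
import random

def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_splits(items, k=5):
    """Yield (train, validation) pairs; each item is validated exactly once."""
    items = list(items)
    for i in range(k):
        # Every k-th item starting at offset i forms the validation fold.
        validation = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, validation

# Hypothetical toy dataset: ten example IDs.
data = list(range(10))

# Holdout: set the test set aside once and never tune on it.
train, test = holdout_split(data)

# Cross-validation: rotate the validation fold over the remaining training data.
folds = list(k_fold_splits(train, k=4))
```

The test set stays untouched while the folds are used for model selection. Step (2), the A/B test, happens online against live traffic, so it is not sketched here.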

Data Mining Part 2: Similarity Search

Chapter 7: Relational Data

Chapter 9: Mutable State

Chapter 10: Streaming / Probabilistic Structures