CS451: Data-Intensive Distributed Computing
https://student.cs.uwaterloo.ca/~cs451/syllabus.html
For Steven: You’re probably looking for Latency numbers every programmer should know.
Really great teacher at first, but the lectures went downhill as the term progressed and became boring. The slides are pretty good, though.
Concepts
Chapter 2 & 3: MapReduce & Spark
Chapter 4: Dealing with Text
Chapter 5: Graphs
Data Mining Part 1: Machine Learning
- Classification
- Supervised Learning
- Logistic Regression
- Gradient Descent
- Stochastic Gradient Descent
- Ensemble Modelling
- K-Means Clustering
Production ML pipeline: (1) train and evaluate offline (holdout set, cross-validation, etc.), then (2) A/B test against other methods.
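Step (1) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not anything from the course materials: the toy dataset, the threshold "model", and the helper names (`holdout_split`, `k_folds`, `train_threshold`, `accuracy`) are all made up for the example. The idea is to set a holdout set aside first, cross-validate on the remaining rows, and touch the holdout set only once at the end.

```python
import random


def holdout_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def k_folds(data, k=5):
    """Yield (train, validation) pairs; each row is used for validation exactly once."""
    rows = list(data)
    for i in range(k):
        validation = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation


def train_threshold(train):
    """Hypothetical stand-in model: threshold at the midpoint of the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where (x > threshold) agrees with the label."""
    correct = sum((x > threshold) == (y == 1) for x, y in rows)
    return correct / len(rows)


# Synthetic toy data (an assumption for this sketch): label is 1 when x >= 50.
data = [(x, int(x >= 50)) for x in range(100)]

# Set the holdout set aside before doing anything else.
train, test = holdout_split(data)

# Cross-validate on the training portion only.
cv_scores = [accuracy(train_threshold(tr), va) for tr, va in k_folds(train, k=5)]

# Final check on the untouched holdout set.
model = train_threshold(train)
holdout_accuracy = accuracy(model, test)

print("cross-validation accuracy per fold:", cv_scores)
print("holdout accuracy:", holdout_accuracy)
```

Step (2), the A/B test, happens online against live traffic, so it is not something an offline script like this can show.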
Data Mining Part 2: Similarity Search
Chapter 7: Relational Data
Chapter 9: Mutable State
Chapter 10: Streaming / Probabilistic Structures