ECE459
Programming for Performance. Course repo.
Summary / cheat sheet: Performance Playbook
A1:
Study index (L01–L35)
L01: Programming for Performance
- Bandwidth vs Latency
- Embarrassingly Parallel
- Data Race · Deadlock
- Amdahl’s Law · Scalability
- Crista’s Five Laws of Performant Software
L02: Rust Basics
L03: Rust: Borrowing, Slices, Threads, Traits
L04: Rust: Breaking the Rules
L05: Asynchronous I/O
L06: Modern Processors
L07: CPU Hardware, Branch Prediction
L08: Cache Coherency
- Cache Coherency
- MESI Protocol
- Snoopy Cache
- Write-Back Cache vs Write-Through Cache
- False Sharing · Cache Line
L09: Algorithms, Concurrency, and Parallelism
- Amdahl’s Law (revisited)
- Gustafson’s Law
- Accidentally Quadratic
- Thread Pool
L10: Software Architecture
L11: Use of Locks, Reentrancy
- Locking Granularity
- Four Conditions for Deadlock
- Global Interpreter Lock
- Reentrancy
- Functional Programming and Parallelization
L12: Lock Convoys, Atomics, Lock-Freedom
- Lock Convoy
- Atomic Operation
- Lock-Free Programming
- Wait-Free Programming
- Compare-and-Swap (CAS)
- ABA Problem
L13: Dependencies and Speculation
L14: Early Termination, Reduced-Resource Computation
L15: Memory Consistency
L16: Rate Limits
L17: Data Parallelism
L18: Compiler Optimizations
L19: Query Optimization
L20: Self-Optimizing Software
L21/L22: GPU Programming (CUDA)
L23: Password Cracking, Bitcoin Mining, LLMs
L24: Profiling: Observing Operations
L25: Load Testing
L26: Finding Bottleneck Devices
L27: Program Profiling and POGO
L28: Causal and Simulation Profiling
L29: Liar, Liar (Benchmarking Pitfalls)
L30: Clusters & Cloud Computing
L31: Introduction to Queueing Theory
L32: Convergence, Ergodicity, Applications
L33: More Advanced Queueing Theory
- Queueing Complications (balking, reneging, loss, priority, FastPass simulation)
L34: DevOps: Configuration
L35: DevOps: Operations