Cloud Computing
Renting compute, storage, and networking on-demand from a provider instead of owning hardware. Billed per instance per unit time.
Why?
Buying and maintaining racks used to be the only option if you wanted a dedicated server. Clouds let you scale from one instance to thousands in minutes and stop paying the moment you shut them off.
From ECE459 L30
Evolution:
- Dedicated hardware: physical machine in a rack, or worse, shared hosting
- Virtualization: pay for part of a machine (e.g. Slicehost) and get root access inside a VM, though Spectre / Meltdown let you peek outside your slice
- Cloud: add more machines on-demand; persistent storage sits separately in the cloud
You pay per instance. Sizes vary on cores, local storage, memory. Some come with GPUs (uneconomic for A3 in this course; ecetesla is used instead).
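A minimal sketch of the per-instance, per-unit-time billing arithmetic. The $0.10/hour rate, the `instance_cost` helper, and the round-up-to-billing-unit rule are assumptions for illustration, not any provider's actual pricing.

```python
import math

def instance_cost(hourly_rate, instances, seconds, billing_unit_s=3600):
    """Cost of `instances` machines running for `seconds`.

    Usage is rounded UP to a whole number of billing units
    (assumed rule; real providers differ).
    """
    units = math.ceil(seconds / billing_unit_s)
    hours_billed = units * billing_unit_s / 3600
    return hours_billed * hourly_rate * instances

RATE = 0.10  # hypothetical dollars per instance-hour

# One small box left on for a day vs. a 15-minute burst of 1000 boxes.
one_box_all_day = instance_cost(RATE, instances=1, seconds=24 * 3600)
burst_hourly = instance_cost(RATE, instances=1000, seconds=15 * 60)
burst_per_second = instance_cost(RATE, instances=1000, seconds=15 * 60,
                                 billing_unit_s=1)

print(f"1 instance, 24 h:             ${one_box_all_day:.2f}")
print(f"1000 instances, 15 min (hour): ${burst_hourly:.2f}")
print(f"1000 instances, 15 min (sec):  ${burst_per_second:.2f}")
```

The billing granularity matters for short bursts: under the assumed hourly rounding, the 15-minute burst is charged as a full hour per instance.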
Launching:
- Start from a VM image (AMI-like)
- Launch via CLI or web UI; you get an IP and full root
- Amazon ships public images for many Linuxes, Windows Server, OpenSolaris
- Bake in Hadoop, OpenMPI, or whatever you need
Terminating: shut it down, stop paying. All data on the instance is gone.
Continuous deployment: bring up new instances, drain and kill old, no downtime gap. Incompatible schema changes still bite the old nodes during the overlap.
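A toy simulation of the bring-up / drain / kill sequence, assuming a one-for-one replacement policy. `Node` and `rolling_deploy` are hypothetical names; a real deployment would go through a load balancer and the provider's API.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    version: str
    serving: bool = True

def rolling_deploy(fleet, new_version):
    """Replace old nodes one at a time: start new, then drain and kill old.

    Returns (new_fleet, min_serving, saw_mixed_versions).
    """
    fleet = list(fleet)
    min_serving = sum(n.serving for n in fleet)
    saw_mixed = False
    for old in [n for n in fleet if n.version != new_version]:
        # 1. Bring up the replacement before touching the old node.
        fleet.append(Node(f"{old.name}-next", new_version))
        # Both versions are now serving: this is the overlap window where
        # an incompatible schema change would break the old nodes.
        versions = {n.version for n in fleet if n.serving}
        saw_mixed = saw_mixed or len(versions) > 1
        # 2. Drain: stop routing traffic to the old node.
        old.serving = False
        min_serving = min(min_serving, sum(n.serving for n in fleet))
        # 3. Kill it.
        fleet.remove(old)
    return fleet, min_serving, saw_mixed

old_fleet = [Node("a", "v1"), Node("b", "v1"), Node("c", "v1")]
new_fleet, min_serving, saw_mixed = rolling_deploy(old_fleet, "v2")
print([n.name for n in new_fleet], min_serving, saw_mixed)
```

Serving capacity never drops below the starting count because each replacement is up before its predecessor is drained, but `saw_mixed` confirms both versions serve at once during the overlap.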
Persistent storage shapes:
- Block storage (EBS): mount a remote disk
- Database (SimpleDB, RDS): managed DB server
- Object store (S3): files on the web
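A toy model of why storage lives apart from the instance: local data dies at termination, while a separately attached volume can be remounted by the next instance. `Volume` and `ToyInstance` are hypothetical stand-ins, not a provider API.

```python
class Volume:
    """Stand-in for a remote block device that outlives any instance."""
    def __init__(self):
        self.files = {}

class ToyInstance:
    def __init__(self, volume=None):
        self.local = {}        # instance-local storage: lost on terminate
        self.volume = volume   # attached remote volume, if any

    def terminate(self):
        self.local = None      # everything on the instance is gone
        self.volume = None     # detach; the Volume object itself survives

vol = Volume()
first = ToyInstance(vol)
first.local["scratch.tmp"] = "intermediate data"
first.volume.files["results.csv"] = "1,2,3"
first.terminate()

# A fresh instance attaches the same volume and finds the results.
second = ToyInstance(vol)
print(second.local)          # empty: nothing carried over
print(second.volume.files)   # results.csv is still there
```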
Clusters vs laptops
Source: [McS15], “Scalability! But at what COST?”.
Big-data systems introduce substantial overhead, so absolute performance matters, not just scalability. Systems measured in the GraphX paper (OSDI 2014) on 128 cores vs a competent single-threaded laptop implementation on graphs with billions of edges:
- PageRank on twitter_rv: big-data 249 to 857s; laptop 300s
- Label propagation (connectivity): big-data 251 to 1784s; laptop 153s
Algorithmic wins on the laptop: Hilbert-curve layout (locality) for PageRank, union-find in place of label propagation for connectivity. Net, relative to the laptop's own baseline times above: about 2x faster PageRank and about 10x faster connectivity.
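A small sketch of the connectivity swap: label propagation sweeps the edge list until nothing changes, while union-find handles each edge once. This is an illustrative version, not the paper's code. Linking the larger root under the smaller one is a choice made here so both functions return the same labels (the minimum vertex id per component).

```python
def label_propagation(n, edges):
    """Repeat passes over the edges, pushing the smaller label across each edge."""
    labels = list(range(n))
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        for u, v in edges:
            low = min(labels[u], labels[v])
            if labels[u] != low or labels[v] != low:
                labels[u] = labels[v] = low
                changed = True
    return labels, rounds

def union_find(n, edges):
    """Single pass over the edges with a disjoint-set forest."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Smaller id becomes the root, so each root is its component's minimum.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv
    return [find(x) for x in range(n)]

# A 6-vertex chain listed in an order that is bad for label propagation,
# plus a separate 2-vertex component.
edges = [(4, 5), (3, 4), (2, 3), (1, 2), (0, 1), (6, 7)]
lp_labels, lp_rounds = label_propagation(8, edges)
uf_labels = union_find(8, edges)
print(lp_labels, "after", lp_rounds, "passes")
print(uf_labels, "after 1 pass")
```

On long paths, label propagation needs a number of passes proportional to the path length, which is where union-find's single pass pays off.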
Authors’ takeaways:
“If you are going to use a big data system for yourself, see if it is faster than your laptop.” “If you are going to build a big data system for others, see that it is faster than my laptop.”
Oscar Wilde: “The bureaucracy is expanding to meet the needs of the expanding bureaucracy.”