Cloud Computing

Renting compute, storage, and networking on-demand from a provider instead of owning hardware. Billed per instance per unit time.

Why?

Buying and maintaining racks used to be the only option if you wanted a dedicated server. Clouds let you scale from one instance to thousands in minutes and stop paying the moment you shut them off.

From ECE459 L30

Evolution:

  • Dedicated hardware: physical machine in a rack, or worse, shared hosting
  • Virtualization: pay for part of a machine (slicehost), root access inside a VM; Spectre / Meltdown let you peek outside your slice, though
  • Cloud: add more machines on-demand; persistent storage sits separately in the cloud

You pay per instance. Sizes vary on cores, local storage, memory. Some come with GPUs (uneconomic for A3 in this course; ecetesla is used instead).

Launching:

  • Start from a VM image (AMI-like)
  • Launch via CLI or web UI; you get an IP and full root
  • Amazon ships public images for many Linuxes, Windows Server, OpenSolaris
  • Bake in Hadoop, OpenMPI, or whatever you need

Terminating: shut it down, stop paying. All data on the instance is gone.

Continuous deployment: bring up new instances, drain and kill old, no downtime gap. Incompatible schema changes still bite the old nodes during the overlap.

Persistent storage shapes:

  • Block storage (EBS): mount a remote disk
  • Database (SimpleDB, RDS): managed DB server
  • Object store (S3): files on the web

Clusters vs laptops

Source: [McS15], “Scalability! But at what COST?“.

Big-data systems introduce substantial overhead; absolute performance matters, not just scalability. GraphX (OSDI 2014, 128 cores) vs a competent single-threaded laptop on graphs with billions of edges:

  • PageRank on twitter_rv: big-data 249 to 857s; laptop 300s
  • Label propagation (connectivity): big-data 251 to 1784s; laptop 153s

Algorithmic wins on the laptop: Hilbert-curve layout (locality) for PageRank, union-find for connectivity. Net: 2x PageRank speedup, 10x label-propagation speedup.

Authors’ takeaways:

“If you are going to use a big data system for yourself, see if it is faster than your laptop.” “If you are going to build a big data system for others, see that it is faster than my laptop.”

Oscar Wilde: “The bureaucracy is expanding to meet the needs of the expanding bureaucracy.”