GPU Password Cracking

GPU password cracking uses the GPU’s parallelism to brute-force or dictionary-attack password hashes at high throughput. A canonical embarrassingly parallel workload: each candidate hashes independently.

Why are GPUs so good at this?

Hash functions (MD5, SHA-1, SHA-256, NTLM) are simple arithmetic that maps cleanly to GPU SIMT lanes. A consumer GPU computes tens of billions of MD5s per second, and a server GPU more. The tool of record is hashcat, which supports dozens of hash types.

Defenses, weakest to strongest:

  • Long hashes plus salt: prevents rainbow tables. Minimum bar; doesn’t slow a targeted attack
  • Slow hash functions: bcrypt (adjustable cost), scrypt, Argon2. Built to be deliberately inefficient. bcrypt with cost 12 takes ~300 ms, dropping GPU throughput from billions/sec to thousands/sec
  • Memory-hard functions: scrypt, Argon2id. Force large memory per hash so the GPU’s many-cores / low-mem-per-core advantage disappears
  • Peppering: secret key mixed into the hash, stored separately from the DB
  • Rate limiting plus 2FA on the auth endpoint

Real gotcha

If your password DB uses fast unsalted hashes (MD5, SHA-256), assume it is already cracked the moment it leaks. Use bcrypt or Argon2id.

From ECE459 L23

Each candidate password is a point in the GPU’s computation space, hashed simultaneously. A decent site locks accounts after N bad logins, but if the hash DB leaks, brute force is back on. UNIX passwords repeatedly applied the hash to stiffen the arms race. Today-uncrackable passwords are not 20-years-from-now-uncrackable, so rotate algorithms and passwords.

Pre-checks beat brute force:

  • Try “password”, “system”, etc. first
  • Search credential dumps: a user reusing mycrappywebsite.com creds on their bank needs no cracking at all
  • Quantum doesn’t wreck hashing the way it does asymmetric (RSA)
  • xkcd 538: the $5 wrench attack

scrypt [Per09], the hash behind Dogecoin, aims to make hashing expensive in both time and space, so brute-forcing needs not just more time but more circuitry.

A memory-hard algorithm on a RAM uses space, where:

  • is space used
  • is operation count
  • is a small constant

A sequential memory-hard function is one where the fastest sequential algorithm is memory-hard, and no parallel algorithm can asymptotically beat it. scrypt exhibits ReMix as an example, and BlockMix in a cache-aware model.

Rainbow tables [Kul09]

Rainbow tables compromise between brute force and a full hash→plaintext lookup. Build chains:

  1. Start at plaintext P₀, compute hash H₀
  2. Reduction function R maps H₀ to a new plaintext P₁ (e.g. take the first 6 digits of the hex). R is not the inverse of the hash, just a mapping back into the plaintext space
  3. Hash P₁ → H₁, reduce → P₂, repeat n times
  4. Store only (P₀, Hₙ) per chain. Chains compress millions of hashes into one row

To crack hash H’:

  1. Look for H’ in the final-hash column. If present, re-walk that chain from its start to locate H’ and return the preceding plaintext
  2. Otherwise reduce H’ to a new plaintext, hash, check again, up to n iterations
  3. Handle collisions and loops carefully

Both generation and lookup are GPU-friendly since reductions parallelize. A GTX295 generated ~430M links/sec for MD5 and cracked K#n&r4Z in ~1m52s [cryptohaze]. Rainbow tables for common hash+charset combos are downloadable in the 25–900 GB range.