GPU Password Cracking
GPU password cracking uses the GPU’s parallelism to brute-force or dictionary-attack password hashes at high throughput. A canonical embarrassingly parallel workload: each candidate hashes independently.
Why are GPUs so good at this?
Hash functions (MD5, SHA-1, SHA-256, NTLM) are simple arithmetic that maps cleanly to GPU SIMT lanes. A consumer GPU computes tens of billions of MD5s per second, and a server GPU more. The tool of record is
hashcat, which supports dozens of hash types.
Defenses, weakest to strongest:
- Long hashes plus salt: prevents rainbow tables. Minimum bar; doesn’t slow a targeted attack
- Slow hash functions: bcrypt (adjustable
cost), scrypt, Argon2. Built to be deliberately inefficient. bcrypt with cost 12 takes ~300 ms, dropping GPU throughput from billions/sec to thousands/sec - Memory-hard functions: scrypt, Argon2id. Force large memory per hash so the GPU’s many-cores / low-mem-per-core advantage disappears
- Peppering: secret key mixed into the hash, stored separately from the DB
- Rate limiting plus 2FA on the auth endpoint
Real gotcha
If your password DB uses fast unsalted hashes (MD5, SHA-256), assume it is already cracked the moment it leaks. Use bcrypt or Argon2id.
From ECE459 L23
Each candidate password is a point in the GPU’s computation space, hashed simultaneously. A decent site locks accounts after N bad logins, but if the hash DB leaks, brute force is back on. UNIX passwords repeatedly applied the hash to stiffen the arms race. Today-uncrackable passwords are not 20-years-from-now-uncrackable, so rotate algorithms and passwords.
Pre-checks beat brute force:
- Try “password”, “system”, etc. first
- Search credential dumps: a user reusing
mycrappywebsite.comcreds on their bank needs no cracking at all - Quantum doesn’t wreck hashing the way it does asymmetric (RSA)
- xkcd 538: the $5 wrench attack
scrypt [Per09], the hash behind Dogecoin, aims to make hashing expensive in both time and space, so brute-forcing needs not just more time but more circuitry.
A memory-hard algorithm on a RAM uses space, where:
- is space used
- is operation count
- is a small constant
A sequential memory-hard function is one where the fastest sequential algorithm is memory-hard, and no parallel algorithm can asymptotically beat it. scrypt exhibits ReMix as an example, and BlockMix in a cache-aware model.
Rainbow tables [Kul09]
Rainbow tables compromise between brute force and a full hash→plaintext lookup. Build chains:
- Start at plaintext P₀, compute hash H₀
- Reduction function R maps H₀ to a new plaintext P₁ (e.g. take the first 6 digits of the hex). R is not the inverse of the hash, just a mapping back into the plaintext space
- Hash P₁ → H₁, reduce → P₂, repeat n times
- Store only (P₀, Hₙ) per chain. Chains compress millions of hashes into one row
To crack hash H’:
- Look for H’ in the final-hash column. If present, re-walk that chain from its start to locate H’ and return the preceding plaintext
- Otherwise reduce H’ to a new plaintext, hash, check again, up to n iterations
- Handle collisions and loops carefully
Both generation and lookup are GPU-friendly since reductions parallelize. A GTX295 generated ~430M links/sec for MD5 and cracked K#n&r4Z in ~1m52s [cryptohaze]. Rainbow tables for common hash+charset combos are downloadable in the 25–900 GB range.