Locking Granularity
Locking granularity refers to how much data a single lock protects. Choosing it is a trade-off between parallelism, overhead, and bug-proneness, covered in ECE459 L11.
Why does it matter?
Critical sections should be “as large as they need to be but no larger.” Too coarse kills parallelism. Too fine means deadlocks and lock-management overhead.
Coarse-grained uses few locks (maybe one):
- easier to implement
- one lock means no deadlock
- lowest memory and setup cost
- parallel program collapses to sequential
Python’s GIL locks the whole interpreter: only one thread executes Python bytecode at a time. Only I/O-bound threads see a benefit from threading, and CPU-bound threaded Python is often slower than the sequential version because of contention on the GIL. OS kernels have had similar big kernel locks; Linux had one from its first SMP support until it was fully removed in 2011 (kernel 2.6.39).
Fine-grained uses many small locks:
- maximizes parallelization
- wasted memory and setup time if the program isn’t very parallel
- exposes you to deadlocks and “did I grab the right lock?” bugs
Databases lock fields, records, or tables depending on the scope of the operation. Object-level locking also works, but watch out for transactional needs: an operation that must update several objects atomically still has to coordinate across their locks.
Sizing the critical section in Rust
In Rust, the critical section ends when the MutexGuard is dropped. Shrink it by introducing an inner scope { ... } or by calling drop(guard) explicitly.
In L11’s producer-consumer example, the guard lives by default until the end of the loop body, so every unrelated call runs with the lock held. Wrap just the buf-touching statements in an inner block that returns to_consume:
```rust
// Before — guard held for the whole loop body
let mut buf = buffer.lock().unwrap();
let current_consume_space = buf.consumer_count;
let next_consume_space = (current_consume_space + 1) % buf.buffer.len();
let to_consume = *buf.buffer.get(current_consume_space).unwrap();
buf.consumer_count = next_consume_space;
spaces.add_permits(1);    // unrelated to buf
permit.forget();          // unrelated to buf
consume_item(to_consume); // the actual work
```

```rust
// After — guard dropped after the inner block
let to_consume = {
    let mut buf = buffer.lock().unwrap();
    let current_consume_space = buf.consumer_count;
    let next_consume_space = (current_consume_space + 1) % buf.buffer.len();
    let to_consume = *buf.buffer.get(current_consume_space).unwrap();
    buf.consumer_count = next_consume_space;
    to_consume
};
spaces.add_permits(1);
permit.forget();
consume_item(to_consume);
```

With thread::sleep added to consume_item to simulate real work, hyperfine reports ~2.8 s before and ~1.1 s after. The same shrink applies symmetrically on the producer side.
Three concerns when using locks
- Overhead: memory, initialization/destruction cost, and acquire/release time, all of which scale with the number of locks
- Contention: most time spent “locking” is really time spent waiting for a lock. Shrink the critical section or split the lock
- Deadlocks: the more locks a thread holds at once, the more ways threads can form a waiting cycle