Pinned Memory
Pinned (page-locked) memory is host RAM the OS promises not to swap, keeping those pages in physical memory so the GPU’s DMA engine can read directly from them.
Why does regular memory need this?
Normal host memory is pageable: the OS can move it to disk at any time, so the GPU's DMA engine cannot safely read from it. A `cudaMemcpy` from pageable RAM therefore first copies into a pinned staging buffer, then DMAs to the GPU. That first copy is pure overhead; pinned memory skips it.
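The cost of that staging copy is easy to measure from PyTorch. A minimal sketch, assuming a CUDA device is present (`h2d_bandwidth` is a name invented here, not a library function):

```python
import time
import torch

def h2d_bandwidth(pinned: bool, n_mb: int = 256) -> float:
    """Time one host-to-device copy and return GiB/s.
    Hypothetical helper; requires a CUDA device."""
    x = torch.empty(n_mb * 1024 * 1024, dtype=torch.uint8)
    if pinned:
        x = x.pin_memory()          # page-locked copy (cudaHostAlloc under the hood)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    x.to('cuda', non_blocking=pinned)
    torch.cuda.synchronize()        # make the timing include the whole copy
    return (n_mb / 1024) / (time.perf_counter() - t0)

if torch.cuda.is_available():
    print(f"pageable: {h2d_bandwidth(False):.1f} GiB/s")
    print(f"pinned:   {h2d_bandwidth(True):.1f} GiB/s")
```

On PCIe-attached GPUs the pinned number is typically noticeably higher, since the pageable path pays for the extra host-side copy.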
Payoff
- Higher effective bandwidth: saturates PCIe instead of being bottlenecked by the staging copy
- Enables async H2D: `cudaMemcpyAsync` requires a pinned source to actually be asynchronous; without one, the call silently blocks
- Unlocks overlapping copies with compute on a CUDA stream, which is how data-loading latency gets hidden behind the training step
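The silent-blocking behavior can be seen from PyTorch, where `non_blocking=True` maps onto `cudaMemcpyAsync`. A sketch, assuming a CUDA device (buffer sizes are arbitrary):

```python
import torch

# With a pinned source, non_blocking=True returns while the DMA is still
# in flight; with a pageable source the flag is quietly ignored and the
# host blocks until the copy finishes.
if torch.cuda.is_available():
    pinned = torch.empty(64 * 1024 * 1024, dtype=torch.uint8).pin_memory()
    pageable = torch.empty(64 * 1024 * 1024, dtype=torch.uint8)

    pinned.to('cuda', non_blocking=True)        # returns immediately
    print(torch.cuda.current_stream().query())  # often False: copy still running
    torch.cuda.synchronize()                    # wait before reusing the host buffer

    pageable.to('cuda', non_blocking=True)      # flag ignored: host blocks here
```

Note the `synchronize()` before reusing the pinned buffer: an async copy that is still in flight would otherwise read whatever you write next.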
PyTorch usage
```python
loader = DataLoader(dataset, pin_memory=True, num_workers=4)
for batch in loader:
    batch = batch.to('cuda', non_blocking=True)  # async only works with pinned source
    ...
```

Writing it manually:

```python
x = torch.randn(1000, 1000).pin_memory()
y = x.to('cuda', non_blocking=True)
```

See PyTorch Performance Tuning for the full recipe.
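To actually hide loading latency behind the training step, copy batch i+1 on a side stream while batch i computes. A hypothetical sketch (`prefetch_loop` is invented here; it assumes the loader yields pinned CPU tensors, e.g. from `DataLoader(pin_memory=True)`, and that a CUDA device exists):

```python
import torch

def prefetch_loop(loader, step):
    """Overlap H2D copies with compute: stage the next batch on a side
    stream while the current batch runs on the default stream."""
    copy_stream = torch.cuda.Stream()

    def stage(cpu_batch):
        with torch.cuda.stream(copy_stream):
            return cpu_batch.to('cuda', non_blocking=True)

    it = iter(loader)
    cur = stage(next(it))
    for cpu_batch in it:
        torch.cuda.current_stream().wait_stream(copy_stream)  # cur is ready
        nxt = stage(cpu_batch)  # next H2D copy runs on the side stream...
        step(cur)               # ...while this step computes on the default stream
        cur = nxt
    torch.cuda.current_stream().wait_stream(copy_stream)
    step(cur)
```

The `wait_stream` calls order the default stream after the copy that produced `cur`, while leaving the freshly issued copy free to overlap with `step(cur)`.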
Trade-off
Pinned pages can’t be swapped, so over-allocating them starves the OS of pageable memory and can destabilize the system. Pin data-loader buffers, not everything.