Memory Alignment
https://stackoverflow.com/questions/1063809/aligned-and-unaligned-memory-accesses
You will hear these different terms:
- Byte Aligned
- Word Aligned
- Memory Aligned
There is also DRAM Bursting.
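- In a 32-bit architecture, there are 4 bytes per word and words are aligned to multiples of four bytes, so the low 2 bits of an address are the byte offset within the word
- In a 64-bit architecture, where there are 8 bytes per word, the low 3 bits of an address are the byte offset, which is used to index into the right byte within the word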
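The byte-offset idea in the two bullets above can be sketched as a bit mask. This is a minimal illustration, not taken from the linked answer; the helper names `byte_offset` and `is_aligned` are made up for this note.

```rust
/// Byte offset of `addr` within a word of `word_bytes` bytes.
/// `word_bytes` must be a power of two (4 for 32-bit words, 8 for 64-bit words).
fn byte_offset(addr: usize, word_bytes: usize) -> usize {
    assert!(word_bytes.is_power_of_two());
    // For a power of two, `word_bytes - 1` is a mask of the low bits:
    // 4 -> 0b11 (2 bits), 8 -> 0b111 (3 bits).
    addr & (word_bytes - 1)
}

/// An address is aligned to `word_bytes` when its byte offset is zero.
fn is_aligned(addr: usize, word_bytes: usize) -> bool {
    byte_offset(addr, word_bytes) == 0
}

fn main() {
    // 4-byte words: the low 2 bits select the byte within the word.
    println!("{}", byte_offset(0x1006, 4)); // 0x1006 & 0b11 = 2
    // 8-byte words: the low 3 bits select the byte within the word.
    println!("{}", byte_offset(0x1006, 8)); // 0x1006 & 0b111 = 6
    println!("{}", is_aligned(0x1008, 8)); // true
}
```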
Memory Alignment example
https://x.com/seatedro/status/1874668513719464056/photo/1
- This layout is bad because, for example, you would need to read the struct 3 times, as opposed to twice if you had written it with the fields reordered (see the two structs below)
That’s badly explained.
A struct is just a combination of different primitive data types.
“The better question would be to not tell them that anything is wrong, rather ask them what the size of this struct is in memory, whether or not it is possible to reduce its memory footprint, if possible how can you reduce the memory footprint of the struct, and the performance implications alignment has when it comes to program execution”
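// Assuming declaration-order (C-style) layout on a 64-bit target: 24 bytes
struct {
    a: u32, // offset 0, then 4 bytes of padding so b is 8-byte aligned
    b: u64, // offset 8
    c: u32, // offset 16, then 4 bytes of tail padding
}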
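// Same fields reordered: 16 bytes, no padding (same layout assumption)
struct {
    a: u32, // offset 0
    b: u32, // offset 4
    c: u64, // offset 8
}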
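To answer the "what is the size of this struct" question directly, here is a small Rust sketch that measures both layouts. The type names `Padded` and `Reordered` are made up for this note. `#[repr(C)]` is used to keep fields in declaration order; Rust's default layout makes no such guarantee and the compiler may reorder fields on its own. The sizes in the comments assume a typical 64-bit target where `u64` has 8-byte alignment.

```rust
use std::mem::{align_of, size_of};

// #[repr(C)] keeps the fields in declaration order, as a C compiler would.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u32,
    b: u64,
    c: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct Reordered {
    a: u32,
    b: u32,
    c: u64,
}

fn main() {
    // On a typical 64-bit target: size 24, align 8.
    println!("Padded:    size {} align {}", size_of::<Padded>(), align_of::<Padded>());
    // On a typical 64-bit target: size 16, align 8.
    println!("Reordered: size {} align {}", size_of::<Reordered>(), align_of::<Reordered>());
}
```

Both structs hold the same 16 bytes of data; the 8 extra bytes in `Padded` are padding inserted to satisfy the alignment of `b` and of the struct as a whole.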
Related
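On typical 64-bit targets, u32 requires 4-byte alignment, so it starts at addresses divisible by 4 (e.g., 0, 4, 8, etc.). u64 requires 8-byte alignment, so it starts at addresses divisible by 8 (e.g., 0, 8, 16, etc.). A struct takes the alignment of its most strictly aligned field, and its size is rounded up to a multiple of that alignment.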
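The alignment rule above is enough to compute a struct's size by hand. The following is a rough sketch of that arithmetic, assuming fields are placed in declaration order; `align_up` and `struct_size` are illustrative helpers, not a library API.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Size of a struct whose fields are placed in order.
/// Each field is described as (size in bytes, alignment in bytes).
fn struct_size(fields: &[(usize, usize)]) -> usize {
    let mut offset: usize = 0;
    let mut max_align: usize = 1;
    for &(size, align) in fields {
        // Skip padding bytes until the field's alignment is satisfied.
        offset = align_up(offset, align) + size;
        max_align = max_align.max(align);
    }
    // Tail padding: the total size is a multiple of the largest alignment.
    align_up(offset, max_align)
}

fn main() {
    let u32_field: (usize, usize) = (4, 4);
    let u64_field: (usize, usize) = (8, 8);
    println!("{}", struct_size(&[u32_field, u64_field, u32_field])); // 24
    println!("{}", struct_size(&[u32_field, u32_field, u64_field])); // 16
}
```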
CUDA
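Understanding memory alignment is fundamental to making CUDA kernels run faster, because the number of memory transactions a warp needs depends on how its threads' addresses are aligned and distributed.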
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses
“When a warp executes an instruction that accesses global memory, it coalesces the memory accesses of the threads within the warp into one or more of these memory transactions depending on the size of the word accessed by each thread and the distribution of the memory addresses across the threads”
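To get a feel for the quote above, here is a simplified host-side model in Rust (not CUDA code, and not a description of any specific GPU). It counts how many naturally aligned 32-byte segments a warp of 32 threads touches for a given base address and stride. The 32-byte segment size is an assumption chosen for the model, and `sectors_touched` is a made-up helper; real transaction sizes and caching behavior vary by architecture.

```rust
use std::collections::BTreeSet;

const WARP_SIZE: usize = 32; // threads per warp
const SECTOR_BYTES: usize = 32; // assumed segment size for this model

/// Number of distinct 32-byte aligned segments touched when thread `t`
/// reads `word_bytes` bytes starting at `base + t * stride_bytes`.
fn sectors_touched(base: usize, stride_bytes: usize, word_bytes: usize) -> usize {
    let mut sectors = BTreeSet::new();
    for t in 0..WARP_SIZE {
        let first = base + t * stride_bytes;
        let last = first + word_bytes - 1;
        // A word that straddles a segment boundary touches both segments.
        for s in (first / SECTOR_BYTES)..=(last / SECTOR_BYTES) {
            sectors.insert(s);
        }
    }
    sectors.len()
}

fn main() {
    // Aligned, contiguous 4-byte reads: 128 bytes in 4 segments.
    println!("{}", sectors_touched(0, 4, 4)); // 4
    // Same pattern shifted by 4 bytes: spills into a 5th segment.
    println!("{}", sectors_touched(4, 4, 4)); // 5
    // Stride of 8 bytes: half of each segment is wasted.
    println!("{}", sectors_touched(0, 8, 4)); // 8
    // Stride of 128 bytes: every thread hits its own segment.
    println!("{}", sectors_touched(0, 128, 4)); // 32
}
```

In this model the aligned, contiguous pattern moves the least data per useful byte, which is the intuition behind coalescing.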
CUDA Memory Alignment
Also see CUDA Memory.