Stack and Stack Frame (a.k.a Frame)

I had a lot of trouble understanding this, and how it was related to Chunk. This is really fundamental knowledge.

A stack is just a special portion of memory that we use to manage function calls and local data in programs.

  • It helps keep track of which functions you are in (by storing the return address), as well as local variables that might be used (but most compilers can just use the registers as storage), so that when we resume from a function, the local variables get restored into the registerse
  • The compiler does all this work so you don’t need to worry about it

Really fundamental

If you try and look inside Godbolt, when you look at the assembly code for a function call, the first thing is called always pushing context to the stack pointer. Because after you exit the function, you want to make sure to restore those values.

Maybe Ben Eater helps: What is a stack and how does it work?

The stack grows downwards.

  • When you add things to your stack pointer, you actually decrement it
  • When you remove things from your stack pointer, you actually increment it’s address
  • You did this in SE350 lab

NOT all local variables actually stored in the stack

Local variables are actually stored in the stack under specific circumstances, mostly when the compiler can’t keep them in registers or when certain features of the code require stack storage.

  • So the main thing that gets pushed is the return address that needs to be updated for the program counter.
  • Arguments / variables can be stored in registers directly, but need to be careful
  • Also see Caller-Save

Higher Memory Addresses
+---------------------+  <- 0xFFFFFFFF (High Address)
|      Stack         |  (Grows downward ⬇️)
|    Function Calls  |
|    Local Vars      |
+---------------------+
|      Heap          |  (Grows upward ⬆️)
|    Dynamic Memory  |
+---------------------+
|    Data Segment    |  (Global & Static Variables)
+---------------------+
|    Code Segment    |  (Program Instructions)
+---------------------+  <- 0x00000000 (Low Address)
Lower Memory Addresses

Confusing

Some drawings represent the top as higher memory address, and bottom as lower memory addresses. The picture below has top as lower memory addresses. Diagrams above use top as higher memory address.

From the Procedure note:

When you push to the stack

You are pushing the current context, not the context of the things that you are going into.

Ex: You are insidefoo(), and you make a function call to bar()

  • The caller (foo()) does NOT allocate local variable space for the next function—it only pushes the return address and arguments to its own foo() stack frame
  • The callee (new function) is responsible for allocating its own local variables on the stack.
  • Stack frames are dynamically created and removed as functions are called and returned.

Consider this example

#include <stdio.h>
 
void bar(int a) {
    printf("Inside bar(), received: %d\n", a);
}
 
void foo() {
    int x = 10;
    bar(x);  // `foo()` calls `bar()` and passes `x`
    printf("Back in foo()\n");
}
 
int main() {
    foo();  // `main()` calls `foo()`
    printf("Back in main()\n");
    return 0;
}

This is how the stack looks like when you are inside bar():

FRAME of main()
+-----------------------+  <-- Higher memory
|   Return Address      |  <-- back to system address
+-----------------------+
|  Previous RBP (system)|  <-- stored via `push rbp`
+-----------------------+
|   Local Variables     | 
+-----------------------+  <-- RBP of main()
 
FRAME of foo()
+-----------------------+  <-- RBP of main()
|   Return Address      |  <-- back to main(), stored during `call`
+-----------------------+
| Previous RBP (main()) |  <-- stored via `push rbp`
+-----------------------+
|   Local Variable x=10 | 
+-----------------------+  <-- RBP of foo()
 
FRAME of bar()
+-----------------------+  <-- RBP of foo()
|   Return Address      |  <-- back to foo(), stored during `call`
+-----------------------+
|  Previous RBP (foo()) |  <-- stored via `push rbp`
+-----------------------+
|   Argument a=10       |  <-- Argument passed from foo() to bar()
+-----------------------+
|   Local Variables     |
+-----------------------+  <-- RBP of bar()
|   More stack space    |  <-- Space used during execution inside bar()
+-----------------------+  <-- ESP (or RSP for 64-bit)
 

Why do you push arguments to the stack?

Caller and Callee Need to Communicate

  • The calling function (foo()) has a value (x) that must be passed to bar().
  • Since functions have separate stack frames, bar() cannot directly access foo()’s local variables.
  • By pushing the argument onto the stack, bar() can retrieve it from its expected memory location.

Another example

Here’s a program to illustrate this (also see Callee-Save):

int hello(int x) {
    int y = x + 2;
    return y + 4;
}
 
int main() {
    hello(3);
}
hello(int):
        push    rbp ; Save address of previous stack frame
        mov     rbp, rsp ; Address of current stack frame
        mov     DWORD PTR [rbp-20], edi ; store argument into stack
        mov     eax, DWORD PTR [rbp-20]
        add     eax, 2
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-4]
        add     eax, 4
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, 3
        call    hello(int)
        mov     eax, 0
        pop     rbp
        ret

how is rsp set? who sets that?

When call hello, the CPU automatically adjusts rsp to reflect changes.

What is the point of rbp, if rsp is set correctly anyways??

What you don’t see… PC return address is also pushed onto the stack.

Notice that registers are used

In the x86-64 System V calling convention (used by most Unix-like systems on x86-64 architecture), the first few function arguments are passed using registers rather than being pushed onto the stack.

Let’s look at an example where there are lots of arguments passed.

int hello(int a1, int a2, int a3, int a4, int a5, int a6, int a7, int a8, int a9, int a10) {
    int y = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10 + 2;
    return y + 4;
}
 
int main() {
    int z = hello(1,2,3,4,5,6,7,8,9,10);
    return z + 10;
}
hello(int, int, int, int, int, int, int, int, int, int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-20], edi
        mov     DWORD PTR [rbp-24], esi
        mov     DWORD PTR [rbp-28], edx
        mov     DWORD PTR [rbp-32], ecx
        mov     DWORD PTR [rbp-36], r8d
        mov     DWORD PTR [rbp-40], r9d
        mov     edx, DWORD PTR [rbp-20]
        mov     eax, DWORD PTR [rbp-24]
        add     edx, eax
        mov     eax, DWORD PTR [rbp-28]
        add     edx, eax
        mov     eax, DWORD PTR [rbp-32]
        add     edx, eax
        mov     eax, DWORD PTR [rbp-36]
        add     edx, eax
        mov     eax, DWORD PTR [rbp-40]
        add     edx, eax
        mov     eax, DWORD PTR [rbp+16]
        add     edx, eax
        mov     eax, DWORD PTR [rbp+24]
        add     edx, eax
        mov     eax, DWORD PTR [rbp+32]
        add     edx, eax
        mov     eax, DWORD PTR [rbp+40]
        add     eax, edx
        add     eax, 2
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-4]
        add     eax, 4
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        push    10
        push    9
        push    8
        push    7
        mov     r9d, 6
        mov     r8d, 5
        mov     ecx, 4
        mov     edx, 3
        mov     esi, 2
        mov     edi, 1
        call    hello(int, int, int, int, int, int, int, int, int, int)
        add     rsp, 32
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-4]
        add     eax, 10
        leave
        ret

Stack layout of hello():

+-----------------------------+  <-- Higher memory
|   a10 = 10                  |  <-- [rbp + 40]
+-----------------------------+
|   a9 = 9                    |  <-- [rbp + 32]
+-----------------------------+
|   a8 = 8                    |  <-- [rbp + 24]
+-----------------------------+
|   a7 = 7                    |  <-- [rbp + 16]
+-----------------------------+
|   Return Address            |  <-- [rbp + 8] (back to main)
+-----------------------------+
|   Previous RBP (main)       |  <-- [rbp] (Saved base pointer)
+-----------------------------+  <-- RBP of hello()
|   Local Variable y          |  <-- [rbp - 4]
|   ....                      |
|   a1 = 1                    |  <-- [rbp - 20] (Stored from EDI)
|   a2 = 2                    |  <-- [rbp - 24] (Stored from ESI)
|   a3 = 3                    |  <-- [rbp - 28] (Stored from EDX)
|   a4 = 4                    |  <-- [rbp - 32] (Stored from ECX)
|   a5 = 5                    |  <-- [rbp - 36] (Stored from R8D)
|   a6 = 6                    |  <-- [rbp - 40] (Stored from R9D)
+-----------------------------+  <-- Lower memory (stack grows downward)

So in x86

The first six integer or pointer arguments are passed through registers: rdi, rsi, rdx, rcx, r8, r9. The remaining are pushed onto the stack by the caller, in reverse order.

Push order:

  • rdi → a1
  • rsi → a2
  • rdx → a3
  • rcx → a4
  • r8d → a5
  • r9d → a6
  • Stack → a7 ([rbp+16])
  • Stack → a8 ([rbp+24])
  • Stack → a9 ([rbp+32])
  • Stack → a10 ([rbp+40])

Note

rbp is base pointer and rsp is stack pointer.

What happens during push rbp:

  1. Decrement rsp
  2. Store value at new rsp: The value in rbp is then copied to the memory address now pointed to by the updated rsp.

You see that we also do push 10, so it’s literally just decrementing the stack pointer by 8 and then storing data there.

Does that mean it only stores 8 bytes of data?

Yes, push operations always store 8 bytes (64 bits) at a time because the general-purpose registers (rbp, rsp, rax, etc.) are all 64 bits wide.

đź“Ť rbp (Base Pointer / Frame Pointer):

  • It serves as a fixed reference point for the current stack frame in a function.
  • Typically, it’s used to access function parameters and local variables via fixed offsets (like [rbp - 4] for a local variable).
  • Its value usually remains constant during the execution of a function, unless explicitly modified.
  • Helps debuggers and stack tracers understand the call stack more easily.

đź“Ť rsp (Stack Pointer):

  • Always points to the top of the stack (i.e., the most recent value pushed onto the stack).
  • It changes dynamically during function calls, pushes, pops, and any other stack-related operations.
  • Stack operations (like push or pop) directly affect rsp.

Implement a Stack in Machine Language (CS241)

We start to store the stack towards the end of the memory address.

Some Definitions

  • The stack pointer is a register that stores the address of the word at the top of the stack (by convention stored at Register 30) (known at runtime)
  • The Frame Pointer is a register that stores the memory location of the frame for the currently executing procedure (known at runtime)
    • IMPORTANT: Variables are accessed at constant offsets from the address in the frame pointer.
  • Frame / Stack frame / activation record = The area of the stack reserved for the variable instances of an invocation of a procedure. the area of memory for values of locals of a procedure invocation
  • The Symbol Table maps each local variable in the procedure to a constant offset from the frame pointer (fixed at compile time)

For a stack, variables are accessed at constant offsets from the address in the frame pointer.

title: Stack Pointer vs. Frame Pointer
Use the Frame Pointer to access your variables, NOT the Stack Pointer.
 
Why? To free up the stack pointer so that the stack can be used for other purposes within the procedure, such as storing temporary values, that are created within the procedure
- Think about it if you were to only use a Stack Pointer. Then, you push additional variables. Now you need to compensate for the extra offset. I guess that would still work. But everything is now shifted. Imagine that inside this procedure, another procedure is called. The problem of the frame pointer being overwritten is inside [[Procedure]]
 

Example of how this works in practice with stack frames:

  • The program starts execution in main().
    • The control stack gets a stack frame for main().
  • main() calls funcA().
    • A new stack frame for funcA() is pushed onto the control stack.
  • funcA() calls funcB().
    • A new stack frame for funcB() is added to the control stack.

At this point, the control stack has three stack frames: one for main(), one for funcA(), and one for funcB(). Each function’s context is stored in its respective stack frame.

At Higher level

We have these functions

/** Generates a `VarAccess` that reads `variable` into `register`. */
def read(register: Reg, variable: Variable) = VarAccess(register, variable, read = true)  
  
/** Generates a `VarAccess` that writes `register` into `variable`. */
def write(variable: Variable, register: Reg) = VarAccess(register, variable, read = false)  
  
/** Generates an assignment statement that first evaluates `expr` (assuming its result is stored into `Reg.result`),  
  * then writes the result into `variable`.  
  */def assign(variable: Variable, expr: Code) = block(expr, write(variable, Reg.result))

VarExcess is an abstraction. When it comes to compile, we call eliminateVarAccess, which does the following:

def compilerA3(code: Code, variables: Seq[Variable]): MachineCode = {  
  val frame = Chunk(variables)  
  val code1 = eliminateVarAccessesA3(code, frame)  
  val code2 = allocateFrameOnStack(code1, frame)  
  toMachineCode(block(code2, JR(Reg.link)))  
}
.
.
.
 
def eliminateVarAccessesA3(code: Code, frame: Chunk): Code = {  
  def fun: PartialFunction[Code, Code] = {  
    case va: VarAccess =>  
      {  
        if (va.read) {  
          frame.load(Reg.framePointer, va.register, va.variable)  
        } else { // Write  
          frame.store(Reg.framePointer, va.variable, va.register)  
        }  
      }  
  }  
  transformCode(code, fun)  
}
  
/** Given a `body` of `Code` and a `frame` of variables that it uses, generates  
  * code that allocates space for the `frame` on the stack and sets `Reg.framePointer` to  
  * point to it, followed by the `body`, followed by code to free the space for the `frame` from the stack.  
  */def allocateFrameOnStack(body: Code, frame: Chunk): Code = {  
  block(Stack.allocate(frame), ADD(Reg.framePointer, Reg(0), Reg.stackPointer), body, Stack.pop)  
}

Loading and Storing Variables (with Stack Frames and Chunk)

As iterated above, we load and store variables using the frame pointer, not the Stack Pointer.

In CS241E , we never push or pop an individual word on the stack at a time. Instead, we organize all memory in Chunks, collections of some known number of words stored at consecutive memory locations. Each Chunk represents a Stack Frame that contains the variables.

Is this how this is done in the real world?

I think partially yes, not super sure though, someone tell me..

We push enough space onto the stack to store a Chunk (which is a stack frame.). Then, to access the variables, we use the symbol table, and load the value with an offset from the base register (Reg.framePointer) and store that to a target register.

def load(baseRegister: Reg, register: Reg, variable: Variable): Code = {  
  LW(register, variableToOffset(variable), baseRegister)
}  
  
def store(baseRegister: Reg, variable: Variable, register: Reg): Code = {  
  SW(register, variableToOffset(variable), baseRegister)
}

In general, we have something like this:

LW(Reg(1), 8, Reg.framePointer) 
SW(Reg(1), 8, Reg.framePointer)

Push and Pop Stack Operations

The code below is not used in practice, just as a reference to give you a basic idea.

Pushing to the Stack

  1. Subtract 4 from the stack pointer
  2. store at the memory address in the new stack pointer
LIS(Reg.result) 
Word(encodeSigned(4)) 
SUB(Reg.stackPointer, Reg.stackPointer, Reg.result) 
SW(Reg (1) , 0, Reg.stackPointer)

Popping from the Stack

  1. Load the word from the memory address given by the stack pointer
  2. Add 4 to the stack pointer (this is the address of the next word on the stack)
LW(Reg(1), 0, Reg.stackPointer) 
LIS(Reg.result) 
Word(encodeSigned(4)) 
ADD(Reg.stackPointer, Reg.stackPointer, Reg.result)