Computer Program

This is what my Software Engineering program is about: understanding how a computer program works at every level, from application software down to machine code. Super interesting!

Layers of the Program

  • Application software
    • Written in high-level language
  • System software
    • Compiler: translates HLL code to machine code
    • Operating System: service code
      • Handling input/output
      • Managing memory and storage
      • Scheduling tasks & sharing resources
  • Hardware
    • Processor, memory, I/O controllers

Levels of Program Code

High-level language

  • Level of abstraction closer to problem domain
  • Provides for productivity and portability

A Compiler converts this high-level language into Assembly language.

Assembly language

  • Textual representation of instructions

An Assembler converts Assembly into Machine Language.

Hardware representation (Machine Language)

  • Binary digits (bits)
  • Encoded instructions and data
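To make "binary digits" concrete, here is a minimal sketch that prints one byte as its eight bits. The helper name `to_bits` is made up for illustration. The same bit pattern can be an instruction or plain data depending on how the hardware is asked to interpret it: 0xC3 is the x86 `ret` opcode, but it is also just the number 195.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Render one byte as a string of eight binary digits, most significant bit first.
// (Hypothetical helper, not part of any library.)
std::string to_bits(std::uint8_t byte) {
    return std::bitset<8>(byte).to_string();
}
```

For example, `to_bits(0xC3)` gives "11000011".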

For C++

Roughly 4 high-level steps, with linking as the final one:

  1. High-Level Language (C++)
#include "foo.h"
int main() {
    foo();
    return 0;
}
  2. Compilation (High-Level Language → Assembly): The compiler goes through several steps internally:
  • Lexical analysis, syntax analysis, semantic analysis, optimization, and code generation.
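As a rough illustration of the first of those phases, lexical analysis, here is a toy tokenizer. It is only a sketch: the function name `tokenize` is made up, and a real C++ lexer handles far more (string literals, comments, multi-character operators, the preprocessor).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy lexer: splits source text into identifiers/keywords, integer literals,
// and single-character punctuation. Whitespace is skipped.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // skip whitespace between tokens
        } else if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: letters, digits, underscores
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else if (std::isdigit(c)) {
            // Integer literal: a run of digits
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            // Anything else is treated as one-character punctuation
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenize("foo();")` yields the four tokens `foo`, `(`, `)` and `;`, which later phases (syntax and semantic analysis) would then check and give meaning to.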

Assembly code is the output of this step, an intermediate form on the way to machine code.

  • Modern compilers may skip producing a visible assembly file, going straight to object code
  • If you want to see it, use g++ -S
call foo          ; Call the 'foo' function
mov eax, 0        ; Set return value to 0 (done after the call, since foo may overwrite eax)
ret               ; Return from main
  3. Assembler (Intermediate Representation → Machine Code):
  • Converts the assembly into machine code, stored in object files (.o or .obj files).
  • Specific to the architecture (e.g., x86, ARM); each assembly instruction corresponds directly to a CPU instruction
  • Object files contain machine code, but they are incomplete programs, as they may contain unresolved references to symbols from other object files.

At this level, object files may also contain a symbol table.

How does an object file know what goes in its symbol table?

The symbol table only describes the symbols known to that file. Symbols that are used but not defined there are marked as “undefined”. Later on, the linker will combine the symbol tables from all the object files to get a correct unified symbol table.
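The merge described above can be sketched in a few lines. This is a heavily simplified model, not how any real linker stores its tables: the `ObjectFile` struct and both function names are hypothetical, and symbols are just strings with no addresses, sections, or visibility.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one object file's symbol table.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;    // symbols this file provides
    std::set<std::string> undefined;  // symbols this file uses but does not define
};

// Build the unified table: map each defined symbol to the file that provides it.
std::map<std::string, std::string> merge_defined(const std::vector<ObjectFile>& objs) {
    std::map<std::string, std::string> table;
    for (const auto& obj : objs) {
        for (const auto& sym : obj.defined) {
            // A real linker would typically report a duplicate definition here;
            // this sketch simply keeps the last one seen.
            table[sym] = obj.name;
        }
    }
    return table;
}

// Return the symbols that are still undefined after merging all tables.
std::set<std::string> unresolved(const std::vector<ObjectFile>& objs) {
    const auto table = merge_defined(objs);
    std::set<std::string> missing;
    for (const auto& obj : objs) {
        for (const auto& sym : obj.undefined) {
            if (table.find(sym) == table.end()) {
                missing.insert(sym);
            }
        }
    }
    return missing;
}
```

With a `main.o` that defines `main` and needs `foo`, plus a `foo.o` that defines `foo`, `unresolved` returns an empty set. Drop `foo.o` and `foo` is left over, which is the situation behind an "undefined reference" linker error.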

Machine code example (for an x86 architecture):

e8 00 00 00 00     ; call foo (the 4-byte displacement is a placeholder until linking)
b8 00 00 00 00     ; mov eax, 0
c3                 ; ret
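To show that these bytes really are just an encoding of the instructions, here is a toy decoder that understands only the three opcodes in the example (b8, e8, c3). It is a sketch for these notes, not a real disassembler: the function name `decode` is made up, and the call displacement is printed as a raw unsigned value rather than a signed offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a byte sequence using only three x86 encodings:
//   b8 imm32 -> mov eax, imm32
//   e8 rel32 -> call with a 32-bit relative displacement
//   c3       -> ret
std::vector<std::string> decode(const std::vector<std::uint8_t>& code) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < code.size()) {
        const std::uint8_t op = code[i];
        if ((op == 0xB8 || op == 0xE8) && i + 4 < code.size()) {
            // The operand is 4 bytes, stored little-endian (lowest byte first).
            std::uint32_t value = 0;
            for (int b = 0; b < 4; ++b) {
                value |= static_cast<std::uint32_t>(code[i + 1 + b]) << (8 * b);
            }
            const std::string prefix = (op == 0xB8) ? "mov eax, " : "call rel32=";
            out.push_back(prefix + std::to_string(value));
            i += 5;  // 1 opcode byte + 4 operand bytes
        } else if (op == 0xC3) {
            out.push_back("ret");
            i += 1;
        } else {
            out.push_back("unknown");  // anything else is outside this sketch
            i += 1;
        }
    }
    return out;
}
```

Note how the instructions have different lengths (5 bytes vs. 1 byte), so the decoder has to read the opcode first to know how many bytes follow.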
  4. Linking (Combining Object Files into a Binary): The linker takes multiple object files and resolves references between them.

For example, it resolves the foo() function call in main.o by finding foo() in foo.o. The linker also:

  • Fixes addresses of functions and variables (relocation).
  • Combines multiple object files into a single binary executable or library (e.g., .exe, .out, .dll, .so).
  • Optionally links against libraries (either static or dynamic) to include external code (e.g., the C++ Standard Library).
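The relocation bullet can be sketched for the `call foo` case. On x86, `call rel32` (opcode e8) stores the distance from the end of the 5-byte call instruction to the target, so once the linker knows where `foo` ends up it can fill in those 4 bytes. The function name `relocate_call` and all addresses below are assumptions for illustration; real linkers drive this from relocation entries in the object file.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Patch the 4-byte displacement of an x86 `call rel32` (opcode e8).
//   call_offset: index of the e8 byte inside `code`
//   base:        address the code is assumed to be placed at (hypothetical)
//   target:      resolved address of the callee (hypothetical)
void relocate_call(std::vector<std::uint8_t>& code, std::size_t call_offset,
                   std::uint32_t base, std::uint32_t target) {
    // rel32 is measured from the instruction *after* the 5-byte call.
    const std::uint32_t next = base + static_cast<std::uint32_t>(call_offset) + 5;
    // Unsigned subtraction wraps modulo 2^32, which gives the two's complement
    // encoding for backward (negative) displacements.
    const std::uint32_t rel = target - next;
    for (int b = 0; b < 4; ++b) {
        // Store little-endian: lowest byte first.
        code[call_offset + 1 + b] = static_cast<std::uint8_t>(rel >> (8 * b));
    }
}
```

For a call placed at address 0x1000 with `foo` at 0x1020, the displacement is 0x1020 - 0x1005 = 0x1B, so the placeholder `e8 00 00 00 00` becomes `e8 1b 00 00 00`.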

After linking, the output is an executable file containing the final machine code, which the operating system can load and run.