Compiler
A compiler is a computer program that translates source code written in a programming language into another form, typically machine code that ends up in an executable file.
The 4 phases of compilation (for C++)
- Preprocessing: Expands the code (macro expansion, #include insertion, comment removal and conditional compilation).
- Compilation: Translates the preprocessed code into assembly language, which an assembler can understand.
- Assembly: The assembler translates the assembly code into machine code, stored in an object file.
- Linking: The linker combines the object files, resolves references between them (e.g. function calls) and finally delivers the executable.
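To make the four phases concrete, here is a small sketch. The file name phases.cc, the macro SQUARE, the TRACE switch and the function squarePlusOne are all made up for illustration. The comments at the bottom show the GCC/Clang flags that stop after each phase.

```cpp
// phases.cc (hypothetical file name)
#define SQUARE(x) ((x) * (x))  // macro: replaced textually by the preprocessor

#ifdef TRACE                   // conditional compilation: kept only when built with -DTRACE
#include <cstdio>
#endif

int squarePlusOne(int n) {
#ifdef TRACE
    std::printf("squarePlusOne(%d)\n", n);
#endif
    return SQUARE(n) + 1;      // after preprocessing: ((n) * (n)) + 1
}

// Stopping after each phase with g++ (clang++ accepts the same flags):
//   g++ -E phases.cc -o phases.ii   # preprocessing only
//   g++ -S phases.cc -o phases.s    # stop after compilation: assembly output
//   g++ -c phases.cc -o phases.o    # stop after assembly: object file
//   g++ phases.o main.o -o app      # linking (main.o is a hypothetical object file defining main)
```

Opening phases.ii and phases.s in an editor is a quick way to see what each phase actually produced.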
So compilers first produce assembly code, supposedly because it’s easier to debug. But can’t you do it in a single step?
- https://www.quora.com/Why-do-you-need-both-a-compiler-and-ans-assembler-to-turn-your-source-code-into-binary-language-Why-cant-you-just-do-it-in-a-single-step
- https://stackoverflow.com/questions/845355/do-programming-language-compilers-first-translate-to-assembly-or-directly-to-mac
Resources
- https://godbolt.org/ (Compiler Explorer, SUPER COOL!)
- CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler’s Lid”
- Compiler Architecture summary notes
- CS241E (though I had a lot of trouble understanding this course)
C++ Separate Compilation
I’ve always been confused by this, but I think I finally understand it.
In C++, the separation of interface (declaration) and implementation allows the compiler to work in separate translation units.
- During the compilation process, each .cc file is compiled independently into an object file (.o), which contains machine code. The compiler only needs to know the function declarations.
- During the linking phase, the linker combines all the object files and resolves the references to functions and variables: it finds the actual implementation of each function.
Note that the object file is not directly executable. This is because it hasn’t been linked yet.
This separation allows for modular development and makes it possible to compile and link code independently, even when functions are declared in one translation unit and implemented in another.
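A minimal sketch of that idea, squeezed into one listing so it can be compiled as is. The file names add.h, user.cc and add.cc and the functions add and sumOfThree are hypothetical. In a real project each marked section would live in its own file.

```cpp
// ---- add.h (hypothetical): the interface ----
int add(int a, int b);            // declaration only, no body

// ---- user.cc (hypothetical): would #include "add.h" ----
int sumOfThree(int a, int b, int c) {
    return add(add(a, b), c);     // compiles: the declaration is all the compiler needs
}

// ---- add.cc (hypothetical): the implementation ----
int add(int a, int b) { return a + b; }

// Built separately, this would look like:
//   g++ -c user.cc                   # user.o, with an unresolved reference to add
//   g++ -c add.cc                    # add.o, containing the definition of add
//   g++ user.o add.o main.o -o app   # the linker matches the reference to the definition
//                                    # (main.o is a hypothetical object file defining main)
```

If add.o is left out of the link step, the error comes from the linker (an undefined reference), not from the compiler.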
Example seen in CS247 lecture 6
In List.h (original version), we provided a forward declaration of struct Node. The definition of the class goes in the .cc file.
This is especially useful for separate compilation.
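A sketch of what that split could look like. This is not the lecture's actual code: the members head, push and front are assumptions, and both "files" are shown in one listing.

```cpp
// ---- List.h (sketch): users of List see only this ----
class List {
    struct Node;            // forward declaration; the layout of Node stays hidden
    Node* head = nullptr;   // a pointer to an incomplete type is allowed
  public:
    ~List();
    void push(int value);   // hypothetical member names
    int front() const;
};

// ---- List.cc (sketch): would start with #include "List.h" ----
struct List::Node {
    int data;
    Node* next;
};

List::~List() {
    while (head) {          // free every node in the chain
        Node* rest = head->next;
        delete head;
        head = rest;
    }
}

void List::push(int value) { head = new Node{value, head}; }

int List::front() const { return head->data; }  // assumes a non-empty list
```

Because Node is only defined in List.cc, changing its fields does not touch List.h, so files that include List.h do not need to be recompiled.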
The -c compiler flag lets us produce an object file (.o). Use object files to store the result of compilation, and reuse them if the .cc files haven’t changed. With many files, this gives a significant speedup while developing. We prefer to put implementations in the .cc file for the following reasons:
- List.cc changes: recompile List.cc into List.o.
- List.h changes: all .cc files that include List.h must recompile.
- A.h changes, which includes List.h: any .cc files that include A.h must recompile.

At the end, relink all .o files together.
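The rules above amount to following #include edges transitively. Here is a rough sketch of that computation. The IncludeGraph type and both function names are invented for this note, and a real build tool does considerably more.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each file to the headers it #includes directly (hypothetical representation).
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// True if `file` includes `header` directly or through other headers.
bool dependsOn(const IncludeGraph& g, const std::string& file,
               const std::string& header, std::set<std::string>& seen) {
    if (!seen.insert(file).second) return false;  // already visited
    auto it = g.find(file);
    if (it == g.end()) return false;              // file includes nothing we know of
    for (const auto& inc : it->second) {
        if (inc == header || dependsOn(g, inc, header, seen)) return true;
    }
    return false;
}

// Returns the .cc files that must be recompiled after `changed` is edited.
std::set<std::string> filesToRecompile(const IncludeGraph& g, const std::string& changed) {
    std::set<std::string> result;
    for (const auto& entry : g) {
        const std::string& file = entry.first;
        bool isSource = file.size() > 3 && file.compare(file.size() - 3, 3, ".cc") == 0;
        if (!isSource) continue;                  // headers are never compiled on their own
        std::set<std::string> seen;
        if (file == changed || dependsOn(g, file, changed, seen)) result.insert(file);
    }
    return result;
}
```

With a graph where List.cc includes List.h, A.h includes List.h, and main.cc includes A.h, editing List.h marks both List.cc and main.cc, while editing List.cc marks only List.cc.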
Forward Declaration is preferred
We prefer forward declarations of classes where possible - minimizes recompilation.
When can we simply forward declare, and when do we need to use #include?
Consider classes B to F, each of which uses a class A in a different way:
- B and C require #include in order to determine their size and compile them.
- D: all pointers are the same size, so just use a forward declaration.
- E: we don’t need to know the size of A, just that A exists for type-checking purposes.
- F: must #include to know that doMethod exists.
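The original example is not reproduced in these notes, so the following is a reconstruction under assumptions: the member names, the bodies and the value returned by doMethod are guesses. Only the class names A to F and doMethod come from the notes. D and E are placed before the full definition of A to show that the compiler accepts them with just the forward declaration.

```cpp
class A;  // forward declaration: from here on, A is an incomplete type

// D: only stores a pointer, and all pointers to A have the same size
class D {
  public:
    A* aPtr = nullptr;
};

// E: only declares a function using A; the compiler just checks that the type exists
class E {
  public:
    A f(A a);
};

// Everything below needs the full definition, as if by #include "A.h"
class A {
  public:
    int doMethod() const { return 42; }  // hypothetical body
};

// B: a derived object contains its base, so sizeof(A) must be known
class B : public A {};

// C: an A object is stored inside every C, so sizeof(A) must be known
class C {
  public:
    A myA;
};

// F: calls a method in the header itself, so the compiler must know doMethod exists
class F {
  public:
    int f(A a) { return a.doMethod(); }
};
```

Moving the definition of A below B, C or F makes the listing fail to compile with an incomplete-type error, which is the situation where #include is unavoidable.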
Danger
This is fine with a small number of files, but once there are too many, changing a .h file may mean that many .cc files need to be recompiled. It takes mental energy to figure out the dependencies and recompile only the relevant .cc files. It might be faster to just recompile everything.
Solution: Use a build automation system. It keeps track of which files have changed and of the dependencies between them, and recompiles only the minimal number of files needed to make a new executable. See Makefile.
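The core check a Make-style tool performs is simple: a target is out of date if any of its dependencies was modified more recently. A minimal sketch, with timestamps simplified to plain integers and a made-up function name:

```cpp
#include <vector>

// Timestamps are plain integers here (say, seconds since some epoch); that is a
// simplification for the sketch. A real tool reads modification times from the file system.
bool needsRebuild(long targetTime, const std::vector<long>& dependencyTimes) {
    for (long t : dependencyTimes) {
        if (t > targetTime) return true;  // a dependency is newer than the target
    }
    return false;                         // target is at least as new as everything it depends on
}
```

For List.o, the dependencies would be List.cc and every header it includes, directly or indirectly.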
Random Thoughts
Tom from Enlighted: People who have worked with compilers are usually very good at C++.
I took CS241E, and holy crap it was so hard. I don’t know why I took the enriched version. Was it because I wanted to go out of my way and improve myself? Maybe there was something better I could have done with that time.