Compiler

A compiler is a computer program that translates source code written in a programming language into another form, typically machine code that ends up in an executable file.

The 4 phases of compilation (for C++)

  1. Preprocessing: Expands the source text (macro expansion, #include insertion, conditional compilation) and strips comments.
  2. Compilation: Translates the preprocessed code into assembly language, which an assembler can understand.
  3. Assembly: The assembler translates the assembly into machine code, stored in an object file.
  4. Linking: The linker combines the object files, resolves the references between them (e.g. calls to functions defined in other modules) and finally delivers the executable.
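
To watch the phases one at a time, here is a minimal sketch. The file name area.cc, the macro SIDE and the function area are made up for illustration; the flags in the comments (-E, -S, -c) are the standard g++ options for stopping after each stage.

```cpp
// area.cc (hypothetical file name): a tiny translation unit to run through each phase.
//
//   1. g++ -E area.cc -o area.ii     preprocess only: SIDE is expanded, comments are gone
//   2. g++ -S area.cc -o area.s      stop after compilation: human-readable assembly
//   3. g++ -c area.cc -o area.o      stop after assembly: object file with machine code
//   4. g++ area.o main.o -o app      link object files into an executable
//                                    (main.o is a hypothetical object file that defines main)

#define SIDE 3            // macro: replaced textually by the preprocessor

#ifdef USE_DOUBLE         // conditional compilation: only one branch reaches the compiler
using number = double;
#else
using number = int;
#endif

// After preprocessing, the body reads `return 3 * 3;`.
number area() {
    return SIDE * SIDE;
}
```

Opening area.ii and area.s in an editor after steps 1 and 2 shows what each later stage actually receives as input.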

So compilers first produce assembly code, maybe because it's easier to debug? Couldn't it all be done in a single step?

C++ Separate Compilation

I’ve always been confused by this, but I think I finally understand it.

In C++, the separation of interface (declaration) and implementation allows the compiler to work in separate translation units.

  1. During the compilation process, each .cc file is compiled independently into an object file .o (which contains machine code). The compiler only needs to know the declarations of the functions that file uses.
  2. During the linking phase, the linker combines all the object files and resolves the references to functions and variables, i.e. it finds the actual implementations of the functions.

object file .o

Note that the object file is not directly executable. This is because it hasn’t been linked yet.

This separation allows for modular development and makes it possible to compile and link code independently, even when functions are declared in one translation unit and implemented in another.
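
A minimal sketch of that split, written as one block so it compiles as-is. The names math.h, caller.cc, math.cc, add and sum_twice are all hypothetical; the comments mark where each file would begin if the pieces were stored separately.

```cpp
// --- math.h (hypothetical header): declaration only, i.e. the interface ---
int add(int a, int b);

// --- caller.cc: would #include "math.h" and compiles with just the declaration ---
// The compiler type-checks the calls and leaves a reference to the symbol `add`
// in caller.o for the linker to fill in later.
int sum_twice(int a, int b) {
    return add(a, b) + add(a, b);
}

// --- math.cc: the implementation, compiled into its own object file ---
int add(int a, int b) {
    return a + b;
}

// Built separately, assuming the pieces live in their own files:
//   g++ -std=c++14 -c caller.cc    produces caller.o (add is an unresolved reference)
//   g++ -std=c++14 -c math.cc      produces math.o   (add is defined here)
// Linking caller.o and math.o with an object file that defines main resolves add.
```

If math.o were left out of the link step, caller.cc would still compile fine; the failure would only show up as an "undefined reference" error from the linker.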

Example seen in CS247 lecture 6

In List.h (original version), we provided only a forward declaration of struct Node, and put the definition of the struct in the .cc file.

This is especially useful for separate compilation.

g++ -std=c++14 List.cc -c
  • The -c compiler flag tells g++ to stop before linking and produce an object file (List.o)
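
The List.h arrangement described above could look roughly like this. It is a sketch, not the lecture's actual code: I'm assuming Node is nested inside List, and the members (head, count, push_front, front, size) are invented for illustration.

```cpp
#include <cstddef>

// --- List.h: clients only learn that Node exists ---
class List {
    struct Node;              // forward declaration; the layout stays private to List.cc
    Node* head = nullptr;     // a pointer to an incomplete type is allowed
    std::size_t count = 0;
public:
    List() = default;
    List(const List&) = delete;             // copying is disabled to keep the sketch safe
    List& operator=(const List&) = delete;
    ~List();
    void push_front(int value);
    int front() const;                      // precondition: the list is not empty
    std::size_t size() const;
};

// --- List.cc: would #include "List.h" and complete the type ---
struct List::Node {
    int data;
    Node* next;
};

List::~List() {
    while (head) {            // free every node, front to back
        Node* old = head;
        head = head->next;
        delete old;
    }
}

void List::push_front(int value) {
    head = new Node{value, head};
    ++count;
}

int List::front() const { return head->data; }
std::size_t List::size() const { return count; }
```

Because Node's fields appear only in List.cc, changing them means recompiling List.cc alone; files that include List.h never see the definition.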

Use object files to store the result of compilation, and reuse them if the .cc files haven't changed. With many files, this is a significant speedup while developing. We prefer to put implementations in the .cc file because of what has to be recompiled after each kind of change:

  1. List.cc changes: recompile List.cc into List.o
  2. List.h changes: all .cc files that include List.h must recompile
  3. A.h changes, where A.h includes List.h: any .cc files that include A.h must recompile

At the end, relink all .o files together.
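
The rules above boil down to "recompile every .cc file that reaches the changed file through a chain of #includes". Here is a small sketch of that check. The IncludeGraph type and both function names are my own invention, and it assumes every .cc file appears as a key and that the include graph has no cycles.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: file name -> files it #includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// True if `file` is the changed file or reaches it through a chain of includes.
// Assumes there are no include cycles.
bool depends_on(const IncludeGraph& g, const std::string& file,
                const std::string& changed) {
    if (file == changed) return true;
    auto it = g.find(file);
    if (it == g.end()) return false;
    for (const auto& included : it->second) {
        if (depends_on(g, included, changed)) return true;
    }
    return false;
}

// Every .cc file in the graph that must be recompiled after `changed` is edited.
std::set<std::string> to_recompile(const IncludeGraph& g, const std::string& changed) {
    std::set<std::string> result;
    for (const auto& entry : g) {
        const std::string& file = entry.first;
        bool is_source = file.size() > 3 &&
                         file.compare(file.size() - 3, 3, ".cc") == 0;
        if (is_source && depends_on(g, file, changed)) result.insert(file);
    }
    return result;
}
```

With a graph where List.cc includes List.h, A.h includes List.h, and main.cc includes A.h, editing List.h gives back both List.cc and main.cc, while editing List.cc gives back only List.cc.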

Forward Declaration is preferred

We prefer forward declarations of classes where possible, since this minimizes recompilation.

When can we simply forward declare, and when do we need to use #include?

See the example below

class A {...}; // A.h
 
class B: public A { // B.h
	...
};
 
class C { // C.h
	A myA;
};
 
class D { // D.h
	A* aP;
};
 
class E { // E.h
	A f(A x);
};
 
class F { // F.h
	A f(A x) {
		x.doMethod();
		...
	}
};
  • B and C require #include: the compiler needs the full definition of A to determine their size and compile them.
  • D: all pointers are the same size, so just use a forward declaration.
  • E: we don't need to know the size of A, just that A exists for type-checking purposes, so a forward declaration is enough.
  • F must #include, because the compiler needs to know that doMethod exists.
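
To check the D and E cases for myself, here is a sketch that compiles as a single file. The body of A (its constructor, get, and what doMethod returns) is invented, since the example above elides it; the "imagine #include" comment marks where the full definition would otherwise come from.

```cpp
class A;                  // forward declaration: A is an incomplete type from here on

class D {                 // fine: a pointer's size does not depend on A's layout
    A* aP = nullptr;
public:
    bool empty() const { return aP == nullptr; }
};

class E {                 // fine: a declaration may name A as parameter and return type
public:
    A f(A x);
};

// class C { A myA; };    // would not compile at this point: sizeof(A) is unknown

// --- from here on, imagine #include "A.h" (hypothetical contents) ---
class A {
    int value = 0;
public:
    explicit A(int v) : value(v) {}
    int get() const { return value; }
    A doMethod() const { return A(value + 1); }
};

// Defining E::f calls a method on A, so E.cc needs the complete type (like case F).
A E::f(A x) { return x.doMethod(); }
```

So E.h gets away with a forward declaration, but E.cc still has to include A.h, which is fine because only E.cc recompiles when A.h changes.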

Danger

This is fine with a small number of files, but once there are many, changing one .h file can mean that many .cc files need to be recompiled. It takes mental energy to figure out the dependencies and recompile just the relevant .cc files, so it might be faster to just recompile everything.

Solution: Use a build automation system. It keeps track of which files have changed and of the dependencies in compilation, and recompiles just the minimal number of files needed to make a new executable. See Makefile.
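
The core decision such a tool makes can be sketched in a few lines. This mirrors the rule make uses (rebuild a target if it is missing or older than any of its prerequisites), but the Timestamp type, kMissing and needs_rebuild are names I made up, with plain integers standing in for real file modification times.

```cpp
#include <algorithm>
#include <vector>

// Modification times as plain integers (hypothetical stand-in for real file timestamps).
using Timestamp = long long;
constexpr Timestamp kMissing = -1;   // marks a target that has not been built yet

// A target (e.g. List.o) is stale if it does not exist or if any of its
// dependencies (e.g. List.cc, List.h) was modified after it was built.
bool needs_rebuild(Timestamp target, const std::vector<Timestamp>& dependencies) {
    if (target == kMissing) return true;
    return std::any_of(dependencies.begin(), dependencies.end(),
                       [target](Timestamp dep) { return dep > target; });
}
```

Applying this check to every .o file and then to the executable is what lets the tool rebuild only what is out of date.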

Random Thoughts

Tom from Enlighted: People who have worked on compilers are usually very good at C++.

I took CS241E, and holy crap it was so hard. I don't know why I took the enriched version. Because I wanted to go out of my way and improve myself? Or maybe there was something better I could have done with this time.