CS247: Software Engineering Principles
https://student.cs.uwaterloo.ca/~cs247/
This is one of the most fundamental course that taught me how to write C++, actually, unlike CS138.
Link to course notes here (typed out notes below).
Resources
- CS246E LaTeX notes by Sibelius
Update after the final
There were actually some questions that caught me off-guard, that I didn’t think about:
- Can you have a vector of references? Apparently not, why? See this thread
- Can you
dynamic_cast
aconst
?- Does
reinterpret_cast
ever fail between pointers?
Concepts
Lecture Notes
Lecture 1
We want to prevent Tampering and Forgery of our ADTs. Forgery: Creating invalid instances of an ADT Tampering: Modifying a valid instance of an ADT to became invalid in an unexpected way.
Example of Rational ADT
- The
explicit
keyword is used for single parameter constructors and prevents implicit conversions, see explicit (C++)
Suppose we have the function
If you don’t have the explicit
keyword, you could do f(247)
. However, if you add the explicit keyword, this won’t work. You need to write
Another example:
Another example:
- This works because
std::string
has a single param constructor which takes in achar*
and is NON-EXPLICIT.
Does the Rational ADT need to be a class? Not necessarily, ADTs are abstract, they are not tied to a particular implementation.
Could use a struct.
Classes are nicer:
- construction / destruction is guaranteed (on the stack)
- can enforce legal value ranges via access specifiers (public / private protected)
With a C-Struct, someone could easily just set denom=0
. However, this is not possible with our class: denom
is private, we could add logic to our constructor to reject denom=0
.
- This is an example of a invariant: something about a class or ADT that you always expect to be true
We gave Rational a default constructor - a constructor that can be called with no arguments. Should all ADTs have a default constructor?
- Not necessarily - only if it makes sense. Consider a student ADT with name, id, birthday fields - no sensible defaults.
If you do not write any constructors at all, you get a compiler provided default constructor:
- Primitive fields are uninitialized (garbage values)
- Default constructs object fields
Now, we don’t need 3 different constructors for a different number of arguments. We can just use default parameters!
We don’t need to write different Constructor for a different number of arguments. We can just use default parameters!
Important: Default parameters MUST be trailing.
Additionally, default parameters must only appear in the Declaration, not the Definition.
Using the MIL is considered better style than assigning the fields in the constructor body.
- Reference Fields
- Object Fields without Default Constructors
By Step 4: All object fields must be initialized. In not given a value via the MIL, default constructed.
Lecture 2
There are cases where using the MIL is necessary.
Const
fields
- Reference fields
- Object fields without default constructors
- If the superclass does not have a default constructor
What if we gave A a default constructor?
Correct way:
- This works even if A does not have a default constructor!
Now, let’s support rational operations!
To support this functionality, we need Overloading.
- Overloading is implementing two functions / methods with the same name, but differing the number or types of arguments provided
For example, the negate
function overloaded
To support cin >> r >> q
, where r,q
are Rationals. Define the follow operator overload:
- The istream is Passed by Reference because copying is disallowed for istreams.
- The rational is also Passed by Reference because we want changes to be reflected outside this function.
- We
return in
to allow for chaining- In this example,
cin >> r
is evaluated first cin >> r >> q
simplifies tocin >> q
- In this example,
Problem: r.num
and r.denom
are private fields, cannot be accessed in the operator >>
Solution 1: provide accessor methods
- provide methods
getNum
andgetDenom
which return references to the num anddenom
fields - sometimes paired with mutator methods
setNum
,setDenom
. - i.e. getters and setters
Solution 2: Declare operator >>
as a friend function
- the
friend
gives permission for the function to access any private fields + methods of Rational
Now, support p = q+r
, each variable is Rational
So we need to overload the +
operator
Notice that we take in args via constant reference:
- constant - don’t want lhs or rhs to change
- reference - quick no copying
Declaring all these overloads as friends is a pain.
Alternative: Define operator overloads as methods in Rational:
- Essentially,
r + q
is semantically equivalent tor.operator+(q)
In python, we have r.__add__(q)
, see Python
Note
Operator
<<
and>>
are usually defined as standalone functions (outside of the class). Becausecin
andcout
appears on the lhs.
What if we want r + 5
? Answer: Function Overloading
What about 5 + r
? doesn’t work, because order of arguments matters.
unless we had the following:
- Note that this is a standalone function, defined outside of the class.
What about p = q + r
?
Compiler provides a Copy Assignment Operator for you. We can also write our own.
We can chain the = operator:
Lecture 3
There are some operator overloads that MUST be defined as methods. These are:
operator=
operator[]
operator->
operator()
operator T
whereT
is a type
Supporting division of Rationals, e.g. q/r.
Exercise: Supporting cout << q << endl
where q
is a Rational
.
Do the following:
Usually, we don’t provide endl
in the operator <<
definition, because we don’t want to force our users to use new lines.
Consider:
Our operator <<
and operator >>
definitions also work for reading from and printing to files (and other types of streams).
Finally, Rational z{q}; or Rational z = q; or Rational z(q);
- Running the Copy Constructor
- Compiler provides a Copy Constructor that simply copies all the fields. We can also write our own.
Having written all these operator overloads, Rational ADT is easy to use from a client perspective.
Takeaways
- Overloading operators gives us a common syntax
- Classes allow us to enforce invariants, good ADT design
- Friends can be used (sparingly) to give access to private fields
Value Categories Every expression in C++ has both a type and a Value Category.
We will discuss 2 categories: lvalues and rvalues
lvalue
An lvalue is any expression that you can take the address of.
For example:
- In the above, this expression
x
- we can take its address, therefore it is an lvalue.
rvalue
An rvalue is a temporary value, it will be destroyed “soon”.
5
is an rvalue, as we cannot take the address of5
.
Another example
- this expression
f()
is an rvalue. The returned result off
only exists until the end of the line
We cannot run &f()
- the string isn’t stored anywhere permanently, just a temporary until it is saved into s
The references we have seen so far are lvalue references. These can only bind to expressions that are lvalues.
For example:
However,
An exception: We can bind rvalues to const lvalue references.
We can also create rvalue references. Extend the lifetime of the rvalue to the lifetime of the reference.
- We can use the temporary value returned by
f
for as long ass
exists.
Most commonly used for overloading functions based on the value category of the expression.
Why is this useful? We’ll see shortly.
Finally, note that type and value categories are independent properties.
- Here, although
s
references an rvalue, we can take s’s addresss
is an lvalue.
Linked List ADT
A slightly more complicated ADT example.
Specifically, I want to leverage lvalue / rvalue knowledge proficiency. We’ll focus on invariants + encapsulation later. For now, we’ll focus on correctness and efficiency.
I want the following client node:
First, constructor:
Next: Node p{n}
this executes the copy constructor - provided by compiler. Copies each field.
Now, p.next->data = "z"
modifies n
as well. We have shared data. We’ll define our own copy constructor that recursively copies the data structure.
- This is a deep copy, as opposed to a shallow copy.
Lecture 4
This time: Linked List ADT continued.
Move Constructor + Move Assignment Operator. Elision. Encapsulation.
Issue: Although n
is freed, the rest of the Nodes in the list are on the heap, must also be freed.
We will write a destructor, which runs when stack allocated objects go out of scope, and when heap allocated objects are freed.
Complaint: Efficiency.
In memory, looks like the following:
- Here, we’ve copied 26 nodes, then delete the originals immediately
We can observe that getAlphabet()
is a rvalue. It’s going to be deleted by the next line. No one else will use that memory. We can steal it.
We’ll write a Move Constructor for rvalues, which will be faster.
One more slight optimization:
other.data
has an address, its an lvalue. This invokes the copy constructor for the data string. Even thoughother.data
will die soon. We’d prefer to run the string move constructor.
std::move
is included in<utility>
, forces an lvalue to be treated like an rvalue. Move Constructor runs.
We still have data sharing:
Compiler provided assignment operator does shallow copies of pointers. Data sharing again, leaked node.
We’ll implement our own copy assignment operator.
Attempt 1:
Example of when the above doesn’t work:
To protect against self-assignment, add the following to the beginning of the assignment operator.
We’ll rearrange th copy assignment operator slightly. new
has the possibility to reject our request for more memory. We’ll then quit the function.
Call new
first so if it fails, we haven’t deleted anything yet.
Working copy assignment operator:
There is some code duplication between copy constructor and copy assignment operator. Can address this using the Copy-and-Swap Idiom:
Exercise: check what happens with self-assignment.
Finally: Move Assignment Operator for more efficiency.
Exercise: write this without swapping.
If you don’t define a move constructor / assignment operator but you do define a copy constructor / assignment op, compiler will use copies in all scenarios.
Rule of Five: If you write one of the following, you should usually write all: Destructor, Copy Constructor, Copy Assignment Operator, Move Constructor, Move Assignment Operator.
Elision
In g++, std=c++14
, neither move nor copy gets called. This is called Elision.
In certain cases, compiler can skip calling copy / more constructers, instead writing the object’s value s directly into its final location.
Another example
Here, {1,5}
is directly written into r
.
Note: Elision is possible, even if it changes program output.
- Not expected to know all possible cases, just that it’s possible.
Disable with the flag -fno-elide-constructors
(slows down program though)
Lecture 5
Forgery:
0x2117
is a completely random memory address- Most likely, a segmentation fault on any kind of use.
Tampering:
All these issues are from a common problem: Representation Exposure.
Every ADT has a representation in memory somehow. For example, linked lists are implemented via a series of nodes
If the client gains access to the memory, we call that Representation Exposure. Client can use this knowledge to do things we don’t expect.
We have expectations / rules for all linked lists:
- All nodes are heap allocated (or next is nullptr)
- No sharing between separated lists
- No cycles in our list
Clients should not need to uphold these invariants, its our responsibility.
Solution: Encapsulation. We provide public methods for the client to interact with our List.
This solves our representation exposure problem. Client has no knowledge of Nodes, no access to internal memory representation, no tampering / forgery allowed.
Lists are slow! th is a not a constant time operation. Takes steps to find the -th node in the list.
Total time is , so
We can’t just give clients access to Nodes again, because we would get Representation Exposure.
Design Patterns: Effective solutions to common problems
Problem: Efficient looping over a data structure while maintaining encapsulation.
Pattern: Iterator Design Pattern: provide an abstraction for pointers.
Example:
Our client’s use of an iterator will look like:
These are the functions we need to implement:
- for
List
:begin
andend
functions that return iterators - for
List::Iterator
:!=
to compare++
to go to the next node*
to get the string at current Node.
Implementation
post-fix vs prefix ++
The teacher uses specifically prefix ++ so that the function overloaded is
Iterator& operator++()
. If we wanted it for post-fix ++, we would doIterator& operator++(int i)
(yes disgusting)
If you implement an iterator like this, we can use the range-based for loop syntax:
- When there is no reference (no
&
sign), it creates a copy fromcur->data
intos
; - Under the hood, it calls
l.begin()
andl.end()
to iterate through
can also use auto
for type deduction:
Note: auto
drops references, so if you want to modify l, use auto&
.
Iterator is a public nested class, with a public constructor.
Worry of Forgery: auto it = List::Iterator {nullptr};
This could be a problem for other kinds of Iterators where nullptr is not used.
Solution: Give iterator a private constructor.
Warning
In general: be cautious when using friends, weakens Encapsulation. Only make friend (C++)s if they can do something for you!
Subtle issue:
Doesn’t compile! How should we know that begin/end won’t modify the fields of this constant l
?
Solution: Declare begin/end as const methods, promise we won’t modify fields.
- a constant object (or const ref to an object) may only have
const
methods called on it. - Now, it compiles! But it is actually BAD
We can still modify a const List&
by using its iterator.
In next lecture, we’ll talk about physical constness vs. logical constness.
Lecture 6
We saw that we could modify a const List&
by using its iterator.
This is due to the difference between physical and logical constness.
- Physical constness is about a particular object’s implementation. Compiler guarantees this for const objects/const refs. Ensures the fields won’t change.
- Logical constness asks not only for the fields to be immutable, but also that the memory associated with the representation of the ADT doesn’t change.
How to achieve logical constness?
begin
andend
cannot be cast methods, they return Iterators that return modifiable string refs viaoperator*
We’ll write additional methods cbegin
/cend
which return const Iterators, which will only provide a cast string&
when calling operator*
.
Now, we cannot call begin
/ end
on a const List&
. Can only call cbegin
/ cend
, these don’t allow us to modify the list.
Complaint: duplication between two iterators. Fix this using a C++ Template.
Template classes are allowed to be parametrized via the type provided.
Classic example: C++ Standard Template Library.
vector<int>
vs.vector<string>
is implemented using Template (C++).
Specify the return type of operator*
as a template parameter, rest of the code is the same.
Note: Template definition of methods must be written in the header file, because the template needs to know the next field exists.
how do template work? (briefly)
For each use of
MyIterator<T>
, compiler generates another copy of the class withT
substituted where necessary. Afterwards, compiles as usual.No runtime cost using templates, it is as if we wrote our own
Iterator
/ConstIterator
.
Separate Compilations
In List.h
(original version), we provided a forward declaration of struct Node
. Provide definition of the class in the .cc
file.
This is especially useful for separate compilation.
-c
compiler flag lets us produce an object file (usually ends with.o
)
Think of object files as containing the machine instructions for that code (ahhh we saw this in CS241E with Linking).
Use object files to store the result of compilation, and reuse it if the .cc files haven’t changed. With many files, significant speedup while developing. We prefer to put implementations in .cc
file for the following reasons:
List.cc
changes recompileList.cc
intoList.o
, Link object files together to get an executable.List.h
changes all.cc
files that includeList.h
must recompileA.h
includesList.h
any.cc
files includingA.h
must change
Relink all .o
files together.
We prefer forward declarations of classes where possible - minimizes recompilation.
When can we simply forward declare, and when do we need to use
#include
?See the example below.
B
andC
require#include
in order to determine their size and compile them.D
all pointers are the same size, just use a forward declarationE
we don’t need to know the size ofA
, just thatA
exists for type-checking purposesF
must#include
to know thatdoMethod
exists
Lecture 7
This files we gave in A1 looked like:
- The above are Preprocessor directives - commands that allow transformations of the source code before it is compiled.
#include "file"
- replaces this line with the contents of file.
Consider an example: Linear Algebra modules.
main.cc
includes vec.h
main.cc
includes matrix.h
, includes vec.h
- two definition of
struct vec
inmain.cc
, not allowed
To fix this issue of multiple definitions, use “include guards”:
#ifndef VARIABLE
: The code between#ifndef
and#endif
will only be seen by the compiler ifVARIABLE
is not defined.#define VARIABLE
defines a preprocessor variable.- Include Guard
Works because once File.h
is included once, the variable becomes defined, so in all future includes
the block is omitted.
This doesn't fix all issues
There is an issue with circular dependencies. Consider the following example:
The issue is that each class needs to know the other exists before it can be created.
Solution: Break circular dependencies chain using forward declarations.
Separate compilation
Speeds up compilation and development time: change one .cc
file → update the .o
file, relink with other old .o
files.
Issue: Change a .h
file, then many .cc
files might need to be recompiled. Mental energy to figure out the dependencies and just recompile the relevant .cc
files. Might be faster to just recompile everything.
Solution: Use a build automation system. Keep track of what files have changed, keep track of dependencies in compilation, just recompiles the minimal # of files to make a new executable.
We’ll discuss make: Create a Makefile
Example: main.cc
(includes List.h
), List.h
, List.cc
(includes List.h
)
Makefile (v1)
- General format for first line is
target: dependencies
g++ main.o List.o -o myprogram
is called the recipe
This text is in a Makefile in our directory.
make → creates the first target
Looks at the last modified time of dependencies.
If last modified time is newer for a dependency than a target => target is out of date, recreate it.
Still too much work! Still requires lots of updates to our makefile.
Makefile (V2)
CXX = g++
CXXFLAGS = -std=c++14 -Wall -g -MMD
EXEC = myprogram
CCFILES = $(wildcard *.cc)
OBJECTS = ${CCFILES:.cc=.o}
DEPENDS = ${CCFILES:.cc=.d}
${EXEC}: ${OBJECTS}
${CXX} ${CXXFLAGS} ${OBJECTS} -o ${EXEC}
-include ${DEPENDS}
For CXXFLAGS
-Wall
for more warnings-g
for extra debugging support-MMD
to include${DEPENDS}
include ${DEPENDS}
compiles all object files with dependencies usingCXX
andCXXFLAGS
Software Development lifecycle:
- clients specs program build test/debug it
Program uses source control and is released.
Lecture 8
Agile Method Client specs planning programming building testing/debugging source control releasing client (goes in a loop)
- Client is looped into the process.
- Typically, work is done in 1-2 week “sprints”
- Basically, no one uses Waterfall Method
Next: Testing / Debugging Debugging in C++: Valgrind and gdb
Valgrind
Valgrind’s primary use is as a memory checking tool. It provides a virtual CPU that it executes your program on - checks for memory access, etc.
Typically programs executed through valgrind are 3-4x slower than usual (don’t use it for performance monitoring).
Important
It’s important when using valgrind or gdb to compile with the
-g
flag.
Other important information:
- Using
--leak-check=full
allows you to see where leaked memory was allocated - Valgrind can only inform on memory leaks for one particular invocation of the program
- Highlights the importance of designing various test cases that hit each part of the program.
Can also detect invalid reads: report the line of your source code where the invalid read happened.
Invalid write - we get 3 pieces of info:
- where the invalid write occurred
- where the memory was freed
- where the memory was initially allocated
Can also detect invalid uses of delete
/ delete[]
.
GDB
gdb = GNU Debugger: Useful for inspecting variables, seeing control flow of program.
gdb ./myProgram
run - runs the program
runa arg1 arg2 arg3
run < test8.in
break file: line-number
break fn-name
- set a breakpoint, allows us to stop program at this point
next
- runs one line of the program
layout src
- nicer display for source code
print variable
- allows you to see value of that variable at that time
refresh
- fix the layout src display
step
- runs one instruction of the program
list
- show surrounding lines of the pgoram
backtrace
- lists the sequence of function calls that got you to your current location
up/down
- change the function in the call stack we are observing so as to inspect other variables
set var
- allows us to set a variable at runtime
continue
- runs the program until the next breakpoint
display var
- prints the value of var after each next/step
undipslay 1
- stops displaying first var set with display
watch var
- breaks the program whenever var is changed
delete breakpoint number
- rename break point from use (watch and break)
Lecture 9
- Source control: manage source code for your program (version1, version2)
- Wants: collaborate with others, have a version history
- historically people used SUN, subversion, mercurial
- Most people nowadays just use Git
- Create a git repository:
git init
from within a directory - git has a staging area, prepare what files which parts will be committed
- commit unit of work - understanding of how ideas/data structures developed over time, commit early and often to see
- a commit = a unit of work - you should commit early and commit often
Basic Commands
git add
- allows you to add changed files to staging areagit status
- lets us see which branch we’re on, which files / changes will be committed, etc.git add -p
lets you stage interactivelygit diff
to see the difference between all (unstaged AND staged) files and the most recent commitgit diff --staged
to see the difference between staged files and the most recent commit
One thing we may want to do is develop features or work on refactors separately form other development.
git branch branchName
git checkout branchName
to swatch between branches
Example: Let’s say we’re on master branch.
git merge branchName
creates a new commit with parents that are the latest commits on the current branch and branchName.git merge feature
git log
- shows a history of commitsgit push
- send work to a remote branchgit pull
- pulls work from remote branch, merges as well.
System Modeling
Goal: visualize classes and the relationships between them at a high level
Popular standard: UML
Modeling class relationships
Composition, “owns-a” relationship
If A owns-a B, then generally:
- B does not have an independent existence for A
- If A dies, then B dies
- If A is copied, B is copied as well (deep copy)
Typically, this is implemented in C++ via object fields.
Aggregation, “has-a” relationship If A has-a B, then generally:
B
has an independent existence outside ofA
- If
A
dies,B
lives on - If
A
is copied,B
isn’t (shallow copy)
Generally implemented via references or non-owning pointers (one you don’t delete).
If a student dies, university lives on. if we copy a student, don’t copy whole university.
Specialization: “is-a” relationship.
- One class is a “version” of another class
- Typically subclassing or subtyping relationship
- If B is-a A, then we should be able to substitute something of type B, wherever we expect something of type A.
In C++, achieve this via public Inheritance.
Why Book{title, author, length}
?
Recall the 4 steps of object construction (from Constructor)
- Space is allocated
- Call the superclass constructor
- Initialize fields via a member initialization list (MIL)
- Run the constructor body
We cannot set title, author, length because they are private to book. It will use Books default constructor, won’t compile if default constructor does not exist.
Let’s discuss Virtual Method and Overriding.
Virtual
keyword is used for achieving dynamic dispatch for polymorphic types.
Dynamic Dispatch: choosing which function to run at run-time as opposed to compile time.
Consider the following
b
has two types associated with it. See Static vs. Dynamic Type of Pointer
- static type:
b
is aBook*
, compiler knows this. - Dynamic type: describes what type
b
is pointing to at run-time. May depend on user-input, cannot be determined by the compiler
Lecture 10
How do we know which method call in the hierarchy is invoked for b.isHeavy()
or b->isHeavy()
? Notes copied into Virtual Method.
- Is the method being called on an object? If so, always use the static type to determine which method is called.
- Is the method called via pointer or reference? a) Is the method NOT declared as virtual? - Use the static type to determine which method to call.
b) Is the method virtual? Use the dynamic type to determine which method is called.
We can support
Each iteration calls a different isHeavy
method.
What about
(*book).isHeavy()
?
(*book).isHeavy()
calls the correct method as well, because*book
yields aBook&
, i.e. a reference.
What is the purpose of the override keyword?
- It has no effect on the executable that is created.
However, can be helpful for catching bugs.
Specifying override will have have the compiler warn you if the signature does not match a superclass’ virtual method.
Why not just declare everything as virtual for simplicity?
Consider the example below.
- Declaring
doSomething
as virtual doubles the size of our vec object, program consumes more RAM, slower in general. - This extra 8 bytes is storing the vptr - virtual pointer. vptr allows us to achieve dynamic dispatch with virtual functions.
Remember, in MIPS, function calls use the JALR instruction. Saves a register, jumps PC to a specific function’s memory address, hardcoded in that machine instruction.
With dynamic dispatch, which function to jump to could depend on user input. Cannot be hardcoded.
how the virtual function is actually decided
When we create a vec2 or vec3, we know what type of object we’re creating, so we can fill in the appropriate vptr for that object.
Now, in either case, we can simply follow the vptr, get to the vtable, and find the function address for the doSomething method.
Extra running time cost in the time it takes to follow the vptr and access the vtable.
C++ Philosophy: Don’t pay for costs unless you ask for it.
Destructors revisited
Which of these leaks memory?
Object destruction sequence:
- Destructor body runs
- Object fields have their destructors run in reverse declaration order
- Superclass destructor runs
- Space is reclaimed
Because the destructor is non-virtual, for pxy
, we invoke ~X
, not the ~Y
, so this array b
is leaked.
Solution: declare virtual ~X()
, so delete pxy
calls ~Y()
.
Unless you are sure a class will never be subclassed, then always declare your destructors as Virtual Destructors.
If you are sure, enforce it via the final keyword.
Lecture 11
- ooh this is interesting
If we do not provide an implementation for Shape::getArea()
, code won’t link: “undefined reference to vtable error”
We could make Shape::getArea()
return 0 or -1 to indicate “no area”, but it’s not natural. Really, we want to avoid a definition for Shape::getArea()
entirely.
Solution: Declare Shape::getArea
as a Pure Virtual Method. A pure virtual function is allowed to not have an implementation.
Classes that declare a pure virtual function are called Abstract Class. Abstract classes cannot be instantiated as objects.
A class that overrides all of its parents pure virtual functions is called a Concrete Class. Concrete classes can be instantiated.
Purpose of abstract classes is to provide a framework to organize and define subclasses.
What does f
expect:
a[0]. a[1] ...
[x, y, x, y, ]
What it actually looks like:
a[0]. a[1] ...
[7, 8, 9, 10, 5, 6]
Data ends up misaligned. All of Vec2{7,8}
is written into myArray[0]
. Half of Vec2{9,10}
is written into myArray[0]
, the other half is in myArray[1]
.
Lesson
Be very careful when using arrays of objects polymorphically. Solutions:
- Use an array of pointers:
Vec3* myArray[2]
- Use a vector of
Vec3*
How does an array of pointers fix the issue?
?
Polymorphic Big Five
Let’s consider Book
hierarchy again.
Compiler still provides us a copy constructor, that works as expected.
Let’s look at copy/move constructor and assignment operator too. See their definition:
- Calls the copy constructor for the book pattern of Text.
t
is aconst Text&
, this is implicitly converted to aconst Book&
.
Move Constructor:
t
and t.topic
are lvalues, so we’d invoke the copy ctor if we didn’t use std::move
.
t is an rvalue reference so we know it is safe to steal title, author, length and topic via using std::move to invoke move constructors.
Copy Assignment:
Move Assignment:
All of these implementations are what the compiler gives by default. Customize as necessary - for example, if doing manual memory management, you will need to write your own versions of these.
BUT: Are the compiler provided definitions actually that good?
- Title, author and length are set, but topic remains unchanged.
Book::operator=
is defined as non-virtual. We’re calling operator=
method on a reference. We use the static type, call Book::operator=
, even though br1
and br2
are referencing texts.
Some fields are copied, but not all. This is the partial assignment problem.
How to fix this? Declare Book::operator=
as virtual!
Our usual signature for Text
is the following
Can we just slap an override
on the end of this? NO: Signatures don’t match in 2 places: return type, parameter type.
- Return type: this is actually okay: A subclass’s override method can return a subclass of the virtual function’s return type (if it’s a pointer or reference)
- Parameter type for overridden functions must match exactly.
Signature must be the following
Problem 1) can’t access other’s topic, because it’s a Book
, and Book
don’t have topics, only Texts
do.
Problem 2) other is a Book&
, so now this is legal:
We can set a Text
to a Comic
, Comic on RHS can be implicitly converted to a const Book&
.
This is the mixed assignment problem, which is where you can set subclass siblings to each other.
Non-virtual operator = leads to partial assignment virtual operator = mixed assignment
To fix this, restructure the Book
hierarchy:
Lecture 12
To make this abstract, we need a Pure Virtual Method. If no other methods make sense to be pure virtual, can always use destructor.
How does this fix the two problems?
- Mixed assignment:
operator=
is non-virtual and the implicitly provided copy assignment operator only acceptsText
. - Partial assignment problem:
Note: Only works since AbstractBook
is an abstract class.
If we set Book::operator=
to protected then we couldn’t assign books to one another.
Consider Text
’s destructor. Implicitly, following happens:
- Destructor body runs (empty)
- Object fields we destructed in reverse decl. order
- Superclass destructor runs
- Space is reclaimed
Because in step 3, Text
’s destructor calls AbstractBook’s destructor, we have a problem: we’ve called a method with no implementation.
Solution: give it an implementation:
Pure Virtual Methods can STILL be implemented in the base class
Pure virtual methods don’t have to be implemented, but they still can be. They require an implementation if they will be called.
”= 0 means derived classes must provide an implementation, not that the base class cannot provide an implementation”. See this StackOverflow thread.
AbstractBook
is still abstract, only subclasses of classes which define a pure virtual method may be concrete.
One possible recommendation: If you care about assignment, consider making your superclasses abstract.
Error Handling
Example from STL vector.
Vectors: dynamically allocated resizing arrays. Handles memory management so we don’t screw it up.
Unfortunately, can’t idiot proof everything.
How to handle errors:
Option 1: Sentinel values. Reserve some values, -1
, INT_MIN
to signal errors
- Problem: reduces what we can return, can’t return -1 in a regular scenario. Not clear for a general type T what values we should pick as sentinels.
Option 2: global variables. Create some global variable that is set when an error occurs (in C,
int errno
, which can be queried for errors with standard functions). - Also not idea: limited # of errors, might be overwritten Option 3: Bundle in a struct:
Best so far, but still not ideal. Wrap our return types in this struct
, all return types are larger than needed. Awkward to access data field.
These are all approaches that C
users end up using. C++ however has a language feature for dealing with errors: Exceptions.
V.at(100)
fetches V[100]
if the value exists, otherwise, throws an exception.
r
is just an object, class type isstd::our_of_range
(included in<stdexcept>
)- The
.what()
method returns a string describing the exception.
Force the programmer to deal with the error because the control flow jumps.
Vector knows the error happened but not how to fix it.
We know how to fix it, but not how the error occurred
- non-locality of error handling
To raise an exception ourself, we use the throw
keyword. We can throw any value, but keep in mind that <stdexcept>
has objects for common scenarios like out_of_range
, logic_error
, invalid_argument
.
When an exception is raised, control flow steps. We search through the stack upwards looking for a handler for this type of exception. This is called Stack Unwinding. Destructors are run for objects stored on the stack during the process of stack unwinding. If a handler is found, we jump to that point. If no handler is found, program crashes.
Example:
- Main calls
h
,h
callsq
,q
callsf
, throws, stack unwinding throughq
,h
, jump to catch block in main.
Multiple errors may be handled via multiple catch blocks.
One handler can also deal with part of an error, rethrow the exception to allow someone else to deal with it.
Lecture 13
In the last lecture, we saw that we can throw different types of exceptions from catch
blocks. We can also rethrow the same exception for someone else to deal with.
Why here do I just say throw
rather than throw e
?
throw e
performs a copy in order to throw this exception- Remember that copy constructors (any type of constructor) cannot be virtual. Therefore, the static type is used for the copy
If you throw a range_error
and catch via std::exception&
catch block,
throw
rethrowsrange_error
throw e
catch astd::exception
copied from therange_error
, we lose the dynamic type.
Generally, catch blocks should catch by reference to avoid avoid copies.
Never let a destructor throw an Exception!
Default behavior: Program immediately crashes.
noexcept is a tag associated with methods (like const), which states the method will never raise an exception, destructors are implicitly noexcept tagged.
We can allow exceptions thrown from destructors by tagging them noexcept (false)
.
For example,
Danger
If we throw an exception, Stack Unwinding occurs, during this process destructors are running for stack allocated objects. If one of these destructors throws, now we have 2 active exceptions!
- 2 active exceptions = program crash (this behavior cannot be changed)
Exception Safety
Under normal circumstances, f
does not leak memory. But, if g
throws, then we do not execute delete p
, so we do leak memory!
Thing to recognize: exceptions significantly change control flow! We no longer have the guarantee of sequential execution.
Let’s fix f
, so we can handle its memory leak.
Complaints about the fix above:
- Repeated code.
delete p
twice, little annoying - Could get complicated for many pointers, many function calls, etc.
In other languages, some have “finally” clause, which always runs, either after a successful try, a successful catch, or before a catch returns. Unfortunately, C++ doesn’t have support for this.
- The only guarantee is that during Stack Unwinding, stack allocated objects have their destructors run.
- Therefore, use stack allocated objects instead no leak
What if we need a pointer? For example, to achieve Polymorphism
One solution: wrap the pointer in a stack allocated object that will delete it for us during stack unwinding.
C++ Idiom: RAII (resource acquisition is initialization)
Example:
Resource (file handler) is acquired in the constructor (initialization). Resource is freed (closing the file) in the destructor.
Apply this RAII concept to dynamic memory.
std::unique_ptr<T>
(in <memory>
library), i.e. Unique Pointers
- contains a
T*
passed via the constructor - deletes the
T*
in the destructor
If we leave f
normally, or if we leave f
during stack unwinding, either way, p
’s destructor runs, deletes heap memory.
In between, we can use p
like a regular pointer thanks to operator overload.
Generally, we do not call the constructor directly with a pointer, because of the following issues:
std::unique_ptr<T> p{new T{}};
new is not paired with a delete (not one that we see, at least)- Causes a double delete:
g()
can potentially throw in the example below, heap-allocated object does not get deleted
One potential ordering (in C++14) (obscure scenario)
new T()
g()
unique_ptr
constructorf()
Preferred alternative: std::make_unique<T>
(…)
- Constructs a
T
object in the heap with arguments …, and returns aunique_ptr<T>
pointing at this object.
Something else to consider:
Call copy constructor to copy p
into q
.
What happens? Doesn’t compile. Copying is disabled for unique_ptr
s. They can only be moved.
This is achieved by setting copy constructor / assignment operator to = delete
.
unique_ptr
are good for representing ownership, since when one object dies, its unique_ptr
fields will run and clean up the associated object.
Has-a relationship: Use raw pointers or references.
- You can access the underlying raw pointer of a smart pointer via
.get()
.
Lecture 14
What if we want shared ownership, where some memory is deleted only if all the pointers pointing to it are deleted?
We can achieve this via std::shared_ptr<T>
, i.e. Shared Pointers.
shared_ptr
maintains a reference count, which keeps track of how many shared pointers point at this memory. When it hits 0, memory is freed.
There are still problems with shared_ptr
s:
- Susceptible if you use the constructor
q
has no knowledge ofp
’s existence
- Cyclical reference issue
- If we have multiple shared ptrs that point at each other, we have the issue that some pointers may never be deleted
full example through ChatGPT
In general, shared ownership is somewhat rare. Take care in constructing your programs. Usually, it suffices to use the following:
unique_ptr
or object field - ownership / Composition- raw pointer or reference - Aggregation (“has-a”)
shared_ptr
- shared ownership
Exercise: Implement shared_ptr
.
We’ll come back to exception safety later.
- Provide some standard “good” solution to a common problem
- One common problem: adding / removing behavior at run-time with a polymorphic class.
- One example: Windowing system - GUI.
If we attempt to make a class for each possible feature, for features, there are possible configurations. Combinatorial explosion!
Solution: Decorator Pattern (Linked List of functionalities)
Pieces of the decorator pattern:
- Abstract Component: Gives the interface for our classes
- Concrete Component: Implements the “basic” version of the interface
- Abstract Decorator: Organizes decorators and has-a decorator or the concrete component.
- Concrete Decorators: Implement the interface, call
operation()
on the next object in this linked list.
ScrollBar Tabbing Basic Window
Example: Pizza
StuffedCrust Topping Topping CrustAndSauce
Lecture 15
Observer Pattern: Publisher / Subscriber relationship.
Publisher: Generates some data, change in state Subscriber: Can dynamically subscribe or unsubscribe for various publishers. Should react when data is changed.
Example: Spreadsheet application:
- Publishers: Cells (Subjects)
- Subscribers: Charts (Observers)
When a cell changes, charts that are observing that cell must re-render, display new information.
May have different types of publishers / subscribers, e.g. - different charts require different logic for re-rendering.
Steps
ConcreteSubject
has its state updatednotifyObservers
is called - either byConcreteSubject
or some controller (like themain
function)- notify is called on each observer in Subject’s observer list
- This calls
ConcreteSubject::notify
, by the assumption it is pure virtual ConcreteObserver
callsgetState
onConcreteSubject
, uses info as necessary
Example: Twitter clone. Tweeters are subjects. Followers which are observers, can only follow one tweeter.
Important Note
Notice that this only works because destruction happens in reverse order!
Wait, I’m slightly confused, why does it work? Why doesn’t the program leak memory here? Ahh, I think if you had something like declaring
Follower
first, then declaringTweeter
, you would get memory leak, because theTweeter
would get destroyed first. But that is not the case here,Follower
s gets destroyed first, thenTweeter
. See Destructor.
Also note: The design patterns we demonstrate here are simplistic implementations. Only get value at scale - multiple types of subjects / observers.
Other variants also exist - for example, passing a Subject&
via notify method.
Casting allows us to take one type, treat as another.
In C:
This is dangerous, generally avoid casting unless necessary, subverts type system. If necessary, use one of the 4 C++ casts instead of C-style casting.
static_cast
- “well-defined” conversions between two types
g
callsint
version ofg
Another example:
- “trust me bro, I know it’s a Text”
- Undefined behavior if it’s not actually pointing at a
Text
reinterpret_cast
- allows for poorly defined implementation dependent casts. Most uses ofreinterpret_cast
are undefined behavior.
Lecture 16
const_cast
- the only type of cast that can “remove” constness. Example: using some library that gives us:
Let’s say we know g
doesn’t modify the int
pointed to by p
in any way. Also, I have a const int*
I’d like to call g
with. Compiler will prevent us from calling g
, because it might modify p
.
I can use const_cast
to call g
in the following way:
Generally, const_cast
should be avoided.
Another example, working on legacy codebase that doesn’t use const anywhere. We want to add const
s, make our program const-correct.
Issue: const-poisoning. Adding const in one location often means we must add it to other locations to allow it to continue to compile.
We can use const-cast to bridge between const-correct and non-const-correct parts of our program. Make small independent parts of the program const-correct, use const-cast to allow the program to compile as the work is done.
dynamic_cast
: used for safely casting between pointers/references in an inheritance hierarchy
Only safe if pb
actually points to a Text
.
Instead,
if the cast succeeds (i.e. dynamic type is Text
), then pt
points at the Text
. Otherwise, pt
is set to nullptr
.
Also can be used on references.
What if br
is actually referencing a Comic
? Cannot set it to null, no such thing as a null reference.
If dynamic_cast
fails with a reference: throw a std::bad_cast
exception.
There exist smart pointer versions of each of these casts:
Cast shared_ptrs
to other shared_ptrs
.
Those allow us to make decisions based on RTTI (Run-Time Type Information).
BAD
The above function is poor object-oriented design. Don’t do this.
Also Note: dynamic_cast
only works if you have at least one virtual method.
Recall: polymorphic assignment problem. We considered making operator=
virtual.
- operator= non-virtual: partial assignment
- operator= virtual: mixed assignment
Let’s make operator=
virtual in Book
.
Measures of Design Quality
Question: How can we evaluate the quality of our code - what is good, what is bad, when we’re not just following a particular design pattern?
We’ll discuss a number of ways:
- code smells - refactoring
- coupling / cohesion - how do my classes interact?
- SOLID design principles
Coupling: Description of how strongly different modules depend on one another.
Low Coupling:
- Function calls with primitive args / results
- Function calls with structs / arrays as results
- Modules affect each other’s control flow
- Modules sharing global data
High Coupling:
- Modules have access to each other’s implementation (friend)
Ideally, we desire low coupling:
- easier to reason about our program
- easier to make changes
We can cheat: If I put everything in one class, super low coupling!
Counterbalancing force: Cohesion - how closely do the parts of my module relate to one another?
Low Cohesion:
- parts of module are completely unrelated, e.g.
<utility>
module has a unifying common theme, e.g. <algorithm>
Elements manipulate state over lifetime of an object, e.g.<fstream>
High Cohesion:
- Elements are cooperating to perform exactly on task
Cheat at Cohesion:
- put every function in its own class, then they each do only one thing!
The Goal
We strive for: low coupling, high cohesion.
Ex: whatIsIt
function - exhibits high coupling with the Book
hierarchy.
Any new classes in Book
hierarchy necessitate changes to this function.
SOLID Design Principles Acronym of 5 principles: purpose is to reduce software rot (the property that long lasting codebases become more difficult to maintain)
- Single Responsibility Principle
- Open-Closed Principle
- Liskov Substitution Principle
- Interface Segregation Principle
- Dependency Inversion Principle
Single Responsibility Principle “A class should only have one reason to change”.
i.e. a class should do one thing, not several.
A change to a program spec requires a change to the source code.
If changes to different parts of the spec require changes in the same class, then SRP is violated.
Example: Consider ChessBoard
Class.
This is actually somewhat bad design. ChessBoard
should not print to the screen.
What if I instead want to print to a file? Cannot use the same class without replacing cout
s with printing to a file.
Little more flexible, but still has issues.
What if we want a graphic display? Or to change the language, or to communicate over the internet?
We have low cohesion, violating SRP. ChessBoard is doing multiple things: manipulate state, implement logic, control rendering to the screen.
Instead: chessBoard should communicate via results, parameters, exceptions. Allow other classes to handle communication with the user.
Lecture 17
Should main be performing communication?
No - limits reusability.
Another possibility: Use the MVC architecture, well-suited for SRP.
MVC: Model-View-Controller.
- Model: Handles logic + data
- View: Handles program output / display
- Controller: Handles input, facilitates control flow between classes
Model: May have multiple views, or just one - different types of views: text, graphical view, etc.
- Model doesn’t need to know anything about the state / implementation of the views
- Sometimes we structure interactions between Model and View as an Observer Pattern
Controller: mediates control flow.
- May encapsulate things like turn-taking or some portion of the game rules.
- Some of this logic may go in the model instead: judgement call
- May communicate with user for input (sometimes done by the View)
By decoupling presentation, input, and data/logic, we follow SRP, and promote re-use in our programs.
On the other hand, there may be parts of the specifications that are unlikely to change, and so adding layers of abstraction just to follow SRP may not be worthwhile.
Avoiding needless complexity is also worthwhile - use your best judgement.
Classes should be open to extension, but closed to modification.
Changes to a program should occur primarily by writing new code, not changing code that already exists.
Example
- What if we wanted our
Hero
to use a different type of weapon?Bow
orMagicWand
? - Requires us changing all the places we referenced the
Sword
class - lots of modifications, not much extension.
Fix is to use Strategy Pattern:
Open-Closed Principle: Closely related to the concept of inheritance and virtual functions.
For example:
This function is open to extension: we can add more functionality by defining new types of Book
s, and closed to modification: countHeavy
never needs to change now that it has been written.
Compare this to WhatIsIt
:
- not open to extension: cannot get new behavior without modifying the source code
- not closed to modification either
Open/closed principle is an ideal - writing a program will require modifications at some point.
Plan for things that are most likely to change, account for those.
Enforces something discussed so far strictly:
- public inheritance should modael an is-a relationship
If class B
inherits from class A
: we can use pointers / references to B
objects in the place of pointers / references to A
objects: C++ gives us this.
Liskov Substitution is stronger: not only can we perform the substitution, but we should be able to do so at any place in the program, without affecting its correctness.
So precisely:
- If an invariance is true for class
A
, then it should also be true for classB
- If an invariant is true for method
A::f
, andB
overridesf
, then this invariant must be true forB::f
- If
B::f
overridesA::f
- If
A::f
has a preconditionP
and a postconditionQ
, thenB::f
should have a preconditionP'
such thatP=>P'
and a postconditionQ'
such thatQ'=>Q
B::f
needs a weaker precondition and a stronger postcondition
- If
Ex: Contravariance problem
- Happens whenever we have a binary operator where the other parameter is the same type as
*this
.
As we’ve seen before, we must take in the same parameter when overriding. C++ enforces this, which actually enforces LSP for us.
- A
Circle
is-aShape
- A
Shape
can be compared with otherShape
s - A
Circle
can be compared with any otherShape
To satisfy LSP, we must support comparison between different types of shapes.
typeidreturns
std::type_info` objects that can be compared.
dynamic_cast
: Is other
a Circle
, or a subclass of Circle
?
typeid
: Is other
exactly a Circle
?
typeid uses the dynamic-type so long as the class has at least one virtual method.
Ex: Is a Square
a Rectangle
?
4th Grader: Yes, a Square is a rectangle.
A square has all the properties of a rectangle.
Issue: we expect postcondition for Rectangle::setWith
to be that the width is set and nothing else changes. Violated by our Square
class.
Violates LSP, does not satisfy an is-a relationship. Conclusion: Square are not Rectangles.
Possibility: Restructure inheritance hierarchy to make sure LSP is respected.
In general: how can we prevent LSP violations?
- Restrain what our subclasses can do, only allows them to customize what is truly necessary.
We can use a design pattern: Template Method Pattern. We’ll provide a “structure” or “recipe” or “template” for what this method will do, allow subclasses to customize portions of this method by performing overrides.
General: When a subclass should be able to modify same parts of a method’s behavior, but not all.
Lecture 18
Template Method Design Pattern: In video game, I have two types of Turtle
s: RedTurtle
+ GreenTurtle
.
Note: draw
method is public and non-virtual.
- Nobody who subclasses
Turtle
can overridedraw
. GreenTurtle
andRedTurtle
can define draw, but it won’t be an override, won’t be called if we useTurtle
s polymorphically
Note as well: drawShell
is private, but we can still override it! Access specifiers only describe when methods can be called, not how they may be overwritten.
NVI - Non-Virtual Idiom
public virtual methods are trying to do two things at once:
- public: Providing an interface to the client
- will uphold invariants
- Respect pre-conditions / post-conditions
- expected to do its job
- virtual: providing an interface to subclasses
- Virtual methods are those that may be customized while an object is used polymorphically
- Provide “hooks” to allow customization of behaviour
Satisfying the responsibilities of “public” and “virtual” simultaneously is difficult
- How are we sure a public virtual method will satisfy its invariants, pre-conditions / post-conditions. Will it even do its job?
- What if we want to add more code to the public virtual method without changing the interface?
Non-Virtual Idiom states the following:
- All public methods are non-virtual
- All virtual methods should be private (or protected)
- Exception for the Destructor
Example: DigitalMedia
With NVI:
Now: We can add code that must run for all types of DigitalMedia before or after the doPlay call.
- we could copyright checking before
doPlay
, this will always run - Or: update a play count after
doPlay
- now subclasses don’t have to worry about maintaining this invariant
Flexible: can provide more “nodes” for customization simply by adding more private virtual method calls.
E.g.: showArt()
before doPlay()
- private virtual display poster for movie, album cover for song, etc.
All can be done without changing the public interface! Which is good for minimal recompilation, open / closed principle.
- In general, easier if we constraint what subclasses do from the beginning, as opposed to wresting back control. Supporting Liskov Substitution Principle
- Any decent compiler will optimize the extra function call out - so no cost at runtime
Interface Segregation Principle
- No code should depend on methods that it doesn’t use
- We prefer many small interfaces over one larger, more complicated interface
- If a class has many functionalities, each client of the class should only see the functionality that it needs
Example: Video Game (No NVI for simplicity)
Imagine we make a change to our drawing interface
battlefield.cc
must still recompile, even though it’s not using any part of the drawing interface- Needless coupling between UI and Battlefield via Enemy
One solution: Multiple Inheritance
Somewhat similar to the Adapter Pattern - used when a class provides an interface different from the one you need.
For example - Library
for our window - expects objects to inherit from “Renderable” and provide a “render” method.
Don’t want to change all of our inheritance hierarchy, or all our draw calls to render - violates open / closed, pain. Use an adapter class.
- Satisfy the
Renderable
interface by calling the methods we’ve already defined. - We might not even need our Adapter to provide this “draw” method anymore, just render. In which case, we could use private inheritance.
What is private inheritance?
Under protected inheritance: class B: protected A {...}
X
remains inaccessibleY
remains protectedZ
becomes protected: can only be accessed inB
and subclasses ofB
Under private inheritance: class B: private A {...}
or class B: A {...}
X
remains inaccessibleY
andZ
become private: Can only be be accessed inB
methods
Protected and private inheritance are not is-a (Specialization) relationship.
Lecture 19
As such, we cannot use classes with private or protected inheritance polymorphically.
Multiple inheritance can have some tricky semantics.
Ambiguous - it’s not clear whether A1::foo
or A2::foo
should be called - b.foo()
doesn’t compile. Disambiguate: b.A1::foo()
or b.A2::foo()
Question
Can you do this if
b
inherits from a single class, sayclass B: public A
, can you doB.A::foo()
?Although not needed, yes you can do this! It’s using the Scope Resolution Operator.
Another tricky aspect of multiple inheritance - shared base class. ”Diamond Problem”, “deadly diamond of death”.
Because a D
is-a B
and a C
, it ends up having 2 a
fields - one from the B
portion of the object and one from the C
portion of the object.
Must instead disambiguate - which a
field are we talking about? The one from the B
portion or the one from the C
portion?
What if we wanted to disable this (somewhat strange) behaviour, and have only copy a A in our hierarchy? Solution: Use Virtual Inheritance.
The keyword “virtual” indicates that this base class will be shared among all others who inherit from it virtually in the inheritance hierarchy.
Now, dOjb.a
unambiguous, a
field is now shared between both patterns of the object.
Constructors for virtual bases must run before any other constructors.
- This shows the order in which the objects is constructed for virtual inheritance
Fixed
What about object layout with multiple virtual inheritance???
B
object:
To treat a B*
like an A*
, simply create a pointer copy, points at the same location, ignore B
fields. Same strategy cannot be used for multiple virtual inheritance.
Note
virtual Base(A)
is actually stored at the end of the object! Not the beginning.
If bp
points at a D
object, see above memory diagram.
If bp
points at a B
object,
If we’re pointing at a B
object, bp->a
is 24 bytes below the pointer address.
Note: Now, finding superclass fields depends on the dynamic type.
Solution: Store the offset to the superclass fields inside of the vtable.
This is where the new “virtual” inheritance comes from.
Note: The D
object does not look like an A
object, a C
object, or a D
object simultaneously, but portions of it do.
Doing pointer assignment can change where the pointer is pointing to.
- This is really IMPORTANT, be careful
static_cast
/dynamic_cast
will also adjust pointer locations for you as necessary.reinterpret_cast
won’t - underscores danger associated with the cast.
Finally, both Google C++ Style guide, and ISOCpp both recommend when using multiple inheritance, Interface Classes are best.
Interface classes are those that:
- Define all methods as pure virtual
- Contain no state (fields)
Lecture 20
Dependency Inversion Principle: High level modules should not depend on low-level modules - instead, only should depend on an abstraction.
Abstract classes should not depend on concrete classes.
Example: Microsoft word. Word Counter rely on some sort of keyboard input. Breaks dependency inversion principle because counting words is “high level”, getting input “low-level”.
Another Example: Timer, lights/bell that goes off
Apply dependency inversion once => timer should no longer rely directly on Bell.
Apply dependency inversion again => Abstract Responder should not depend on the concrete timer.
This is simply the Observer Pattern - where Source
s are subjects, Responder
s are observers.
Could apply dependency inversion again → introduce another layer of abstraction between timer and bell / lights - but at some point, usefulness runs out. Too many layers of abstraction creates needless complexity.
What problem does Visitor Pattern solve?
“How do we write a method that depends on two polymorphic types?”
Example:
We might want different behaviours for each combination of these concrete classes.
We can override this in Rock
and Stick
, but we don’t know which enemy we’re hitting. Same problem exists in reverse if we implement it inside of Enemy
.
- Notice that dynamic casting can partially fix this issue, but not completely
Solution: Visitor Pattern - combine overloading and overriding.
beStruckBy
is a virtual function either callingMonster::beStruckBy
orTurtle::beStruckBy
(runtime)- Call
w.strike
on theWeapon&
. Strike is virtualRock::strike
orStick::strike
(runtime) - Are we calling
Rock::strike(Turtle&)
orRock::strike(Monster&)
? We know the type of this. If this is aTurtle*
, then use theTurtle
version. If this is aMonster*
, then use theMonster
version (compile time)
Another use of the visitor pattern - add behavior and functionality to classes, without changing the classes themselves.
Example: Compiler Design
I might want to collect info about my AST.
I could add a virtual method in ASTNode, override in concrete classes.
But: lots of changes are scattered over many classes. Difficult to understand, harder to implement incrementally.
Instead: provide nodes an accept method, taking in a polymorphic AstVisitor
.
Now, we can write different types of visitors to work on the parse tree. For example:
- String Visitor collect string literals in program, that must go in the DATA segment
- Scope Visitor match variables with associated scope
- CodeGeneration Visitor generates assembly for each given node
Solution improves cohesiveness all code for a particular visitor is located in one class. New visitor are easy to create, and can be made with 0 changes to the AST.
Lecture 21
I’m still not satisfied with this visitor pattern. Too much code!
There is going to be a lot of methods.
If we have subclasses of and subclasses of : different methods to write.
Annoying part: Boilerplate - write the following in each DerivedEnemy
:
- must be one for every single
DerivedEnemy
- Cannot put it in
Enemy
:
Doesn’t work - type of this
is wrong. It’s not telling us what the dynamic type is.
Solution to fixing the boilerplate code: CRTP (Curiously Recurring Template Pattern)
Template our superclass with a type parameter - inherit, AND substitute the derived class type.
How do we use this to fix the boilerplate code?
This sort of works
Cast this
from type Enemy<Turtle>*
to Turtle*
allows us to override into Rock::Strike(Turtle&)
and Stick::Strike(Turtle&)
.
Issue: Now, we have different superclasses for each Enemy
Because Enemy<Turtle>
and Enemy<Monster>
are different classes: can no longer use Enemy
s polymorphically - no vector<Enemy*>
allowed!
Solution: Add another layer of inheritance.
Now, we have a public interface by which all our concrete enemies follow: can all beStruckBy
weapons. We use the virtual method in Enemy
to resolve beStruckBy
to either EnemyBeStruck<Turtle>
or EnemyByStruck<Monster>
. Then just static_cast
to T*
- and we’re good.
Another problem CRTP can solve: polymorphic cloning.
Recall abstract books:
Say I have:
I want a deep copy of whatever b points to. I cannot just do this:
This attempts to create an AbstractBook
by invoking its constructor. Wrong for 2 reasons:
AbstractBook
is abstract, cannot instantiate those objects- Ignoring what we’re actually pointing at, we actually want to invoke a constructor that depends on the dynamic type of
b
We can provide a virtual clone
method for the purpose of solving this:
Instead use the copy ctor in each of our clone
methods to simplify the implementation.
Exact same code in MermaidBook
and Comic
- just the type of this
and the type of ctor which is changing. Once again, we can use CRTP.
b->clone
is virtual, so if b
points at a Comic
, we call BookClonable<Comic>::clone
- static_cast
this
into a Comic*
and invoke the Comic
copy constructor with the Comic&
.
Provided for all subclasses - reduces boilerplate code.
Command Pattern About using objects to encapsulate behavior of some “action” with a system.
Example: Consider writing an IDE, using MVC - we might have model/controller relationship where controller calls model methods, like:
Model::insert Text
Model::copy
Model::paste
Model::changeSyntaxHighlighting
This is a decent design - but it doesn’t allow for some features we might desire:
- Macros - replaying sequences of instructions
- Undo / redo
Command pattern: Instead of manipulating the model directly - pass commands to an invoker
Command
object has whatever information it needs to perform its given action - maybe model, maybe subjects within the model
- Controller would create command objects, supply with info they need
- Sends abstract commands to the
Invoker
for processing Invoker
calls each action method to execute the command
Invoker
can maintain additional state - stack of commands for undo / redo, or mapping of a macro keyword to a list of commands.
Lecture 22
Recall: We saw that reasoning about programs with exceptions is tricky.
We can be more specific about how safe our program is given an exception is thrown. 4 levels of exception safety:
- No exception safety - If an exception is thrown - no guarantees about program state - object has invalid memory, memory is leaked, program crashes.
- Basic guarantee - if an exception is thrown - program is in a valid, but unspecified state. Ex: Class invariants are maintained, no memory is leaked.
- Strong guarantee - If an exception is thrown, the program reverts back to its state before the method was called. Ex:
vector::emplace_back
- Either it succeeds, or if exception is thrown, vector is in its prior state
- Nothrow guarantee - exceptions are never propagated outside of the function call - always does its job. Ex:
vector::size
givesnothrow
guarantee, always returns size of vector without fail.
What is the exception safety of f
?
A::g
and B::h
both provide the strong guarantee.
f
- only satisfies the basic guarantee.
If a.g()
throws, then because it provides strong guarantee - it’s as if it was never called
- when exception propagates from
f
,f
has not done anything
If b.h()
throws - it provides the strong guarantee, so it does undo any of its effects. BUT: it has no knowledge that a.g()
ran just prior. As a result, when exception propagates from f
, the effects of a.g()
remain in place.
Program is in a valid yet unspecified state basic guarantee
Right now, there may be effects of a.g()
that are impossible to undo writing to a file, printing to the screen, sending a network request.
For simplicity, we’ll assume A::g
and B::h
have only local side effects (i.e. only the a
and b
objects and any associated memory may change).
Now, how may we try to rewrite f
to achieve the strong guarantee?
Idea: Use Copy-and-Swap Idiom, only work on temporary objects rather than real ones.
If aTemp.g()
or bTemp.h()
throw - no changes are made to the C
object, it’s as if f
was never called.
This is unfortunately, still the basic guarantee. Because, if b = bTemp
throws, we propagate an exception having modified a
.
What we really need for this is a non-throwing swap or assignment. Assigning a pointer will never throw. One possible solution: PImpl Idiom, “pointer to implementation”
This is the strong guarantee - if a.g()
or b.h()
throw - all work is done on temp
, so we’re fine.
Otherwise, swap will always succeed f
will do its job.
PImpl Idiom can also be used in some scenarios for reducing compilation dependencies.
typically:
C.h
On the other hand:
CImpl.h
C.h
Now, if cImpl
changes only C.cc
must recompile. Because we only use a forward declaration in C.h
no recompilation.
Finally - we could make CImpl
a polymorphic class - swap out implementations at runtime - the Bridge Pattern.
Back to exception safety - vector::emplace_back
. Gives us the strong guarantee. How?
Easy case: No resize, just put the object in the array. When resizing:
- allocate a new, larger array
- Invoke array copy constructor to copy objects of type T to new array
- If copy constructor throws: delete the new array, old array is left intact.
- Then delete old array, and set array ptr to new array (nothrow).
Complaint: This is a lot of copies when all I really want was to resize my array.
Better, would be to move:
- Allocate a new array
- Move objects from the old array to the new array
- If a move throws, move all our objects back from new array to old array, delete new array
- Delete old array, set pointer to new array
Problem: If move throws once, it might also when moving objects back. Once we’ve modified our old array, no guarantee we can restore it.
Solution: If move constructor is declared as “noexcept”, then emplace_back
will perform moves. Otherwise, it will copy over every item, and do this try-catch block to delete old items if necessary.
If possible, moves and swaps should provide the nothrow guarantee - and you should make this explicit to the compiler via the noexcept tag.
If you know a function will never propagate an exception - declare it noexcept to facilitate optimization.
Moves and swaps at the minimum should be noexcept.
Lecture 23
Most CPUs in the modern day have more than one core.
This can sometimes allow extraction of extra efficiency.
Example: Matrix Multiplication
Typically, compute entries in order:
In a multithreaded program: Create separate threads to perform the computations simulataneously.
This allows us to multiply each row for our product .
Why do we need ref/cref, the thread ctor implictly copies / moves its parameters.
cref, ref: create reference-wrapper object types, which can be copied / moved.
- distributed the workload. Each thread works independently on a separate part of the problem.
This is an example of an “embarrassingly parallelizable” problem.
What about synchronization? First: Compute Then compute gives us
We need to compute in entirety, before we may compute
Simply:
- Spawn threads for , join them.
- Spawn new threads for , join them
But: creating threads is expensive. What if we reuse the same threads?
Let’s do a barrier example: 2 functions - f1
and f2
: both must complete phase 1, then they can proceed to phase 2.
Why are the outputs different?
You should run this for yourself, sometimes the output will be differing. This is because the
<<
is actually a function call. Since the threads run simultaneously, you actually don’t have the guarantee that an entire line runs alone and prints together (because there are actually two function calls in that single line).
is there a solution? Yes, use \n
instead of endl
.
Use std::atomic<bool>
to prevent incorrect compiler optimizations with multithreaded code.
Conclusion, what was CS247 about?
This is not actually a course about C++. Purpose?
High-level Thinking:
- How well do my programs accommodate changes?
- How do I balance simplicity vs. abstraction?
- Where is abstraction useful, and where is it cumbersome?
Low-Level Thinking:
- What is the compiler doing?
- What are the efficiency tradeoffs in abstractions.
- Do I have a clear understanding of what my language compiles.
Learning more programming paradigms in detail will make you a better programmer.