Data-Oriented Design

In contrast to Object-Oriented Design.

Links

Talks

Notes from talk

Notes from Mike Acton’s CppCon 2014 talk “Data-Oriented Design and C++”: https://www.youtube.com/watch?v=rX0ItVEVjHc&ab_channel=CppCon

The purpose of all programs is to transform data from one form to another.

Principles

  • If you don’t understand the data, you don’t understand the problem.
  • Conversely, understand the problem by understanding the data.
  • Different problems require different solutions.
  • If you have different data, you have a different problem.
  • If you don’t understand the cost of solving the problem, you don’t understand the problem.
  • If you don’t understand the hardware, you can’t reason about the cost of solving the problem.
  • Everything is a data problem, including usability, maintenance, debuggability, etc. Everything.
  • Solving problems you probably don’t have creates more problems you definitely do.
  • Latency and throughput are only the same in sequential systems.
  • Rule of thumb: Where there is one, there are many. Try looking on the time axis.
  • Rule of thumb: The more context you have, the better you can make the solution. Don’t throw away data you need.
  • Rule of thumb: NUMA extends to I/O and pre-built data all the way back through time to original source creation.
  • Software does not run in a magic fairy aether powered by the fevered dreams of CS PhDs.
    • Software solves real-world problems and runs on real hardware.
  • Reason must prevail.
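A minimal C++ sketch of the “where there is one, there are many” rule of thumb, assuming a hypothetical range query over 2D points: instead of an API that answers for one object at a time, the batch version takes every candidate as contiguous arrays and returns the passing indices as plain data. The names `in_range` and `select_in_range` are illustrative only, not taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-item API: the caller loops and calls this once per
// object, so the function never sees more than one element at a time.
bool in_range(float x, float y, float radius) {
    return x * x + y * y <= radius * radius;
}

// Batch version: all candidates arrive at once as parallel arrays, and the
// result is itself data (indices) that the next transform can consume.
std::vector<std::uint32_t> select_in_range(const std::vector<float>& xs,
                                           const std::vector<float>& ys,
                                           float radius) {
    assert(xs.size() == ys.size());    // parallel arrays must line up
    const float r2 = radius * radius;  // computed once, not once per element
    std::vector<std::uint32_t> out;
    out.reserve(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (xs[i] * xs[i] + ys[i] * ys[i] <= r2) {
            out.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return out;
}
```

Because the batch function sees every element, it can hoist `radius * radius` out of the loop, walk memory linearly, and hand its output to the next step as an array; the single-item version can do none of that on its own.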

Data-oriented design is not new. It’s more of a reminder of first principles.

C++ has 3 big lies:

  1. Software is a platform
  2. Code should be designed around our model of the world
  3. Code is more important than data

Wait what??

These are lies??

Lie #1: Software is a platform

FALSE. Hardware is the platform.

Text (code) + data → CPU → data

Different hardware requires different solutions.

  • But we abstract away the hardware…?

Reality is not a hack you’re forced to deal with to solve your abstract, theoretical problem. Reality is the actual problem.

Lie #2: Code should be designed around our model of the world

Hiding data is implicit in world modeling. It confuses 2 problems:

  1. Maintenance (being able to change how the data is accessed)
  2. Understanding properties of data (critical for solving problems)

World modeling implies some relationship to real data or transforms.

The example he gives is about chairs.

World modeling tries to idealize the problem.

World modeling is the programming equivalent of self-help books.

  • Solve by analogy…
  • Solve by storytelling…

Lie #3: Code is more important than Data

The vast majority of the time, the code is not the problem. The data is the problem.
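One common data-oriented technique that follows from treating the data as the problem is hot/cold splitting: group fields by how often a transform touches them, rather than by which real-world “thing” they belong to. This is a sketch under assumptions (hypothetical particle fields and a per-frame `integrate` step), not an example taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "world model" layout: one object holds everything about a particle.
struct ParticleObject {
    float x, y, z;           // hot: read and written every frame
    float vx, vy, vz;        // hot: read every frame
    std::string debug_name;  // cold: only used by tools
    float spawn_time;        // cold: rarely read
};

// The same data, split by how often the per-frame transform touches it.
struct ParticlesHot {
    std::vector<float> x, y, z, vx, vy, vz;
};
struct ParticlesCold {
    std::vector<std::string> debug_name;
    std::vector<float> spawn_time;
};

// The per-frame transform only ever reads and writes the hot arrays.
void integrate(ParticlesHot& p, float dt) {
    const std::size_t n = p.x.size();
    assert(p.y.size() == n && p.z.size() == n &&
           p.vx.size() == n && p.vy.size() == n && p.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}

// Bytes per particle the loop needs vs. bytes per particle it would drag
// through the cache if it iterated over an array of ParticleObject.
constexpr std::size_t kHotBytesPerParticle = 6 * sizeof(float);
constexpr std::size_t kObjectBytesPerParticle = sizeof(ParticleObject);
static_assert(kHotBytesPerParticle < kObjectBytesPerParticle,
              "the object layout carries cold bytes through the hot loop");
```

With an array of `ParticleObject`, every cache line the loop loads also carries `debug_name` and `spawn_time`, which the loop never reads; with the split layout, it streams only the six float arrays it uses. Exact sizes depend on the compiler and standard library, so the sketch only asserts the inequality.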

For a very long time I’ve embraced OOP; I really enjoy modeling the world in my game.

A programmer’s job is NOT to write code. A programmer’s job is to solve (data transformation) problems.

There is no ideal, abstract solution to the problem.

You can’t future-proof.

Problems these lies cause:

  • Poor performance
  • Poor concurrency
  • Poor optimizability
  • Poor stability
  • Poor testability

You design based on the mental model of the programmer.

Solve for the most common case first, not the most generic.

The compiler is a tool, not a magic wand. It can solve only ~10% of the problem.

Future-proofing is BS.

Thoughts

Mike Acton and “Mr. 1:16” (the audience member speaking at around 1:16 in the video) are BOTH correct.

For many, dev time is worth more than CPU time. Many current software tools (e.g. anything “Java”) cater to their needs.

For others, outright performance or performance per watt is worth way more than their own time. E.g. game engines, or anything in a datacenter. If you save 10% on something that’s run on thousands of servers, that’s a win in terms of power costs.
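For a rough sense of scale, here is the arithmetic with purely hypothetical inputs (2,000 servers, 0.3 kW average draw each, running all 8,760 hours of a year) and the simplifying assumption that a 10% cut in work becomes a 10% cut in power. None of these numbers come from the talk.

```latex
\begin{aligned}
E_{\text{year}} &= 2000 \times 0.3\,\text{kW} \times 8760\,\text{h} = 5{,}256{,}000\,\text{kWh} \\
\Delta E &= 0.10 \times E_{\text{year}} = 525{,}600\,\text{kWh per year}
\end{aligned}
```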

The problem is accepting that the other guy’s situation is just as valid.