Preface
This post is based on my Bachelor’s thesis at the University of Huddersfield. It was written in May 2014 when I was 22 years old. Senior game engine and C++ developers might find the content trivial, but I’m sure there are many programmers still unaware of Data-Oriented Design, so hopefully this will be a nice introduction. The original title for my thesis was “A C++ Data-Oriented Component-Based Approach to Game Engine Development”. It will be in 3 parts – the introduction (part 1), the implementation (part 2), and the analysis (part 3) since it’s too big for one post. Let me know if you have any suggestions, or find any issues.
You can find the code for my thesis on GitHub.
1 Introduction
1.1 Background
Game engines are complex, therefore it’s important that they employ a well designed architecture that takes into account the trade-offs between readability, maintainability, and performance. Many programming paradigms exist in an attempt to improve upon each of these aspects, depending on the requirements of the software. Two opposing paradigms include Object-Oriented Programming (OOP) and Data-Oriented Design (DOD).
Object-Oriented programming resembles objects and their interactions. Each object holds both its own data and behaviour. These self-contained objects can be reused, maintained without side effects, and used without worrying about their internal workings [1]. This makes code easy to understand from the perspective of a human. However, this convenience comes at the cost of performance [2].
Computers like to work with homogeneous blocks of sequential data. If the data is haphazardly scattered, memory access is considerably slower [3]. This is especially important for games consoles where memory is limited [1].
Data-Oriented Design separates data from behaviour, and works from the perspective of a computer. It features better cache utilisation by ensuring that data is sequentially organised in memory. A CPU cache is a small amount of very fast memory, and DOD ensures that it can operate optimally. According to Jan Gray from Microsoft: “If you are passionate about the speed of your code, it is imperative that you consider the cache/memory hierarchy as you design and implement your algorithms and data structures.” [4]. It also leads to simpler code, and offers better parallelism [1].
It could be argued that the computing power of today is fast enough to not have to worry about the performance differences between object-oriented and data-oriented programming. However, there can in fact be significant differences. A game engine that does not optimise its cache coherence at a low level will experience performance issues as the software grows in complexity. This is a known issue with the Ogre 3D 1.x engine, which is largely Object-Oriented. Its performance flaws were explained by Goldberg [5] and Insomniac Game’s Mike Acton [6]. Ogre 2.x attempts to solve it by converting key areas to Data-Oriented Design.
Parallel processing is set to become increasingly significant in the near future, and games will need to take full advantage of multi-core processing to stay at the cutting edge. Virtual Reality devices will lead to a new era of graphically intensive gaming. Not only that, but the rise of mobile gaming means that power consumption is now a key issue; optimisation will ensure extended battery life and the capacity to push graphical complexity in order to remain competitive in a challenging market.
This paper will focus on the the development of a game engine based around the Data-Oriented Design paradigm and attempt to prove its benefits over Object-Oriented Programming.
1.2 Objectives
The purpose of this investigation is to:
- Discuss the benefits and drawbacks of Data-Oriented Design.
- Create a high performance extensible data-oriented framework for developers to either use in their native game engines, or learn from.
- Evaluate the performance and usefulness of the engine.
1.3 Hypothesis
Using data-oriented design to architect a game engine will lead to an increase in performance, reduced complexity, and offer better parallelism as opposed to using object- oriented programming.
2 Literature Review
2.1 Previous Research
2.1.1 Data-Oriented Design VS. Object-Oriented Programming
The concept of Data-Oriented Design has been around for a long time. For example, Sharp’s Data Oriented Program Design paper [7] was published back in 1980 and draws upon the works of John von Neumann who gave lectures on the subject in 1946 [8]. It describes the desire for increasing the efficiency of the processor and memory utilisation, along with increased concurrency. The principles still apply today. However, with the introduction and extensive usage of Object-Oriented Programming from the 1980s, Data-Oriented Design moved away from the spotlight. The development community was reminded of the paradigm in 2009 in the form of various blog posts on AltDevBlogADay by Noel Llopis, and subsequent presentation slides by industry veterans. Fabian’s online book “Data-Oriented Design” is the first complete and extensive documentation.
Below I list the benefits and drawbacks of both paradigms.
Data-Oriented Design
The advantages are according to Llopis [2] and Collin [3].
Advantages | Disadvantages |
---|---|
Simple and modular – Small functions, few dependencies, easy to understand and update. | Difficult concept to grasp – a different way of thinking. |
Better cache utilisation and performance – Stores homogeneous blocks of sequential data, CPU doesn't stall. | Hard to integrate with OOP. |
Better parallelism – Easy to move data and transforms between multiple threads or cores. |
Object-Oriented Programming
The advantages of OOP are according to Pokkunuri [9], and the disadvantages are according to Albrecht [1].
Advantages | Disadvantages |
---|---|
Modular – Each object is independent. | Worse performance – Harmful to the cache. Data is not contiguous. |
Black-box – The internal workings of an object can be redefined without modifying the system as a whole. | Complex code – Large emphasis on design. Promotes over-engineering. |
Conceptual – Related to real-world objects, thus making it easier to design and understand. | Difficult to thread. |
Re-usable – Inheritance promotes re-usability. |
As mentioned earlier, the choice is down to the requirements of the software and the trade-offs between design and performance. For game engines the choice is clear – Data-Oriented Design should be used in performance-critical areas, such as the scene graph or culling algorithms. It’s perfectly reasonable to use OOP for higher-level functionality, but designing the whole system around DOD will lead to far simpler code, as outlined in this paper.
2.1.2 Cache Coherence
“If you saw that there weren’t any apples in stock, would you still haggle over their price?” – Fabian [17]
This analogy applies to branching caused by the use of if statements. For example:
void SomeClass::someUpdateFunction() { if (mValue) { // Do something } else { // Do something } } |
The check to see if mValue is true occurs regardless of the value, and it effectively brings the program to a halt (for a very brief moment).
This is similar to the problems with Ford’s assembly line [10] before the adoption of Scientific Management (1908-1915) as described by Aitken & Hugh [11].
The three conditions are as follows:
- Gameplay conditions – e.g. checking when the jump button is pressed
- Structural conditions – e.g. checking if pointers are null
- Polymorphic calls – e.g. virtual functions used in OOP
Gameplay conditions are, for the most part, difficult and unnecessary to optimise.
When we make a trip to the supermarket, we usually make a list of items that we need upfront, which is more efficient than recalling what we are missing. Fabian recommends keeping data as arrays to remove the need for flow control, assuming the data is never null. This will keep structural conditions to a minimum.
Polymorphic calls are avoided entirely by adhering to the Data-Oriented paradigm.
CPUs most efficiently handle sets of contiguous data and constant operations. The less the CPU has to predict program flow the better. This not only reduces cache misses, but improves parallelism.
2.1.3 Engine Architecture
There has been significant research on the topic of game engine design – the possibilities of software architecture are vast. However, most existing research focuses on establishing functionality at a high-level without focusing on raw performance – usually adopting Object-Oriented Programming. As a justification, Stiegler [12] stated: “Highly optimized code is hard to change or refactor”. Which is true; it’s best to use Data-Oriented Design to solve problems which are unlikely to change, such as frustum culling. But if the paradigm is adopted engine-wide, then this argument is less relevant.
Ogre 3D is a good example of an existing Object-Oriented architecture:
The issue with this architecture is that everything is a singleton, and the heavy use of abstraction, deep branches of inheritance and virtual functions causes performance bottlenecks. Observing the call stack of Ogre at runtime shows just how many hoops the processor has to jump through to render a frame. The largest advantage of this architecture is its modularity – for example, it’s easy to add new RenderSystems. It also makes it simple to use.
One of my favourite papers on effective game engine design is by Spohr [13] which discusses an implementation of a parallel crowd renderer. It uses Intel’s TBB to split render operations into jobs with parallel_for.
Jason Gregory’s book Game Engine Architecture [14] is well known and covers a range of topics about game engine development. The 2nd Edition is due soon which covers aspects of multicore programming, CPU architecture, optimisation, and memory alignment.
2.1.4 Entity Systems
As with game engine architecture, most research on entity systems take an Object-Oriented route. As mentioned in Existing Products, Artemis is the first publicly available framework to adopt Data-Oriented Design, and there have been many subsequent entity systems based on this. A recent paper by Porter [15] states: “I believe that the approach employed by the Artemis framework is very close to an optimal solution”.
In order to compare with the architecture discussed later, below is an example of an Object-Oriented approach to entity systems (simplified version of the TidalEngine object system):
Here every component stores both its own data and functionality. Objects store their own set of components, and an ObjectManager holds the objects. The problem with this is that Objects and Components are each stored in large noncontiguous blocks of memory.
The update loop looks something like the following:
void ObjectManager::update() { for (Objects::iterator i = objects.begin(); i != objects.end(); ++i) { i->second->update(); } } void Object::update() { for (Components::iterator i = components.begin(); i != components.end(); ++i) { i->second->update(); } } |
Besides the cache misses caused every frame, Collin [3] outlines why this is difficult to thread:
2.2 Existing Products
2.2.1 Artemis (Java, C++)
The Artemis framework is the best available example of a data-oriented entity system, and it inspires the architecture used in this investigation. Entities are pure identifiers, components are pure data, and systems act on the data. While the original implementation was in Java, there have recently been several ports to C++, such as Artemis-C++, Coment, EntityX and Anax. However, there doesn’t appear to be any examples of these being used in a publicly available game engine.
2.1.2 Ogre 2.0 (C++)
Ogre 3D was originally designed as an “Object-Oriented Graphics Rendering Engine”. Ogre 2.0 is an attempt to refactor key areas of the engine in a data-oriented manner [5].
2.1.3 Autodesk Stingray
Stingray (formerly Bitsquid) is a game engine based on Data-Oriented Design [16].
2.1.4 Unity 3D
Unity is well known for it’s component-based object system; the entire engine is driven by objects and components.
Conclusion of Part 1
This post introduced data-oriented design, its differences with object-oriented programming, and discussed existing research and applications. The next post will focus on the design and implementation of a data-oriented game engine. Subscribe to the newsletter below to be notified.
- Albrecht T. Pitfalls of Object Oriented Programming. Sony. http://harmful.cat-v.org/software/OO_programming/_pdf/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf. Published 2009.
- Llopis N. Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP). Games from Within. http://gamesfromwithin.com/data-oriented-design. Published September 2009.
- Collin D. Introduction to Data-Oriented Design. Dice. http://www.slideshare.net/DICEStudio/introduction-to-data-oriented-design. Published November 5, 2010.
- Meyers S. CPU Caches and Why You Care. Scott Meyers. http://www.aristeia.com/TalkNotes/ACCU2011_CPUCaches.pdf. Published May 21, 2010.
- Goldberg M. Ogre Pitfalls & Design Proposal for Ogre 2.0. Ogre Forums. http://www.ogre3d.org/forums/viewtopic.php?f=4&t=75459. Published November 18, 2012.
- Acton M. Mike Acton’s review of Ogre 3D’s OgreNode.cpp. Bounce. http://www.bounceapp.com/116414. Published November 9, 2013.
- Sharp JA. Data oriented program design. ACM SIGPLAN Notices. 1980;15(9):44-57. doi:10.1145/947706.947713.
- von NEUMANN J. The Principles of Large-Scale Computing Machines. IEEE Annals Hist Comput. 1988;10(4):243-256. doi:10.1109/mahc.1988.10045.
- Pokkunuri BP. Object Oriented Programming. ACM SIGPLAN Notices. 1989;24(11):96-101. doi:10.1145/71605.71612.
- Staff H co. Ford’s assembly line starts rolling. History.com. http://www.history.com/this-day-in-history/fords-assembly-line-starts-rolling. Published 1913.
- Aitken HGJ. Scientific Management in Action: Taylorism at Watertown Arsenal, 1908-1915. Princeton University; 1985.
- Stiegler A. Architecture and Prototype of a Game Engine. November 2010.
- Spohr J. Parallel Crowd Rendering. March 2008.
- Gregory J. Game Engine Architecture. A K Peters/CRC Press; 2009.
- Porter N. Component-based game object system. April 2012.
- Frykholm N. Practical Examples in Data Oriented Design. Bitsquid. http://bitsquid.blogspot.co.uk/2010/05/practical-examples-in-data-oriented.html. Published May 28, 2010.
- Fabian R. Data-Oriented Design. DataOrientedDesign.com. http://www.dataorienteddesign.com/dodmain/dodmain.html. Published June 25, 2013.
This post is amazing! Do you still plan to do a Part-2 follow up? Please, do it! Also, I would love to read your thesis.
very insightful, can’t wait for parts 2 and 3!
Where is part 2?
Apologies guys, part 2 is still on my todo list, but I’ve been too busy with other ventures. I will let you know by email when it’s out.
If anyone else would like to be notified, feel free to add a comment.
Been 4 years, still no part 2 :/
Sorry all. Unfortunately unless I am able to retire I may never get round to finishing it. I would if I could. The code is available, however.
£10k of coffees should do it https://ko-fi.com/danielsefton just want to offer an option for anyone who really needs this as opposed to me never being able to do it.
Still no part 2.
Is it me or does functional programming combine these two worlds benefits?
Especially with DDD combined does it occur to me that the modularity and the conceptional context of OO becomes in functional-declarative programming united with the parallelism friendliness and conciseness of data oriented design.
Are you concerned about the performance?
Hi any follow up on this as i am currently writing my thesis on a similar / the same subject so any more insight would be greatly appreciated.
many thanks for your work so far.
Is your thesis available somewhere? I’d really like to read the rest of it.
Been 4 years, still no part 2 :/