
# Lecture 19 - Performance Optimisation
### SET09121 - Games Engineering

<br /><br />
Babis Koniaris/Tobias Grubenmann
<br />

School of Computing. Edinburgh Napier University

---

# What is Performance Optimisation?

- Optimisation is about making the best use of a resource.
- Optimisation in software is about making best use of our computer hardware resource(s).
- There are different areas we can optimise for in software, but we will focus on performance.
- Performance is about getting the most work done in the shortest amount of time with our computing resource.
- Therefore, in a game, we are worried about:
    -  producing a frame in a reasonable time (typically 16.7ms, i.e. 60FPS)
    -  performing the most work possible in that time to give a good gameplay experience.
- We are going to look at code level concerns mainly. Turning down update frequencies of systems is another strategy.

---

# Premature Optimisation

Two famous quotes by Donald Knuth:
- "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
- "In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering."

---

# Premature Optimisation

Basically, Knuth argues that we should not let performance considerations determine the design of our code -  it makes the code more difficult to work with.

I think a good rule for the module is -  get your game working first; then worry about extra features and performance optimisation.

A good approach is to design-build-measure-optimise.

---

# The 80/20 Rule

- You might have heard of this...
- Pareto Principle (or 80/20 rule) states that 80% of output comes from 20% of input.
- Applied to programming, we can say that 80% of processor time is spent in 20% of our code.
- It does make sense -  loops normally are the biggest area of computation in your application.

![image](assets/images/80-20.jpg)

---

# What are we interested in?

- There are two areas we can focus on to improve program performance for our games.
- **CPU utilisation**:
    - How well are we using the processor? Is it doing work it doesn't need to?
- **Memory usage**:
    - Is memory effectively accessible to the processor? Is the processor waiting too long to do memory operations?
- We will focus on these two areas, looking at best-practice on the CPU and memory usage.
- There are many more techniques and tricks we can use, but normally they come down to these same two areas.

---

# First big trick

Release mode and run without debug

- A debug build is far slower than a release build
- Running with the debugger attached is far costlier than running without it
- To identify the true performance: build with Release, execute without debugging

![image](assets/images/run-no-debug.JPG)

---

# Second big trick

Avoid I/O or do it better

- During debugging, we often output values to the console to check behaviour.
- I/O like this is very slow, requiring your program to interact with the OS and present data.
- You should avoid this I/O as far as possible in final builds.
- When using `cout`, prefer `'\n'` to the end-of-line manipulator (`endl`): `endl` also flushes the stream, which is slow.
- `cout` might be slower than `printf` by default, but that's fixable with `std::ios::sync_with_stdio(false);`
- Easy debug-only code execution: `#ifdef _DEBUG` (a macro that Visual Studio defines in debug builds)
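
A minimal sketch of these points, not a definitive implementation. `DEBUG_LOG`, `format_scores` and `print_scores` are hypothetical names, and the sketch assumes the toolchain defines `_DEBUG` for debug builds (other compilers may need the macro set by hand).

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// DEBUG_LOG is a hypothetical helper: it prints only when _DEBUG is defined
// and compiles to nothing otherwise.
#ifdef _DEBUG
#define DEBUG_LOG(msg) (std::clog << msg << '\n')
#else
#define DEBUG_LOG(msg) ((void)0)
#endif

// Builds the whole report in memory, ending each line with '\n' (no flush).
std::string format_scores(const std::vector<int>& scores)
{
    std::ostringstream buffer;
    for (int s : scores)
        buffer << s << '\n';
    return buffer.str();
}

// Sends the report to the console in a single write.
void print_scores(const std::vector<int>& scores)
{
    assert(!scores.empty() && "nothing to print");
    DEBUG_LOG("printing " << scores.size() << " scores");
    std::cout << format_scores(scores);
}
```

Calling `std::ios::sync_with_stdio(false);` once at start-up, before any output, applies the `cout` fix mentioned above.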

---

# Metrics

- Let's define metrics that allow us to talk about performance.
- FPS: Frames-Per-Second. 
    - The key measure most gamers like to talk about. The typical FPS displayed is the **average** of the number of frames processed per second. 
- Frame Time:  
    - This is actually what we are interested in. How long does it take the game to produce and render a **single** frame? Typically we aim for 16.7ms (60FPS) or 33.3ms (30FPS).
- Speedup:
    -  When we make an improvement we need to quantify it. Speedup is the ratio of the original time to the new time, calculated as $S=\frac{original}{new}$. A value above 1 means the new version is faster.
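
As a rough illustration of these metrics, the sketch below computes a frame budget and a speedup, and times a piece of work with `std::chrono`. The function names are assumptions made for this example. For instance, cutting a frame from 20ms to 16ms gives $S = 20 / 16 = 1.25$.

```cpp
#include <cassert>
#include <chrono>

// Frame-time budget in milliseconds for a target frame rate.
double frame_budget_ms(double target_fps)
{
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

// Speedup S = original / new; S > 1 means the new version is faster.
double speedup(double original_ms, double new_ms)
{
    assert(new_ms > 0.0);
    return original_ms / new_ms;
}

// Runs `work` once and returns the elapsed time in milliseconds.
template <typename Work>
double time_ms(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

`steady_clock` is used because it never jumps backwards, which makes it a reasonable default for measuring intervals.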

---

## Step 1 - Only process what you need

---

# Alive Flag

- The first tactic we can use to improve processing is to flag if processing something can be skipped.
- An alive flag is a typical technique to indicate whether an object still needs to be processed.

```cpp
if (alive) {
    DoSuperExpensiveOperation();
}
...
if (health == 0) {
    alive = false;
}
```

---

# Object Pool

- Object creation and destruction can be very expensive.
- It involves memory allocation, function calls, grabbing bits and pieces, maybe loading content.
- It can also lead to objects being scattered around memory -  expensive to jump around.
- An object pool fixes that (especially when combined with alive flags):
    - Allocate max number of objects required.
    - When a new object is needed grab from allocated pool and set necessary values.
    - When finished, flag as not-alive and give back to pool.
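
One possible shape for such a pool is sketched below. `Bullet` and `BulletPool` are hypothetical types invented for this example, and the linear search in `acquire` is the simplest option rather than the fastest (a free list would avoid the scan).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Bullet and BulletPool are hypothetical types used only for illustration.
struct Bullet {
    float x = 0.0f, y = 0.0f;
    bool alive = false; // doubles as the "in use" marker
};

template <std::size_t Capacity>
class BulletPool {
public:
    // Hands out a dead bullet, or nullptr if every bullet is in use.
    Bullet* acquire(float x, float y)
    {
        for (Bullet& b : bullets_) {
            if (!b.alive) {
                b.x = x;
                b.y = y;
                b.alive = true;
                return &b;
            }
        }
        return nullptr;
    }

    // Gives a bullet back to the pool; its memory is kept for reuse.
    void release(Bullet& b)
    {
        assert(b.alive && "bullet released twice");
        b.alive = false;
    }

    // Updates only the live bullets and returns how many were processed.
    std::size_t update(float dt)
    {
        std::size_t processed = 0;
        for (Bullet& b : bullets_) {
            if (!b.alive) continue; // alive flag: skip unused slots
            b.y += 100.0f * dt;
            ++processed;
        }
        return processed;
    }

private:
    std::array<Bullet, Capacity> bullets_{}; // allocated once, up front
};
```

All bullets live in one contiguous array, so iterating over them does not jump around memory, and no allocation happens after the pool is created.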

---

# Dirty Flag

- Some game data is processed each frame to allow our game to have a dynamic nature.
- However, a lot of data only changes in some circumstances.
    - For example, the player only moves when the user controls them.
- Rather than reprocess certain data every frame, we can use the dirty flag to say that data should be reprocessed that frame.

```cpp
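// Hypothetical names, for illustration only.
if (player_moved) {
    position += movement;     // change position in primary data
    dirty = true;             // set dirty flag on primary data
}
...
if (dirty) {
    RebuildSecondaryData();   // process secondary data (expensive)
    dirty = false;            // clear the dirty flag
}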
```
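
The same pattern as a small self-contained class, offered as a sketch only. `Player`, its members and the factor `kPixelsPerUnit` are assumptions for this example; the multiplication stands in for genuinely expensive work.

```cpp
#include <cassert>

// Player is a hypothetical type: x_ is the primary data, screen_x_ is
// secondary data derived from it.
class Player {
public:
    // Changes the primary data and marks the derived data as stale.
    void move(float dx)
    {
        if (dx == 0.0f) return; // nothing changed, so nothing to redo
        x_ += dx;
        dirty_ = true;
    }

    // Returns the derived value, rebuilding it only when the flag is set.
    float screen_x()
    {
        if (dirty_) {
            screen_x_ = x_ * kPixelsPerUnit; // stand-in for expensive work
            ++rebuild_count_;
            dirty_ = false;
        }
        assert(!dirty_);
        return screen_x_;
    }

    // How many times the derived value has been rebuilt so far.
    int rebuild_count() const { return rebuild_count_; }

private:
    static constexpr float kPixelsPerUnit = 32.0f;
    float x_ = 0.0f;
    float screen_x_ = 0.0f;
    bool dirty_ = true; // forces the first rebuild
    int rebuild_count_ = 0;
};
```

Calling `screen_x()` every frame is then cheap: the rebuild only runs in frames where `move` actually changed the position.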

---

## Step 2 - Only draw what is visible

---

# Visible Flag

- Rendering to the screen is one of the most expensive processes in games.
    - It's why we have dedicated graphics hardware.
- We can use our flag technique to determine if an object is visible and therefore should be rendered.
- This allows us to hide objects/turn off their rendering when we want.
- It also allows us to add objects that should not be rendered.
    - Remember - what you see when playing a game isn't all that is there.

```cpp
    if (visible)
    {
        RenderObject(); // expensive (hypothetical function)
    }
```

---

# Spatial Partitioning

- Another question is whether an object is even on screen.
- Spatial partitioning allows us to divide the world up so we only render the parts that are visible.
- Also used for collision detection optimisation.

![image](assets/images/spatial-partition.png)
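
One simple form of spatial partitioning is a uniform grid. The sketch below is an illustration under assumptions: `UniformGrid` is a hypothetical class, the world is a square starting at the origin, and objects are treated as points. A query returns every object in the cells the view rectangle touches, so the result is a conservative superset of what is actually on screen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// UniformGrid is a hypothetical, minimal 2D grid over a square world.
class UniformGrid {
public:
    UniformGrid(float world_size, int cells_per_side)
        : cell_size_(world_size / cells_per_side),
          side_(cells_per_side),
          cells_(static_cast<std::size_t>(cells_per_side) * cells_per_side)
    {
        assert(world_size > 0.0f && cells_per_side > 0);
    }

    // Stores an object id in the cell that contains (x, y).
    void insert(int id, float x, float y)
    {
        cells_[index(cell_of(x), cell_of(y))].push_back(id);
    }

    // Returns the ids in every cell overlapped by the view rectangle.
    std::vector<int> query(float min_x, float min_y, float max_x, float max_y) const
    {
        std::vector<int> visible;
        for (int cy = cell_of(min_y); cy <= cell_of(max_y); ++cy)
            for (int cx = cell_of(min_x); cx <= cell_of(max_x); ++cx)
                for (int id : cells_[index(cx, cy)])
                    visible.push_back(id);
        return visible;
    }

private:
    // Maps a coordinate to a cell number, clamped to the grid.
    int cell_of(float v) const
    {
        const int c = static_cast<int>(v / cell_size_);
        return c < 0 ? 0 : (c >= side_ ? side_ - 1 : c);
    }

    // Position of cell (cx, cy) in the flat cell array.
    std::size_t index(int cx, int cy) const
    {
        return static_cast<std::size_t>(cy) * side_ + cx;
    }

    float cell_size_;
    int side_;
    std::vector<std::vector<int>> cells_;
};
```

The same query can serve collision detection: only objects that share a cell (or neighbouring cells) need to be tested against each other.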

---

# Example - Horizon Zero Dawn

---

## Step 3 - Think about your memory

---

# Memory

Allocate Your Required Memory First
- We have mentioned this a few times now.
- Memory allocation (and subsequent deallocation) on the free store is expensive.
- Try to allocate everything you need at the start of a level or the game. Then it is there and you can access it uniformly.
- Data should also be near similar data -  this allows quick processing of blocks during similar operations.

---

# `constexpr` What You Can

- `const` is a qualifier used for readability, maintenance and performance
- `constexpr` takes this further: expression is calculated at compile time
    - So you can produce certain functions that are compile time processed.
- Compile time means the code is not processed during runtime.

```cpp
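constexpr int N = 10;

constexpr int factorial(int n)
{
    return n <= 1 ? 1 : (n * factorial(n - 1));
}

// Evaluated by the compiler: 10! = 3628800 fits in an int.
// (factorial(1000) would overflow, which is a compile error in a constant expression.)
constexpr int Nfav = factorial(N);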
```

---

# Memory Alignment and Cache Coherence
- We talked about this during our memory and resource management lectures.
- Memory alignment means that data is aligned in memory to minimise the number of reads needed to access it.
- For cache coherency (strictly speaking, cache locality) we discussed how the order in which a multi-dimensional array is traversed affects speed, due to memory layout. For example, the first `for` loop below is typically faster than the second, because it reads memory in the order it is laid out.

```cpp
for (int i=0; i < 32; i++)
    for (int j=0; j < 32; j++)
        total += myArray[i][j]; // GOOD! Fast!

for (int i=0; i < 32; i++)
    for (int j=0; j < 32; j++)
        total += myArray[j][i]; // BAD! Slow!
```
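
To see why the order matters, the sketch below spells out the row-major layout with an explicit index calculation. `flat_index` and the two sum functions are hypothetical names for this example. Both functions return the same total; they differ only in how far apart consecutive reads are. For an array this small (4KB of `int`s) everything is likely to fit in cache, so the gap should only become noticeable with much larger arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSize = 32; // matches the 32x32 array above

// Position of element [i][j] in memory for a row-major array.
constexpr std::size_t flat_index(std::size_t i, std::size_t j)
{
    return i * kSize + j;
}

// Inner loop moves along a row: consecutive reads are 1 element apart.
long sum_row_by_row(const std::vector<int>& data)
{
    assert(data.size() == kSize * kSize);
    long total = 0;
    for (std::size_t i = 0; i < kSize; ++i)
        for (std::size_t j = 0; j < kSize; ++j)
            total += data[flat_index(i, j)];
    return total;
}

// Inner loop moves down a column: consecutive reads are kSize elements apart.
long sum_column_by_column(const std::vector<int>& data)
{
    assert(data.size() == kSize * kSize);
    long total = 0;
    for (std::size_t i = 0; i < kSize; ++i)
        for (std::size_t j = 0; j < kSize; ++j)
            total += data[flat_index(j, i)];
    return total;
}
```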

---

## Step 4 - Use tools to find slow bits

---

# Finding Hot Paths -  Using Tools

Tools do a good job of finding code that is slowing things down.

![image](assets/images/hot-path.png)

---

# Bottlenecks

- The key aim with tools is bottleneck identification.
- Once you find a bit of your code that is impacting performance, you need to identify what, if anything, can be done about it.
- Often, these bottlenecks are loops that are processing lots of data.
- Even a small tweak here can make all the difference.

![image](assets/images/bottleneck.jpg)

---

# Algorithmic Analysis

- And this is where algorithmic analysis can come in.
- Abstractly measuring your algorithms, finding more efficient algorithms, and optimising the algorithms you have is important.
- See your Algorithms and Data Structures material for more insight.

![image](assets/images/alg-analysis.jpg)

---

## Step 5 - Optimise function calls

---

# Function Calls Cost

- Function calls have a cost associated with them.
- Two things have to happen.
    1.  Set up the parameters (in registers or on the stack) -  copy data.
    2.  Jump to the new code position.
- On return there is a jump back again.

![image](assets/images/function-call.png)

---

# `static` Local Functions

- A `static` function is one tied to a particular scope: a class (a `static` member function) or a single source file.

- If a free function is `static` in a C++ code file, it is only visible inside that file, so the compiler knows it can try to optimise it without affecting external code.

- Effectively, rearranging and possibly inlining can occur, speeding up the program.

```cpp
    static int add(int x, int y)
    {
        return x + y;
    }
```

---

# `virtual` Function Calls

- `virtual` functions have an additional cost.
- A `virtual` function call involves a lookup on the object to determine which function to call.
- Effectively we are double jumping in this instance.

![image](assets/images/virtual-function.png)
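
A small sketch of what that lookup means in code. `Entity`, `Coin`, `Gem` and `total_score` are hypothetical names for this example. The loop only knows it has an `Entity`, so each `score()` call has to find the right override at run time.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Entity, Coin and Gem are hypothetical types used only for illustration.
struct Entity {
    virtual ~Entity() = default;
    virtual int score() const { return 0; }
};

struct Coin final : Entity {
    int score() const override { return 10; }
};

struct Gem final : Entity {
    int score() const override { return 50; }
};

// Sums the scores through base-class pointers.
int total_score(const std::vector<std::unique_ptr<Entity>>& entities)
{
    int total = 0;
    for (const auto& e : entities) {
        assert(e != nullptr);
        total += e->score(); // virtual call: look up the function, then jump
    }
    return total;
}
```

Marking the derived classes `final` does not change the behaviour here, but where the compiler knows the exact type it may be able to skip the lookup and call the function directly.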

---

#  `const` What You Can

- Basically set everything you can to `const`.
- A `const` method is one that will not change the object.
- Therefore the compiler knows how the object is accessed and may be able to optimise the code; it will also catch accidental modification.

```cpp
    class my_class
    {
    public:
        void do_work() const
        {
            // Do something
        }
    };
```

---

## Step 6 - Branching and Loops

---

#  Branching

- A branch (an `if` statement or loop) has a cost to check and a cost to jump.
- If possible, use a `switch` statement instead of if/else if/else if/... The compiler can often turn a `switch` into a single jump rather than a chain of checks.

```cpp
    if (value == sth) { /* Do work */ }
    else if (value == sth_else) { /* Do other work */ }
    ...
    else { /* fallback */ }
    // OR
    switch (value)
    {
        case sth: /* do work */
            break;
        case sth_else: /* do other work */
            break;
        default:
            break;
    }
```

---

# `for` Loops

- For loops are one of the most expensive parts of your application due to the number of iterations.
- They are also one of the best places to optimise -  we will look at parallelisation here also.
- One particular point is to avoid repeating work in the loop body that the loop statement itself can do -  for example, by stepping the index directly.

```cpp
    // Multiply every iteration
    for (int i = 0; i < 10; ++i)
        cout << i * 10 << '\n';

    // Add every iteration: the loop counter already holds the value
    for (int i = 0; i < 100; i += 10)
        cout << i << '\n';
```

---

## Step 7 - Use more cores!!!

---

# Just Throw Some Threads at the Problem!?

- A simple solution may be to use more of your hardware resources.
- Multi-core means you can execute code in parallel on different cores at the same time.
- There are different techniques: OpenMP, parallel STL algorithms (C++17), async, threads, etc.
    - More in SET10108: Concurrent and Parallel Systems
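
As a taste of the `async` option, the sketch below splits a sum across two threads. `parallel_sum` is a hypothetical helper written for this example, and it assumes the platform can start a new thread (some toolchains need a threading library linked in). For small inputs the cost of starting the task can easily outweigh the gain.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// parallel_sum is a hypothetical helper: it sums the two halves of `values`
// concurrently and combines the results.
long long parallel_sum(const std::vector<int>& values)
{
    const auto middle =
        values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);

    // The first half is summed on another thread...
    std::future<long long> first_half =
        std::async(std::launch::async, [&values, middle] {
            return std::accumulate(values.begin(), middle, 0LL);
        });
    assert(first_half.valid());

    // ...while this thread sums the second half.
    const long long second_half = std::accumulate(middle, values.end(), 0LL);

    // get() waits for the other thread to finish before combining.
    return first_half.get() + second_half;
}
```

The two halves never overlap and the vector is only read, so no locking is needed; once threads write to shared data, that stops being true.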

---

# Cost of Threads

- Threads do have costs: performance, cognitive and maintenance
- They require memory, and switching between threads costs time
- They can easily introduce bugs into your application
- Keeping track of application workflow with threads is harder

---

## Summary

---

# Summary

- Performance optimisation is important, but you need to be careful. 
- Premature optimisation is the root of all evil, but think of your algorithm choices.
- Use tools to identify bottlenecks. Fix if needed.
- The most impactful optimisation is not running code at all (dirty/alive flags, etc).
- Low-level optimisations are typically an illusion that makes your code less readable.
- High-level optimisations can have the greatest effect, and they happen "on paper".
- Parallelisation is great, and is also a can of worms. Tread carefully.