I was playing around with a simple "game" to test different aspects of Data Oriented Design when I stumble across this strange performance drop.
I have this struct to store data of the game ships:
constexpr int MAX_ENEMY_SHIPS = 4000000;
struct Ships
{
int32_t count;
v2 pos[MAX_ENEMY_SHIPS];
ShipMovement movements[MAX_ENEMY_SHIPS];
ShipDrawing drawings[MAX_ENEMY_SHIPS];
//ShipOtherData other[MAX_ENEMY_SHIPS];
void Add(Ship ship)
{
pos[count] = ship.pos;
movements[count] = { ship.dir, ship.speed };
drawings[count] = { ship.size, ship.color };
//other[count] = { ship.a, ship.b, ship.c, ship.d };
count++;
}
};
Then I have a function to update the movement data:
void MoveShips(v2* positions, ShipMovement* movements, int count)
{
ScopeBenchmark bench("Move Ships");
for(int i = 0; i < count; ++i)
{
positions[i] = positions[i] + (movements[i].dir * movements[i].speed);
}
}
My understanding is that, as the MoveShips function only uses the position and movement arrays, extra memory in the Ships struct will not affect its performance. However, when I uncomment the commented lines on the Ships struct, the performance drops a lot. With the current value of MAX_ENEMY_SHIPS, the duration of the MoveShips function in my computer goes from 10-11 ms to 200-210 ms.
Here I let a minimum, reproducible example:
#include <stdlib.h>
#include <stdio.h>
#include <chrono>
#include <string>
class ScopeBenchmark
{
public:
ScopeBenchmark(std::string text)
: text(text)
{
begin = std::chrono::steady_clock::now();
}
~ScopeBenchmark()
{
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
printf("%s: %lli\n", text.data(), std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count());
}
private:
std::string text;
std::chrono::steady_clock::time_point begin;
};
constexpr int32_t Color(uint8_t r, uint8_t g, uint8_t b)
{
return (r << 16) | (g << 8) | b;
}
struct v2
{
float x;
float y;
};
inline v2 operator+(v2 a, v2 b)
{
v2 result;
result.x = a.x + b.x;
result.y = a.y + b.y;
return result;
}
inline v2 operator*(v2 a, float b)
{
v2 result;
result.x = a.x * b;
result.y = a.y * b;
return result;
}
//----------------------------------------------------------------------
struct Ship
{
v2 pos;
v2 size;
v2 dir;
float speed;
int32_t color;
v2 a;
v2 b;
v2 c;
v2 d;
};
struct ShipMovement
{
v2 dir;
float speed;
};
struct ShipDrawing
{
v2 size;
int32_t color;
};
struct ShipOtherData
{
v2 a;
v2 b;
v2 c;
v2 d;
};
constexpr int MAX_ENEMY_SHIPS = 4000000;
struct Ships
{
int32_t count;
v2 pos[MAX_ENEMY_SHIPS];
ShipMovement movements[MAX_ENEMY_SHIPS];
ShipDrawing drawings[MAX_ENEMY_SHIPS];
//ShipOtherData other[MAX_ENEMY_SHIPS];
void Add(Ship ship)
{
pos[count] = ship.pos;
movements[count] = { ship.dir, ship.speed };
drawings[count] = { ship.size, ship.color };
//other[count] = { ship.a, ship.b, ship.c, ship.d };
count++;
}
};
void MoveShips(v2* positions, ShipMovement* movements, int count)
{
ScopeBenchmark bench("Move Ships");
for(int i = 0; i < count; ++i)
{
positions[i] = positions[i] + (movements[i].dir * movements[i].speed);
}
}
struct Game
{
int32_t playerShipIndex;
Ships ships;
};
void InitGame(void* gameMemory)
{
Game* game = (Game*)gameMemory;
Ship ship;
ship.pos = { 0.0f, 0.0f };
ship.size = { 100.0f, 100.0f };
ship.speed = 1.0f;
ship.color = Color(64, 192, 32);
game->ships.Add(ship);
game->playerShipIndex = 0;
ship.speed *= 0.5f;
ship.dir.x = -1.0f;
ship.size = { 50.0f, 50.0f };
ship.color = Color(192, 64, 32);
for(int i = 0; i < MAX_ENEMY_SHIPS; i++)
{
ship.pos = { 500.0f, 350.0f };
game->ships.Add(ship);
}
}
int main()
{
Game* game = (Game*)malloc(sizeof(Game));
memset(game, 0, sizeof(Game));
InitGame(game);
while (true)
{
MoveShips(game->ships.pos, game->ships.movements, game->ships.count);
}
}
I use the Visual Studio compiler, and compile the file with the following command:
cl.exe /O2 /GL src/Game.cpp
So, my question is: why the performance of the MoveShips function drops so hard when adding memory that is not used?
The problem is that you are passing uninitialized data in the function calls game->ships.Add(ship)
. This causes undefined behavior.
In the first function call, both ship.dir.x
and ship.dir.y
are uninitialized. In all further function calls, ship.dir.y
is uninitialized.
This can have an especially negative performance impact if ship.dir.y
happens to contain garbage data which represents a denormalized floating point value. See this question for further information.
I was able to reproduce your problem and my tests have shown that this is the reason for your performance degradation. By initializing the variable ship.dir.y
to a normalized floating point value, I was able to reliably get a performance increase by a factor of 45(!).
I don't think that your problem has anything to do with you increasing the size of your struct
by un-commenting two lines code. Although it has been suggested in the comments section that this could cause your program to slow down due to swap space usage, my tests show that this has no significant performance impact in your case. Increasing the total size of the memory allocation to 256 MB should generally not be a problem, unless you are on a computer with a very small amount of memory. Therefore, I believe your observation that the performance decreases significantly when you un-comment those two lines of code is just a coincidence.
My guess is that address space layout randomization (ASLR) is causing you to get different garbage values every time you run your program, so that they sometimes represent a denormalized floating point value and sometimes not. At least that is what I have experienced during my tests: With ASLR active, I sometimes get a denormalized value and sometimes a normalized one. However, with ALSR disabled (using the /DYNAMICBASE:NO
linker option in MS Visual Studio), I always get a denormalized value and never a normalized one.
If you are sure that your observations from un-commenting your code are no coincidence, but consistent, then the most likely explanation is that un-commenting the code is causing you to receive different garbage values, which happen to always represent a denormalized floating point value.
Therefore, in order to fix your problem, all you have to do is to make sure that ship.dir.x
and ship.dir.y
are properly initialized before you pass them to the function.
Also, although this is probably not the reason for your problem, it is important to point out that you are writing to all 4 arrays in struct Ships
out of bounds. You are calling the function game->ships.Add(ship)
exactly MAX_ENEMY_SHIPS + 1
times, once outside the loop and MAX_ENEMY_SHIPS
times inside the loop. Therefore, you are passing the boundary of each array by exactly one element. This also causes undefined behavior.