c++optimizationdesign-patternsdata-oriented-design

Big projects that require very good performance really don't use polymorphism?


For quite some time I have been interested in perfomance in C ++.

A lot of things keep coming up, whether in conferences or in books:

Do not use a virtual function, have the data in the cache, the branches etc.

There are many benchmarks with examples of video games to show the differences in performance. The thing is, the examples are always very simple.

How does that really work in code that is more than 20 lines? in AAA video games, finance etc.

If I have 100 types of objects that have different update, different behavior and other joys, it's easy to set up via polymorphism or function pointers.

Now, by following the advice given to make a powerful code, the above options are not possible. We will therefore prefer to have 100 arrays that we will update separately. We will have good access to the cache, the functions will probably be inline etc. in short, the performance will in principle be better.

So, I have 100 arrays and 100 differents functions that i will have to call at each frame. The tables dynamically change depending on what happens, new players appear, a monster dies etc. Some array will have 0 elements active, others 10... I would call functions that will have no work (array without active element) but I have no choice, I have to have a flag or look if elements are active or not in my array.

I end up with something like this:

obj1update ();
obje2update ();
....

obj1behavior ();
obj2behavior ();
....

obj1render ();
obj2render ();
.....

objectxy ();
....

Of course, there will undoubtedly be a function which manages all the update calls on the objects, one for the behavior etc but to simplify it gives as above.

To stay in the video game, if the x action of a player causes the arrival of y monsters, there are several types of monsters which have different functions. We will therefore put the monsters in our tables and this time the functions dealing with these monsters will have work.

We can use the ECS pattern or a derivative but they will potentially have very different behaviors (an ia that directs them or other), different components and therefore different functions will be needed to process them. They will be called hard in the code since we don't have polymorphism or function pointers and we will have to check at each frame if they have something to process or not.

Is it really done that way? suppose i have 500 types? 1000 ?

Edit:

Lots of comments so I'll get back to you here.

As Federico said, I want to know if these recommendations are good for books but less so in practice.

Several resources that I have looked at:

https://www.agner.org/optimize/#testp Great suite of several books

www.youtube.com/watch?v=WDIkqP4JbkE&t Scott Meyers talk on memory

https://people.freebsd.org/~lstewart/articles/cpumemory.pdf On memory

www.youtube.com/watch?v=rX0ItVEVjHc&t Data-oriented programming

https://www.dataorienteddesign.com/dodbook/ data oriented design book

There are also other resources but it already gives you an idea on what I'm basing


Solution

  • No, real programs are not written like that. Real programs are written by noticing that all the monsters have a bunch of things in common, and using the same code to do those things. They all pathfind, but they have different distances they can walk? Great. Add a max_walking_distance variable and call the same function each time.

    All your monsters have a 3D model? Then you don't need a virtual render method. You can just render the model.

    You don't have to divide up your data according to "sensible" boundaries. You don't have to have a struct monster. You can have a struct monster_pathfinding and a struct monster_position and a struct monster_3d_model. Even if you just put these in parallel arrays (i.e. monster 123 has its pathfinding info in monsters_pathfinding[123] and its position in monster_positions[123]) this can make more efficient use of the data cache, because the pathfinding code doesn't load the 3D model pointers into the cache. You can get cleverer by skipping entries if some monsters don't pathfind or don't render. Essentially it is recommended for performance that you group data together according to how it's used, not according to your mental model of the things in the game. Yes, skipping entries makes it way more difficult to delete monsters. But you tick monsters a lot, and you don't delete monsters very often, right?

    Maybe only a few monsters shoot guns at the player (the rest try to eat the player). You can have a struct monster_gun_data {int ammunition; int max_ammunition; int reload_time; monster_position *position;}; and then if you have 200 monsters, but only 10 of them have guns, your monstersShootGunsAtPlayers function only has to iterate over the 10 entries in the monster_gun_data array (and load their positions via pointers). Or, you might profile that and find out that because most monsters in your game have guns, it's slightly faster to iterate over all the monsters and check their MONSTER_HAS_GUN flag instead, than to access the position through a pointer which can't be prefetched as easily.

    How do you do different kinds of monster attacks? Well, if they're completely different (melee vs ranged), you probably do them with different functions as you have described. Or you might only check the attack type after you decide the monster wants to attack the player. You seem to suggest monsters use different attack code, but I bet this works for almost all of them:

    if(wantsToAttack(monster, player)) {
        if((monster->flags & HAS_RANGED_ATTACK) && distance(monster, player) > monster->melee_attack_distance)
            startRangedAttack(monster, player);
        else
            startMeleeAttack(monster, player);
    }
    

    And what's really different between a monster with a gun, and a monster with a bow and arrow? The attack speed, the animation, the speed the projectile moves at, the projectile's 3D model, and the amount of damage it does. That's all data. That isn't different code.

    Finally, if you have something completely different, you might consider making it a "strategy object" with a virtual function. Or just a plain function pointer, if you can. Note that the Monster object is still not polymorphic, because if it was, we couldn't have an array of them and that would slow down all the common code. Only the specific parts of the monster that we're saying are polymorphic are actually polymorphic.

    void SpecialBossTickFunction(Monster *monster) {
        // special movement, etc
    }
    // ...
    monster->onTick = &SpecialBossTickFunction;
    // monster is still not polymorphic except for this one field
    

    You could also do:

    struct SpecialBossTickStrategy : TickStrategy {
        void onTick(Monster *monster) override {...}
        // then you can also have extra fields if needed
        // but you also have more indirection
    };
    monster->onTick = new SpecialBossTickStrategy;
    

    And don't do stuff unnecessarily. Try to be event-driven instead of doing stuff every tick:

    // bad because we're calling this function unnecessarily every tick
    void SpecialUndeadMonsterTickFunction(Monster *monster) {
        if(monster->isDead) {
            // do some special reanimation sequence
        }
    }
    monster->onTick = &SpecialUndeadMonsterTickFunction;
    
    // better (for performance)
    void SpecialUndeadMonsterTickWhileDeadFunction(Monster *monster) {
        // do some special reanimation sequence
        if (finished doing whatever) {
            monster->onTick = NULL;
        }
    }
    void SpecialUndeadMonsterDeathFunction(Monster *monster) {
        monster->onTick = &SpecialUndeadMonsterTickWhileDeadFunction;
    }
    // ...
    monster->onDead = &SpecialUndeadMonsterDeathFunction;
    
    // Also better (for performance)
    void DoUndeadMonsterReanimationSequences() { // not virtual at all, called from the main loop
        for(Monster *monster : special_undead_monsters_which_are_currently_dead) {
            // do some special reanimation sequence
        }
    }
    
    // Not great, but perhaps still less bad than the first one!
    void DoUndeadMonsterReanimationSequences() { // not virtual at all, called from the main loop
        for(Monster &monster : all_monsters) {
            if(monster.type == TYPE_SPECIAL_UNDEAD_MONSTER && monster.isDead) {
                // do some special reanimation sequence
            }
        }
    }
    

    Note that in the third example you have to keep this array special_undead_monsters_which_are_currently_dead up to date. That's okay, because you only have to change it when a monster spawns, disappears, dies, or un-dies. And those are relatively rare events. You are doing a bit more work in these events, in order to save work every tick.


    Finally, keep in mind these are techniques that may or may not improve performance in your actual program. I see DOD as a grab-bag of ideas. It doesn't say you must write your program in exactly a certain way, but it is offering a bunch of unconventional suggestions, the theory to explain why they work, and examples of how other people have managed to use them in their programs. Since DOD usually suggests that you complete reorganize your data structures, you may only want to implement it in the performance-critical areas of your program.