I'm developing a turn-based casual MMORPG game server.
The low-level engine (NOT written by us), which handles networking, multi-threading, timers, inter-server communication, the main game loop, etc., is written in C++. The high-level game logic is written in Python.
My question is about the data model design in our game.
At first we simply tried to load all of a player's data into RAM and a shared data cache server when the client logs in, with a timer scheduled to periodically flush data to the data cache server; the data cache server in turn persists the data to the database.
But we found this approach has some problems:
1) Some data needs to be saved or checked instantly, such as quest progress, level-ups, item & money gains, etc.
2) The game logic sometimes needs to query an offline player's data.
3) Some global game world data needs to be shared between different game instances, which may be running on different hosts or in different processes on the same host. This is the main reason we need a data cache server sitting between the game logic servers and the database.
4) Players need to switch freely between game instances.
Below are the difficulties we encountered in the past:
1) All data access operations should be asynchronous, to avoid network I/O blocking the main game logic thread. We have to send a message to the database or cache server, handle the data reply message in a callback function, and then continue the game logic. It quickly became painful to write even moderately complex game logic that needs to talk to the DB several times: the logic ends up scattered across many callback functions, which makes it hard to understand and maintain (see the sketch after this list).
2) The ad-hoc data cache server makes things more complex: we find it hard to maintain data consistency and to update/load/refresh data efficiently.
3) In-game data queries are inefficient and cumbersome. The game logic needs to query a lot of information, such as inventory contents, item info, avatar state, etc. Some transaction mechanism is also needed; for example, if one step fails the entire operation should be rolled back. We tried to design a good in-RAM data model, building a lot of complex indexes to ease the numerous queries and adding transaction support. I quickly realized that what we were building was an in-memory database system; we were reinventing the wheel...
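To make the callback pain concrete, here is a minimal sketch in the style of our old code. The db object, its async_query/async_execute methods, and the reply helpers are made up for illustration; they are not our real engine's API.

```python
# Sketch of the callback style (all names are stand-ins, not our engine API).
class FakeAsyncDB:
    """Stub that 'replies' immediately so the sketch runs; the real engine
    delivered the callback later, from the network reply handler."""
    def async_query(self, sql, args, callback):
        callback([{"item_id": 7, "heal": 25}])
    def async_execute(self, sql, args, callback):
        callback(None)

db = FakeAsyncDB()
send_error = lambda pid, msg: print("error", pid, msg)
send_ok = lambda pid: print("ok", pid)

def on_use_item(player_id, item_id):
    # Step 1: fetch the inventory, then continue in check_item...
    db.async_query("SELECT * FROM inventory WHERE owner=?", (player_id,),
                   lambda rows: check_item(player_id, item_id, rows))

def check_item(player_id, item_id, inventory_rows):
    # Step 2: validate, fetch the item row, then continue in apply_item...
    if item_id not in [r["item_id"] for r in inventory_rows]:
        return send_error(player_id, "no such item")
    db.async_query("SELECT * FROM item WHERE item_id=?", (item_id,),
                   lambda rows: apply_item(player_id, rows[0]))

def apply_item(player_id, item_row):
    # Step 3: one logical operation, three functions; each extra DB
    # round-trip would add yet another callback.
    db.async_execute("UPDATE player SET hp = hp + ? WHERE id = ?",
                     (item_row["heal"], player_id),
                     lambda _: send_ok(player_id))

on_use_item(player_id=1, item_id=7)
```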
Finally I turned to Stackless Python, and we removed the cache server. All data is saved in the database, and the game logic server queries the database directly. With Stackless Python's micro-tasklets and channels, we can write game logic in a synchronous style. It is far easier to write and understand, and productivity has greatly improved.
In fact, the underlying DB access is still asynchronous: a client tasklet issues a request to a dedicated DB I/O worker thread and blocks on a channel, but the main game logic is not blocked; other clients' tasklets are scheduled and run freely. When the DB data reply arrives, the blocked tasklet is woken up and continues running from its 'break point' (a continuation?).
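For comparison with the callback sketch above, here is a minimal sketch of the tasklet/channel pattern, assuming Stackless Python. To keep it self-contained, the worker is shown as another tasklet with a fake query; in our real system it is a dedicated OS thread doing the blocking DB I/O.

```python
import stackless

requests = stackless.channel()   # client tasklets push (sql, reply_channel) here

def db_worker():
    # Stand-in for the dedicated DB I/O worker: receives a request, "queries",
    # and wakes the blocked client tasklet by sending on its reply channel.
    while True:
        sql, reply = requests.receive()
        reply.send({"sql": sql, "rows": []})   # fake result set

def db_query(sql):
    """Looks synchronous: blocks only this tasklet, never the whole scheduler."""
    reply = stackless.channel()
    requests.send((sql, reply))
    return reply.receive()        # tasklet sleeps here until the reply arrives

def handle_client(player_id):
    # Game logic reads top to bottom; several DB round-trips, no callbacks.
    player = db_query("SELECT * FROM player WHERE id=%d" % player_id)
    items = db_query("SELECT * FROM inventory WHERE owner=%d" % player_id)
    print(player_id, player["rows"], items["rows"])

stackless.tasklet(db_worker)()
for pid in (1, 2, 3):
    stackless.tasklet(handle_client)(pid)
stackless.run()                   # returns once all client tasklets finish
```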
With the above design, I have some questions:
1) DB access will be more frequent than in the previous cached solution. Can the DB support such frequent query/update operations? Will some mature cache solution such as Redis or memcached be needed in the near future?
2) Are there any serious pitfalls in my design? Can you give me some better suggestions, especially on in-game data management patterns?
Any suggestion would be appreciated, thanks.
I've worked with one MMO engine that operated in a somewhat similar fashion. It was written in Java, however, not Python.
With regards to your first set of points:
1) Async DB access: We actually went the other route, and avoided having a “main game logic thread.” All game logic tasks were spawned as new threads. The overhead of thread creation and destruction was completely lost in the noise floor compared to I/O. This also preserved the semantics of having each “task” as a reasonably straightforward method, instead of the maddening chain of callbacks that one otherwise ends up with (although there were still cases of this). It also meant that all game code had to be concurrent, and we grew increasingly reliant upon immutable data objects with timestamps.
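Roughly, in Python terms (our engine was Java, and every name in this sketch is illustrative), the idea looks like this:

```python
import threading, time
from dataclasses import dataclass, replace

@dataclass(frozen=True)     # immutable: concurrent readers never see a half-update
class PlayerState:
    player_id: int
    hp: int
    stamp: float            # timestamp: when two copies collide, newer wins

store = {1: PlayerState(1, 100, 0.0)}   # stand-in for the real DB access

def heal_task(player_id, amount):
    # A whole game task as one straightforward method on its own thread;
    # a real blocking DB read/write here would stall only this thread.
    state = store[player_id]
    store[player_id] = replace(state, hp=state.hp + amount, stamp=time.time())

t = threading.Thread(target=heal_task, args=(1, 25))   # one thread per task
t.start(); t.join()
print(store[1])
```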
2) Ad-hoc cache: We employed a lot of WeakReference objects (I believe Python has a similar concept?), and also made use of a split between the data objects, e.g. “Player”, and the “loader” (actually database access methods), e.g. “PlayerSQLLoader”; the instances kept a pointer to their Loader, and the Loaders were called by a global “factory” class that would handle cache lookups versus network or SQL loads. Every “setter” method in a data class would call the method changed(), which was inherited boilerplate for myLoader.changed(this).
In order to handle loading objects from other active servers, we employed “proxy” objects that used the same data class (again, say, “Player”), but the Loader class we associated was a network proxy that would (synchronously, but over a gigabit local network) update the “master” copy of that object on another server; in turn, the “master” copy would call changed() itself.
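In Python, the same split might look something like the following sketch. weakref.WeakValueDictionary plays the role we used WeakReference for; the class and method names simply mirror the description above, and the SQL itself is faked.

```python
import weakref

class PlayerSQLLoader:
    """Stands in for the real SQL access; load() fakes a row fetch."""
    def load(self, player_id):
        row = {"hp": 100}                      # pretend SELECT result
        return Player(player_id, row, loader=self)
    def changed(self, obj):
        # In the real system: mark dirty / schedule an UPDATE (see below).
        print("changed:", obj.player_id)

class Player:
    def __init__(self, player_id, row, loader):
        self.player_id = player_id
        self._hp = row["hp"]
        self.my_loader = loader                # instance keeps a Loader pointer
    def set_hp(self, value):                   # every setter reports the change
        self._hp = value
        self.my_loader.changed(self)

class Factory:
    """One live copy per DB record: check the cache before loading."""
    def __init__(self, loader):
        self._loader = loader
        self._cache = weakref.WeakValueDictionary()  # GC may evict idle objects
    def get_player(self, player_id):
        obj = self._cache.get(player_id)
        if obj is None:
            obj = self._loader.load(player_id)
            self._cache[player_id] = obj
        return obj

factory = Factory(PlayerSQLLoader())
p = factory.get_player(42)
assert factory.get_player(42) is p             # same in-RAM instance, never two
p.set_hp(90)                                   # triggers loader.changed(...)
```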
Our SQL UPDATE logic had a timer. If the backend database had received an UPDATE of the object within the last ($n) seconds (we typically kept this around 5), it would instead add the object to a “dirty list.” A background timer task would periodically wake and attempt to flush any objects still on the “dirty list” to the database backend asynchronously.
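A minimal sketch of that flow, with Python standing in for our Java; the threshold constant and the do_update callback are placeholders for the real SQL UPDATE path:

```python
import threading, time

FLUSH_AFTER = 5.0               # seconds; "we typically kept this around 5"
last_update = {}                # object key -> time of its last UPDATE
dirty = set()                   # objects waiting for the background flush
lock = threading.Lock()

def changed(obj_key, do_update):
    """Called by every setter: UPDATE immediately, unless one went out recently."""
    now = time.time()
    with lock:
        if now - last_update.get(obj_key, 0.0) < FLUSH_AFTER:
            dirty.add(obj_key)  # defer; the timer task will flush it
            return
        last_update[obj_key] = now
    do_update(obj_key)          # write-through, outside the lock

def flush_loop(do_update):
    """Background timer task: periodically flush whatever is still dirty."""
    while True:
        time.sleep(FLUSH_AFTER)
        with lock:
            pending = list(dirty)
            dirty.clear()
            for key in pending:
                last_update[key] = time.time()
        for key in pending:
            do_update(key)      # asynchronous w.r.t. the game logic threads

# Usage: do_update stands in for the real SQL UPDATE.
sql_update = lambda k: print("UPDATE", k)
threading.Thread(target=flush_loop, args=(sql_update,), daemon=True).start()
changed("player:42", sql_update)   # immediate write-through
changed("player:42", sql_update)   # within 5s of the last one -> dirty list
```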
Since the global factory maintained WeakReferences to all in-core objects, and would look for a single instantiated copy of a given game object on any live server, we would never attempt to instantiate a second copy of one game object backed by a single DB record, so the fact that the in-RAM state of the game might differ from the SQL image of it for up to 5 or 10 seconds at a time was inconsequential.
Our entire SQL system ran in RAM (yes, a lot of RAM) as a mirror of another server that tried valiantly to write to disk. (That poor machine burned out RAID drives on average once every 3-4 months due to “old age.” RAID is good.)
Notably, objects had to be flushed to the database when being removed from the cache, e.g. due to exceeding the cache RAM allowance.
3) In-memory database: I hadn't run across this precise situation. We did have “transaction-like” logic, but it all occurred at the level of Java getters/setters.
And, in regards to your latter points:
1) Yes; PostgreSQL and MySQL in particular deal well with this, particularly when you use a RAMdisk mirror of the database to minimize actual HDD wear and tear. In my experience, MMOs do tend to hammer the database more than is strictly necessary, however. Our “5 second rule”* was built specifically to avoid having to solve the problem “correctly.” Each of our setters would call changed(). In our usage pattern, we found that an object typically had either one field changed and then no activity for some time, or else a “storm” of updates, where many fields changed in a row. Building proper transaction support (e.g. informing the object that it was about to accept many writes, and should wait for a moment before saving itself to the DB) would have involved more planning, logic, and major rewrites of the system; so, instead, we bypassed the situation.
2) Well, there's my design above :-)
In point of fact, the MMO engine I'm presently working on relies even more upon in-RAM SQL databases, and (I hope) will be doing so a bit better. However, that system is being built using an Entity-Component-System model, rather than the OOP model that I described above.
If you are already based on an OOP model, shifting to ECS is a pretty significant paradigm shift, and if you can make OOP work for your purposes, it's probably better to stick with what your team already knows.
*- “the 5 second rule” is a colloquial US “folk belief” that after dropping food on the floor, it's still OK to eat it if you pick it up within 5 seconds.