mysqlpostsforum

Mysql table for forum posts: can size be a problem?


Seeking general thoughts/advise, and links to books/resources are greatly appreciated!

I am building a forum where for each topic there are posts, replies to posts, replies to the replies and so on for many levels... much like a commenting section. At the moment, all posts are in the same table, but I wonder if, as the number of topics increases, I should be concerned by issues stemming from the size of the posts table. The main concern would be a decrease in speed with which posts on a topic are retrieved. The other concern of mine is that errors or malicious hacking could lead to, for example, the wrong posts being displayed on a topic (not all users are supposed to have access to all topics).

The alternative solution would be to create a new table of posts whenever a new topic is created. Perfectly doable, but it would have the disadvantage of creating an ever expanding database, which I presume would be harder to maintain and handle. Am I right about the latter?

In short: is there any point (e.g. expected number of topics/posts) at which size of table becomes an issue? Mainly out of curiosity, but how do giant commenting/social media platforms handle posts?

Posts like What's the ideal database table structure for forum posts and replies? of Table design for a forum don't answer my question. Thanks in advance for any insights!


Solution

  • Indexing is the most effective way you can improve performance of queries. The trick is to avoid a query finding the rows it needs without examining many rows that are not what it needs. You might like my presentation How to Design Indexes, Really or the video.

    Data types can affect optimization too. Use the smallest data type that supports the values you need to store in a column. Use numerics and dates instead of making everything a string. Use a consistent character set and collation for all string columns.

    Archive obsolete data. I can't tell you how many times I've seen a very huge table and if pressed the developers admit that 99% of the data is garbage, but they can't bring themselves to remove it from the table. If you have a table that grows continually, you must implement a strategy to "prune" it periodically. If the data is important for historical purposes, that's fine, you don't have to destroy the data. You could preserve it in some larger, cheaper storage.

    MySQL Server tuning options. This is a large topic and it can take years to learn how to use MySQL options to optimize your workload. There are good books like High Performance MySQL. It'd be easier to just hire a good MySQL consultant to review your server tuning. Be prepared to do this about once a year as you grow.

    Caching is saving frequently-requested data in RAM for quicker access. The most performance SQL query is the one you don't have to run at all, because the data you need is already in memory.

    How large is the maximum table size? See my answer to https://stackoverflow.com/a/2716470/20860

    For what it's worth, at my last job we configured an alert if any single table was over 512GB, or if the total schema was over 1024GB. These were more or less chosen as arbitrary round numbers (at least round numbers to a computer architect). There's no magic failure that occurs at that size, it's just an indicator because we figured if a table grows to half a terabyte, then it will continue to grow in an unbounded way, and it needs to be managed.

    I would not recommend your idea of creating a new table per topic. This leads to runaway growth of the number of tables, and the tables share resources like memory and storage anyway. If a single table is so large that you have performance problems even after using indexes, then it probably won't help to split it into multiple tables.

    Another problem with a table per topic is that they won't be uniform in size. You could still have a single topic-specific table that is larger than you want.

    How do giant social network companies deal with this? They scale out to multiple database servers. This introduces a lot of complexity to your application, but it does allow virtually unlimited growth, because you add more memory, storage, and CPU resources as you add servers. If you're just starting out, you're probably not going to have a need to do this anytime soon.

    If you do plan to grow this site indefinitely, you should be prepared to change optimization strategies from time to time. When I was a MySQL performance consultant, we had a saying that for every 10x growth in scale, you should reevaluate your current architecture and consider changing it. The solutions at different scales may be different.

    You also said you were concerned about errors or hacking. This is another topic, unrelated to performance. I won't go into this in detail, except to give the general advice to use parameterized queries and avoid formatting dynamic queries with string concatenation or variable expansion. Study the guides at https://owasp.org/ for more information.