database-design relational-database database-performance sharding large-data-volumes

Asking for suggestions on database table design based on the described scenario


It might be a weird situation, but it just came to my mind...
Imagine I have a database table that receives 1 million new rows every day. The table has 3 columns: id, value, date.

What I would like to do is load all the rows for a given date.

Here is the question:

Given the nature of this table and the way I use it (I only need to fetch the rows for a specific day), does creating a new table with the same structure every day, named after the date (i.e., tables named 01Jan2014, 02Jan2014, ..., each holding 1 million records), perform better than keeping all rows in one table with an index on the date column?


Solution

  • There's no need to create multiple tables. You can define a single table with partitioning, so that it appears to be one logical table but is stored internally as multiple physical tables with identical structure.

    CREATE TABLE a_database_table (
     id INT AUTO_INCREMENT,
     date DATE NOT NULL,
     value TEXT,
     PRIMARY KEY (id, date)
    ) PARTITION BY RANGE COLUMNS (date) (
      PARTITION p1 VALUES LESS THAN ('2014-01-01'),
      PARTITION p2 VALUES LESS THAN ('2014-01-10'),
      PARTITION p3 VALUES LESS THAN ('2014-01-20'),
      PARTITION p4 VALUES LESS THAN ('2014-02-01'),
      PARTITION pN VALUES LESS THAN (MAXVALUE)
    );
    

    As the data gets close to the last partition (or even after it starts filling the last partition), you can split it:

    ALTER TABLE a_database_table REORGANIZE PARTITION pN INTO (
      PARTITION p5 VALUES LESS THAN ('2014-02-10'), 
      PARTITION pN VALUES LESS THAN (MAXVALUE)
    );
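
    To judge when a split is due, it can help to see how many rows each partition currently holds. A minimal sketch, assuming MySQL and the table defined above; `information_schema.PARTITIONS` is a standard MySQL view, and for InnoDB its `TABLE_ROWS` figure is an estimate rather than an exact count:

    ```sql
    -- List the partitions of a_database_table in definition order,
    -- with their upper bounds and approximate row counts.
    SELECT PARTITION_NAME,
           PARTITION_DESCRIPTION AS upper_bound,
           TABLE_ROWS            AS approx_rows
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE()          -- assumes the table lives in the current schema
      AND TABLE_NAME   = 'a_database_table'
    ORDER BY PARTITION_ORDINAL_POSITION;
    ```

    Once rows start accumulating in pN, that is the signal to reorganize it as shown above.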
    

    The advantage of partitioning is that a query against a specific day will "prune" its access to the table so it only reads the one relevant partition. This happens automatically when the query's WHERE clause names the day explicitly, so that MySQL can infer which partition contains the rows you're looking for.
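
    One way to check that pruning takes place is to run the query through EXPLAIN. This is a sketch against the table defined above; the date literal is only an example value. In MySQL 5.7 and later, plain EXPLAIN includes a `partitions` column, while older versions need `EXPLAIN PARTITIONS` instead:

    ```sql
    -- Compare the partitioning column directly to a constant so the
    -- optimizer can work out which partition holds the matching rows.
    EXPLAIN
    SELECT id, value
    FROM a_database_table
    WHERE date = '2014-01-15';
    -- With the partition bounds defined above, the partitions column
    -- should list only p3 (2014-01-10 <= date < 2014-01-20).
    ```

    Pruning depends on the condition being written against the bare column. Wrapping the column in a function in the WHERE clause will generally prevent MySQL from inferring the partition, and the query then scans all of them.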