phpmysqlstoragetrending

What is the best way of storing trend data?


I am currently building an application where I am importing statistical data for (currently) around 15,000 products. At current, if I was to maintain one database table for each day statistics from one source it would be increased by 15,000 rows of data (let's say 5-10 fields per row primarily float, int) per day. Obviously equating to over 5 million records per year into one table.

That doesn't concern me so much as the thought of bringing in data from other sources (and thus increasing the size the database by 5 million records for each new source).

Now the data is statistical / trending based data, and will have basically 1 write per day per record, and many reads. For purposes of on the fly reporting and graphing however I need fast access to subsets of the data based on rules (date ranges, value ranges, etc).

What my question is, is this the best way to store the data (MySQL InnoDb tables), or is there a better way to store and handle statistical/trend data?

Other options I have tossed around at this point: 1. Multiple databases (one per product), with separate tables for each data source within. (ie Database: ProductA, Table(s):Source_A, Source_B, Source_C) 2. One database, multiple tables (one for each product/data source) (ie Database: Products, Table(s): ProductA_SourceA, ProductA_SourceB, etc.) 3. All factual or specific product information in the database and all statistical data in csv, xml, json, (flat files) in separate directories.

So far, none of these options are very manageable, each has its pros and cons. I need a reasonable solution before I move into the alpha stage of development.


Solution

  • You could try making use of a column based database. These kinds of databases are much better at analytical queries of the kind you're describing. There are several options:

    http://en.wikipedia.org/wiki/Column-oriented_DBMS

    We've had good experience with InfiniDB:

    http://infinidb.org/

    and Infobright looks good as well:

    http://www.infobright.com/

    Both InfiniDB and Infobright have free open source community editions, so I would recommend using these to get some benchmarks on the kinds of performance benefit you might get.

    You might also want to look at partitioning your data to improve performance.