Is there a way to estimate row size if I know what kind of data I'll be storing (with compression in mind)?
I'm looking at something like
bson_id | string (max 200 chars) | int32 | int32 | int32 | bool | bool | DateTime | DateTime | DateTime | int32
I am trying to find the best DB solution for about 2 trillion records like the one above, combined with roughly 20 times as many records like
bson_id | bson_id
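For reference, my rough back-of-envelope for the uncompressed sizes (assuming 12-byte ObjectIds, 4-byte int32s, 1-byte booleans, 8-byte timestamps, and ignoring any per-row storage overhead):

```python
# Back-of-envelope uncompressed sizing; assumed widths:
# ObjectId = 12 B, int32 = 4 B, bool = 1 B, DateTime = 8 B,
# string = 200 B worst case (single-byte characters).
FIELD_BYTES = {"bson_id": 12, "string200": 200, "int32": 4, "bool": 1, "datetime": 8}

main_row = ["bson_id", "string200", "int32", "int32", "int32",
            "bool", "bool", "datetime", "datetime", "datetime", "int32"]
link_row = ["bson_id", "bson_id"]

main_bytes = sum(FIELD_BYTES[f] for f in main_row)   # 254 B worst case
link_bytes = sum(FIELD_BYTES[f] for f in link_row)   # 24 B

print(f"main: {main_bytes} B/row -> ~{2e12 * main_bytes / 1e12:.0f} TB raw for 2 trillion rows")
print(f"link: {link_bytes} B/row -> ~{20 * 2e12 * link_bytes / 1e12:.0f} TB raw for ~20x as many rows")
```

I have no feel for how much compression and index overhead will change those numbers, hence the question.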
Any other recommendations are welcome.
Sorry for the very generic answer.
As far as I know, tests with dummy data are the only reliable way to measure this. “Dummy” here means fake but not repeated: strong repetition will flatter the compression ratio and spoil the estimate.
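A dummy-row generator for the schema above could look like the sketch below (field names and value ranges are made up; the point is that every row gets fresh random values rather than one repeated template):

```python
import random
import string
from datetime import datetime, timedelta
from bson import ObjectId  # ships with the pymongo package

EPOCH = datetime(2020, 1, 1)
ALPHABET = string.ascii_letters + string.digits

def rand_dt():
    """Random timestamp within ~3 years of EPOCH."""
    return EPOCH + timedelta(seconds=random.randint(0, 3 * 365 * 86400))

def dummy_row(max_str=200):
    """One fake row matching the schema; all values are random so the
    compression ratio is not flattered by repetition."""
    return {
        "_id": ObjectId(),
        "name": "".join(random.choices(ALPHABET, k=random.randint(1, max_str))),
        "a": random.randint(-2**31, 2**31 - 1),
        "b": random.randint(-2**31, 2**31 - 1),
        "c": random.randint(-2**31, 2**31 - 1),
        "flag1": random.random() < 0.5,
        "flag2": random.random() < 0.5,
        "created": rand_dt(),
        "updated": rand_dt(),
        "seen": rand_dt(),
        "counter": random.randint(0, 2**31 - 1),
    }
```

Note that fully random strings are the pessimistic extreme (they barely compress at all); if your real strings are names, URLs or similar, sampling them from a realistic vocabulary will give a fairer compression estimate.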
For example, you can load 1M, 2M, 4M, 8M, 32M, 128M and so on… records and check whether the on-disk size grows linearly with the row count. If it does, you can extrapolate, with some contingency, to billions and trillions of records.
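A minimal sketch of that loop, assuming a local MongoDB test instance reached through pymongo and the dummy_row() generator above (any store that can report its on-disk collection size would work the same way):

```python
import numpy as np
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local test instance
db, coll = client.sizing_test, client.sizing_test.rows

counts, disk_sizes = [], []
inserted, CHUNK = 0, 50_000
for target in (1_000_000, 2_000_000, 4_000_000, 8_000_000):
    while inserted < target:                         # top the collection up to the target count
        n = min(CHUNK, target - inserted)
        coll.insert_many([dummy_row() for _ in range(n)], ordered=False)
        inserted += n
    size = db.command("collstats", "rows")["storageSize"]   # compressed size on disk
    counts.append(inserted)
    disk_sizes.append(size)
    print(f"{inserted:>10} rows -> {size / 2**20:.1f} MiB on disk")

# Fit bytes ~= a * rows + b; the slope is the effective bytes per row after compression.
a, b = np.polyfit(counts, disk_sizes, 1)
for rows in (1e9, 2e12):
    print(f"{rows:.0e} rows -> ~{(a * rows + b) / 2**40:.1f} TiB (extrapolated)")
```

Only trust the extrapolation if the measured points really do sit on a straight line, and create the indexes that match your real workload before measuring, because they can add a significant share of the total.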
Such tests also let you check performance against your needs; for example, you can increase the HDFS replication factor to improve read performance.
And finally, the same tests show how well your data actually compresses.
Good luck with BigData!