
Any suggestions for an analytical columnar DB that can be modified?


I need to build a customer 360 degree database, which requires:

For these requirements, I think a modifiable columnar DB would be a perfect fit: it can be queried and aggregated by column, which is optimal for analytics, and it can absorb several million updates throughout the day. The closest project I have found is Apache Kudu, but its limit of 300 columns per table is a deal-breaker, since we have more than 1000.

We also prefer an open-source project.

Any suggestions?


Solution

  • I will answer my own question, since our solution works fine now.

    Instead of having one unified DB for both the analytics and the OLTP workload, we split the workload in two: the analytics workload is served by Parquet tables in HDFS, and the OLTP workload is served by HBase.

    Of course, we have to duplicate (part of) the customer data, but the extra storage and computing cost is modest, and we are willing to pay it.
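
    To make the split concrete, here is a minimal in-memory sketch of the idea, not our actual pipeline. `RowStore` is a hypothetical stand-in for the HBase side (keyed upserts and point reads), and `export_columnar` is a hypothetical stand-in for writing a Parquet snapshot (one list per column). The direction of the copy (row store to columnar snapshot) and all names and sample values are assumptions made for illustration only.

```python
from collections import defaultdict


class RowStore:
    """Stand-in for the OLTP (HBase) side: rows keyed by customer id."""

    def __init__(self):
        self._rows = {}

    def upsert(self, customer_id, **fields):
        # Update only the given fields, leaving the rest of the row intact.
        self._rows.setdefault(customer_id, {}).update(fields)

    def get(self, customer_id):
        # Point read by key; returns a copy so callers cannot mutate the store.
        return dict(self._rows.get(customer_id, {}))

    def scan(self):
        return self._rows.items()


def export_columnar(row_store, columns):
    """Stand-in for the analytics (Parquet) side: one list per column."""
    snapshot = {"customer_id": []}
    for col in columns:
        snapshot[col] = []
    for customer_id, row in sorted(row_store.scan(), key=lambda kv: kv[0]):
        snapshot["customer_id"].append(customer_id)
        for col in columns:
            snapshot[col].append(row.get(col))  # None if the field is missing
    return snapshot


def sum_by(snapshot, group_col, value_col):
    """Aggregate one column grouped by another, touching only those two columns."""
    totals = defaultdict(float)
    for group, value in zip(snapshot[group_col], snapshot[value_col]):
        if value is not None:
            totals[group] += value
    return dict(totals)


# Hypothetical sample data.
store = RowStore()
store.upsert("c1", segment="retail", balance=100.0)
store.upsert("c2", segment="retail", balance=50.0)
store.upsert("c3", segment="corporate", balance=900.0)
store.upsert("c1", balance=120.0)  # in-place update on the OLTP side

snapshot = export_columnar(store, ["segment", "balance"])
totals = sum_by(snapshot, "segment", "balance")
```

    The point of the sketch is the duplication mentioned above: updates only ever touch the row store, while aggregations only ever read the columnar copy, which is as fresh as the last export.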