cassandra  cassandra-3.0  bulk-load

Looking for Cassandra BulkLoader example Java code to upload SSTable to cluster


I am trying to upload a CSV data file to a Cassandra cluster. This should be a continuous process, so I am creating a simple Java app that reads the CSV file, converts it to SSTables, and then uploads them to the Cassandra cluster.

I was able to achieve the first step using CQLSSTableWriter and create the SSTable data locally. From what I found, I understand that we have to use the BulkLoader provided by org.apache.cassandra.tools to upload the SSTables to the cluster, but I couldn't figure that part out. Also, my SSTable copy will be on my local machine and not on the server where the cluster is running. Can someone help me achieve this, with an example if possible? That would really help.

To add: my actual use case is to archive data from Sybase to Cassandra continuously. For that, I am trying to create a CSV of the Sybase data and upload it to Cassandra, as the data will run to millions of rows. Any other approaches are also welcome here.


Solution

  • The bulk loader you're referring to is the sstableloader utility that is available in the tools/bin/ directory of your Cassandra installation. The sstableloader utility streams the SSTables to load their contents to a Cassandra cluster.

    However, your approach is inefficient because it's not necessary to convert the CSV data into SSTables.

    The DataStax Bulk Loader tool (DSBulk) was written specifically for this purpose. It allows you to bulk load data in CSV or JSON format to Cassandra. You can also use DSBulk to export data from Cassandra to CSV or JSON.

    Here are some references with examples to help you get started quickly:

    DSBulk is fully open-source so it's free to use. Cheers!
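
Since the question asks for a Java app that runs continuously, here is a minimal sketch of how either tool could be driven from Java through ProcessBuilder. The class name BulkLoadCommands, the host addresses, and the paths are hypothetical placeholders, and the sketch assumes sstableloader and dsbulk are installed and on the PATH of the machine running the app (which does not have to be a cluster node, because both tools send the data over the network). The flags used are the common short options: -d for the initial hosts of sstableloader, and -h, -k, -t, -url and -header for dsbulk load.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper, not part of Cassandra or DSBulk: it only builds
// the command lines for the two tools and optionally runs them.
final class BulkLoadCommands {

    // sstableloader derives the target keyspace and table from the last two
    // components of the directory path, so sstableDir should look like
    // .../<keyspace>/<table> (the directory CQLSSTableWriter wrote into).
    static List<String> sstableLoader(String hosts, String sstableDir) {
        return List.of("sstableloader", "-d", hosts, sstableDir);
    }

    // DSBulk loads the CSV directly, so no SSTable conversion step is needed.
    // "-header true" tells it the first line of the file holds column names.
    static List<String> dsbulkLoad(String host, String keyspace,
                                   String table, String csvPath) {
        return List.of("dsbulk", "load",
                "-h", host,
                "-k", keyspace,
                "-t", table,
                "-url", csvPath,
                "-header", "true");
    }

    // Runs a command, forwards its output to this process, and returns the
    // exit code. Assumes the executable can be found on the PATH.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Placeholder values; printing only, so this runs without either tool.
        System.out.println(String.join(" ",
                sstableLoader("10.0.0.1,10.0.0.2", "/tmp/out/my_ks/my_table")));
        System.out.println(String.join(" ",
                dsbulkLoad("10.0.0.1", "my_ks", "my_table", "/tmp/export.csv")));
    }
}
```

To actually load data, pass one of the built lists to run(...) and check that the returned exit code is 0 before moving on to the next CSV batch.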