cassandradatastaxdsbulk

I am getting a heap memory issue while running DSBULK load


I have unloaded more than 100 CSV files in a folder. When I try to load those files to cassandra using DSBULK load and specifying the the folder location of all these files, I get the below error

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "unVocity-parsers input reading thread"

I wanted to see if anyone else has faced it and how it has been resolved.


Solution

  • Here are a few things you can try:

    1. You can pass any JVM option or system property to the dsbulk executable using the DSBULK_JAVA_OPTS env var. See this page for more. Set the allocated memory to a higher value if possible.
    2. You can throttle dsbulk using the -maxConcurrentQueries option. Start with -maxConcurrentQueries 1; then raise the value to get the best throughput possible without hitting the OOM error. More on this here.