trying to unload data from a huge table, below is the command used and output.
$ /home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx --driver.auth.password xxxx --datastax-java-driver.basic.contact-points 123.123.123.123 -query "select count(*) from sometable with where on clustering column and partial pk -- allow filtering" --connector.name json --driver.protocol.compression LZ4 --connector.json.mode MULTI_DOCUMENT -maxConcurrentFiles 1 -maxRecords -1 -url dsbulk --executor.continuousPaging.enabled false --executor.maxpersecond 2500 --driver.socket.timeout 240000
Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
total | failed | rows/s | p50ms | p99ms | p999ms
1 | 1 | 0 | 109,790.10 | 110,058.54 | 110,058.54
Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
Here are the dsbulk records --
cassandra@somehost> cd logs
cassandra@somehost> cd COUNT_20210423-070104-108326/
cassandra@somehost> ls
operation.log unload-errors.log
cassandra@somehost> cat operation.log
2021-04-23 07:01:04 WARN Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
2021-04-23 07:01:04 WARN Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
2021-04-23 07:01:04 INFO Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
2021-04-23 07:02:55 WARN Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
2021-04-23 07:02:55 INFO Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:55 INFO Memory usage: used: 212 MB, free: 1,922 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:55 INFO Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:55 INFO Throughput: 0 reads/second
2021-04-23 07:02:55 INFO Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
2021-04-23 07:02:58 INFO Final stats:
2021-04-23 07:02:58 INFO Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:58 INFO Memory usage: used: 251 MB, free: 1,883 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:58 INFO Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:58 INFO Throughput: 0 reads/second
2021-04-23 07:02:58 INFO Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
cassandra@somehost> cat unload-errors.log
Statement: com.datastax.oss.driver.internal.core.cql.DefaultBoundStatement@1083fef9 [0 values, idempotence: <UNSET>, CL: <UNSET>, serial CL: <UNSET>, timestamp: <UNSET>, timeout: <UNSET>]
SELECT batch_id from .... allow filtering (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))
at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:534)
at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.lambda$fetchNextPage$1(ResultSubscription.java:372)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.setFinalError(CqlRequestHandler.java:447) [4 skipped]
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access$700(CqlRequestHandler.java:94)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:859)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
at java.lang.Thread.run(Thread.java:748) [24 skipped]
Caused by: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
Cassandra's system.log snippet ----
DEBUG [ScheduledTasks:1] 2021-04-23 00:01:48,539 MonitoringTask.java:152 - 1 operations timed out in the last 5015 msecs:
<SELECT * FROM my query being run with limit - LIMIT 5000>, total time 10004 msec, timeout 10000 msec/cross-node
INFO [ScheduledTasks:1] 2021-04-23 00:02:38,540 MessagingService.java:1302 - RANGE_SLICE messages were dropped in last 5000 ms: 0 internal and 1 cross node
. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 10299 ms
INFO [ScheduledTasks:1] 2021-04-23 00:02:38,551 StatusLogger.java:114 -
Pool Name Active Pending Completed Blocked All Time Blocked
ReadStage 1 0 1736872997 0 0
ContinuousPagingStage 0 0 586 0 0
RequestResponseStage 0 0 1483193130 0 0
ReadRepairStage 0 0 9079516 0 0
CounterMutationStage 0 0 0 0 0
MutationStage 0 0 351841038 0 0
ViewMutationStage 0 0 0 0 0
CommitLogArchiver 0 0 32961 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 12034828 0 0
MemtableReclaimMemory 0 0 68612 0 0
PendingRangeCalculator 0 0 9 0 0
AntiCompactionExecutor 0 0 0 0 0
GossipStage 0 0 20137208 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 3798 0 0
MigrationStage 0 0 8 0 0
MemtablePostFlush 0 0 338955 0 0
PerDiskMemtableFlushWriter_0 0 0 66297 0 0
ValidationExecutor 0 0 247600 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 41757 0 0
InternalResponseStage 0 0 525242 0 0
AntiEntropyStage 0 0 767527 0 0
CacheCleanupExecutor 0 0 0 0 0
Native-Transport-Requests 0 0 958717934 0 65
CompactionManager 0 0
MessagingService n/a 0/0
Cache Type Size Capacity KeysToSave
KeyCache 104857216 104857600 all
RowCache 0 0 all
Expand select count(*) from sometable with where on clustering column and partial pk -- allow filtering
with an additional condition on the token ranges, like this: and partial pk token(full_pk) > :start and token(full_pk) <= :end
- in this case, DSBulk will perform many queries against specific token ranges that are sent to multiple nodes, and won't create the load onto the single node as in your case.
Look into the documentation for -query option, and for 4th blog in this series of blog posts about DSBulk, that could provide more information & examples: 1, 2, 3, 4, 5, 6