I used the standard 'cassandra-driver' to access Cassandra and Scylla and to insert/update data via BatchStatement. I wrote data in batches of ~500 rows, and only Cassandra returned this error (everything works fine with Scylla):
Error from server: code=2200 [Invalid query] message="Batch too large"
Python code:
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.cluster import ExecutionProfile
from cassandra.cluster import EXEC_PROFILE_DEFAULT
from cassandra.query import BatchStatement
...
cluster = Cluster(contact_points=["localhost"],
                  port=9042,
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile},
                  control_connection_timeout=60,
                  idle_heartbeat_interval=60,
                  connect_timeout=60)
session = cluster.connect(keyspace="catalog")
...
insert_statement = session.prepare(f"INSERT INTO rn6.t01 ({columns}) VALUES ({items})")
batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)
data_frm = pandas.DataFrame(generator.integers(999999,
                                               size=(run_setup.bulk_row, run_setup.bulk_col)),
                            columns=[f"fn{i}" for i in range(run_setup.bulk_col)])
# add every row to a single batch and execute it
for row in data_frm.values:
    batch.add(insert_statement, row)
session.execute(batch)
It seems that Cassandra and Scylla have different default size limits for batch statements. Do you know these limits?
Both Scylla and Cassandra have a configurable batch-size limit, batch_size_fail_threshold_in_kb, but its default value differs: Scylla's default is 1024 KB, whereas Cassandra's default is 50 KB.
You can try increasing it in Cassandra's configuration (cassandra.yaml), or split the inserts into smaller batches, as sketched below.
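As a rough sketch of the second option, the loop from the question can flush a batch every N rows instead of accumulating all ~500 at once. It reuses session, insert_statement and data_frm from the code above; the chunk size of 50 rows is an arbitrary assumption and should be tuned to the actual serialized row size.

from cassandra import ConsistencyLevel
from cassandra.query import BatchStatement

# Hypothetical chunk size: keep rows_per_batch * average row size well below
# Cassandra's batch_size_fail_threshold_in_kb (50 KB by default).
ROWS_PER_BATCH = 50

batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)
rows_in_batch = 0
for row in data_frm.values:
    batch.add(insert_statement, row)
    rows_in_batch += 1
    if rows_in_batch == ROWS_PER_BATCH:
        session.execute(batch)  # flush the current chunk
        batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)
        rows_in_batch = 0
if rows_in_batch:
    session.execute(batch)  # flush any remaining rows

Smaller batches also avoid the coordinator-side pressure that large multi-partition batches create, so this is usually preferable to raising the server-side limit.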