I understand that when inserting data, tombstones might be created for the null values in the DataFrame's columns. To mitigate this issue and minimize tombstones, insertion queries should exclude columns with null values.
Currently, I'm working with the spark-cassandra-connector in a PySpark Jupyter notebook environment, and I've come across the `com.datastax.spark.connector.types.CassandraOption` trait for Scala. How can I leverage this trait, or any other method, to address the tombstone problem?
`WriteConf` has an `ignoreNulls` parameter which you can set to `true` so that `null` values are not inserted when writing to Cassandra. You can also configure the `SparkConf` object by setting `spark.cassandra.output.ignoreNulls` to `true`.
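Here's a minimal PySpark sketch of the second approach; the connection host, keyspace (`demo`), table (`users`), and column names are placeholders, and it assumes the spark-cassandra-connector package is on the session's classpath:

```python
from pyspark.sql import SparkSession

# Placeholder session setup: adjust host, keyspace, and table for your cluster.
spark = (
    SparkSession.builder
    .appName("ignore-nulls-example")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    # Globally treat all nulls as unset, so null columns are skipped
    # on write instead of producing tombstones.
    .config("spark.cassandra.output.ignoreNulls", "true")
    .getOrCreate()
)

df = spark.createDataFrame(
    [(1, "alice", None), (2, "bob", "bob@example.com")],
    ["id", "name", "email"],
)

# With ignoreNulls enabled, the null `email` for id=1 is simply not
# written, so no tombstone is created for that cell.
(
    df.write
    .format("org.apache.spark.sql.cassandra")
    .mode("append")
    .options(table="users", keyspace="demo")
    .save()
)
```

If I recall correctly, recent connector versions also let you pass the same key per write via the DataFrameWriter's `.option(...)` instead of setting it globally, which is handy when only some jobs should skip nulls.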
For details, see the Globally treating all nulls as Unset section and the Configuration Reference in the docs. Cheers!