I want to create a partitioned Iceberg table from a PySpark DataFrame. I can see how to do this with Spark SQL, but not with the PySpark DataFrame API.
https://iceberg.apache.org/docs/1.4.3/spark-ddl/#partitioned-by
Can someone help with this?
Just use the partitionBy method:
df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["key", "value"])

# partitionBy sets the table's partition spec when saveAsTable creates the table
(df.write
    .format("iceberg")
    .partitionBy("key")
    .mode("append")
    .saveAsTable("catalog_name.namespace.table_name"))
# check the DDL:
spark.sql("SHOW CREATE TABLE catalog_name.namespace.table_name").show(1, 1000)
The output will be something like the following:
CREATE TABLE catalog_name.namespace.table_name (\n key BIGINT,\n value STRING)\nUSING iceberg\nPARTITIONED BY (key)...
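Note that partitionBy only supports identity partitions (plain column names). If you want Iceberg's hidden partition transforms such as bucket(), days(), or hours(), the DataFrameWriterV2 API (df.writeTo, Spark 3.1+ in Python) accepts transform expressions. Here is a minimal sketch, assuming the same table name and a session with the Iceberg catalog already configured:

import pyspark.sql.functions as F

df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["key", "value"])

# createOrReplace() writes the data and (re)creates the table with this partition spec
(df.writeTo("catalog_name.namespace.table_name")
    .partitionedBy(F.bucket(4, "key"))  # hidden partitioning: hash "key" into 4 buckets
    .createOrReplace())

For a plain identity partition you would pass F.col("key") instead of the bucket transform. For the sketch above, SHOW CREATE TABLE should report PARTITIONED BY (bucket(4, key)).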