azure apache-spark azure-synapse apache-hudi apache-spark-xml

Issues while writing XML data to a Hudi table in an Azure Synapse notebook


I've successfully read XML blob data from a container in an Azure Synapse notebook and displayed the DataFrame df as needed. However, while writing it as a Hudi table to Azure Data Lake Storage Gen2, I'm facing the error below: TypeError: DataFrameWriter.option() got an unexpected keyword argument 'hoodie.datasource.write.recordkey.field'.

Below is the configuration I'm using:

basepath = "abfs://XXXXXXXXXXXXXXXXXXXXXXXX/huditables/"
table="hudiTable"
hudi_options = {
   'hoodie.datasource.write.recordkey.field': 'id',
   'hoodie.datasource.write.operation': 'upsert',
   'hoodie.datasource.write.precombine.field': 'id',
   'hoodie.table.name': table
}

# Loaded the XML file and populated the DataFrame df here.

df.write.format("hudi").option(**hudi_options).mode("overwrite").save(basepath)

I've uploaded the JAR packages spark-xml (Scala 2.12, version 0.12.0) and hudi-spark (Scala 2.12, version 0.15.0) to the Apache Spark pool I created in Azure, which runs Apache Spark 3.3 and Scala 2.12.15.


Solution

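  • I resolved this issue after spending hours on it. The line df.write.format("hudi").option(**hudi_options).mode("overwrite").save(basepath)

    should be replaced with df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basepath). Here option is replaced by options: option() takes a single key and value as positional arguments, whereas options() accepts keyword arguments, so it can take the unpacked dictionary.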
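
To see why the one-letter change matters, here is a minimal sketch that runs without Spark. FakeWriter is a hypothetical stand-in, not part of PySpark or Hudi. It only mimics the call signatures of DataFrameWriter.option(key, value) and DataFrameWriter.options(**options) so the difference can be reproduced locally.

```python
class FakeWriter:
    """Hypothetical stand-in mirroring the call signatures of PySpark's DataFrameWriter."""

    def __init__(self):
        self.settings = {}

    def option(self, key, value):
        # Like DataFrameWriter.option: exactly one key and one value.
        self.settings[key] = value
        return self

    def options(self, **options):
        # Like DataFrameWriter.options: any number of keyword arguments.
        self.settings.update(options)
        return self


hudi_options = {
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.precombine.field": "id",
    "hoodie.table.name": "hudiTable",
}

# Unpacking the dict turns every key into a keyword argument,
# which option() does not accept, so Python raises TypeError.
try:
    FakeWriter().option(**hudi_options)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# options() accepts arbitrary keyword arguments, so unpacking works.
via_options = FakeWriter().options(**hudi_options).settings

# Equivalent alternative: call option() once per key/value pair.
writer = FakeWriter()
for key, value in hudi_options.items():
    writer = writer.option(key, value)
via_loop = writer.settings
```

With the real DataFrameWriter, both the options(**hudi_options) call and the per-key option(key, value) loop should produce the same writer configuration; the loop is just more verbose.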