csv apache-spark

Can the Spark warning "CSV header does not conform to the schema" be turned into an error, through configuration, so that it stops the current operation?


This Spark warning is very convenient at times like the switch from 2022 to 2023, when we start reading a new CSV file.
It notices that the header and the schema no longer match. This warning has explained problems to me and saved me dozens of times already.

WARN CSVDataSource: CSV header does not conform to the schema.
Header: VendorID, passenger_count, trip_distance, RatecodeID, ...
Schema: VendorID, store_and_fwd_flag, RatecodeID, PULocationID, ...

Is there a way, through Spark configuration, to make it fail the current operation instead of only emitting a warning?


Solution

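  • Check this issue. I think you can do that by setting the CSV reader option enforceSchema to false (it is an option on the reader, not a global Spark setting):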

    spark.read.option("enforceSchema", false)
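
A fuller sketch of how this option might be used, assuming a local SparkSession and a small throwaway file (the object name, temp file, and two-column schema are made up for illustration). As far as I know, the header is only validated when the header option is also true, and the check happens when the file is actually read, so the failure shows up on an action such as show() and may arrive wrapped in a SparkException:

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object StrictCsvHeader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strict-csv-header")
      .getOrCreate()

    // Hypothetical sample file: its second header column does not match the schema below.
    val path = Files.createTempFile("trips", ".csv")
    Files.write(path, "VendorID,passenger_count\n1,2\n".getBytes("UTF-8"))

    val schema = StructType(Seq(
      StructField("VendorID", IntegerType),
      StructField("store_and_fwd_flag", StringType)))

    try {
      spark.read
        .option("header", "true")         // the header line is only checked when this is true
        .option("enforceSchema", "false") // validate the header against the schema instead of only warning
        .schema(schema)
        .csv(path.toString)
        .show()                           // the read (and the header check) runs here
    } catch {
      // The underlying error may be wrapped by Spark, so catch broadly in this sketch.
      case e: Exception => println(s"Read failed: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```

In a real job you would probably let the exception propagate rather than catch it, so that the operation stops as asked in the question.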