Tags: pyspark, jdbc, date-format, lower-bound, upperbound

How to set upperBound and lowerBound format for reading JDBC with pyspark


I am trying to read from JDBC with PySpark. The source has a column FEC_PART, a date type in the format yyyymmdd. When reading, the upperBound and lowerBound parameters don't match the desired yyyymmdd format:

The error appears when calling .load() on the Spark reader.

input_data = (
    spark.read
    .format(constants.FORMAT_JDBC)
    .options(**properties)
    .option("partitionColumn", "FEC_PART")  # keep the partition column as it is
    .option("lowerBound", "20200112")   # yyyymmdd, to match partitionColumn's format
    .option("upperBound", "20200114")   # yyyymmdd, to match partitionColumn's format
    .option("numPartitions", "2")
    .load()
)

First approach: adding these options:

.option("oracle.jdbc.mapDateToTimestamp", "false")
.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYYMMDD'")

Second approach, with this other option:

.option("dateFormat", "yyyyMMdd")

And several more approaches, without any result.


Solution

  • It works for me with something very similar to your first approach, but changing the date format from YYYYMMDD to YYYY-MM-DD (Spark appears to expect date-type bounds in the ISO yyyy-MM-dd form), so:

    .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
    

    And then passing the literals as:

    --jdbc.bigquery.input.lowerbound="2024-03-15"
    --jdbc.bigquery.input.upperbound="2024-03-19"
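
    Put together, a minimal standalone PySpark equivalent of the working configuration looks like this. This is a sketch: the JDBC URL, driver class, table, and credentials below are placeholders, not values from the original setup:

    # Working configuration (sketch): ISO date format in the session
    # plus ISO-formatted bound literals. URL, table, and credentials
    # are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-partitioned-read").getOrCreate()

    input_data = (
        spark.read
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//db-host:1521/service")
        .option("driver", "oracle.jdbc.OracleDriver")
        .option("dbtable", "MY_SCHEMA.MY_TABLE")
        .option("user", "my_user")
        .option("password", "my_password")
        # Keep Oracle DATE columns as dates rather than timestamps.
        .option("oracle.jdbc.mapDateToTimestamp", "false")
        # Make the session's date format match the ISO bound literals.
        .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
        .option("partitionColumn", "FEC_PART")
        .option("lowerBound", "2024-03-15")
        .option("upperBound", "2024-03-19")
        .option("numPartitions", "2")
        .load()
    )

    # Sanity check: the read should be split into the requested
    # number of partitions.
    print(input_data.rdd.getNumPartitions())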