Tags: apache-spark, pyspark, time, timestamp, leading-zero

Add leading zero to PySpark time components


I have this code, which writes data partitioned by date and time components:

from pyspark.sql import functions as F
from pyspark.sql.functions import col

df = df.withColumn("year", F.year(col(date_column))) \
    .withColumn("month", F.month(col(date_column))) \
    .withColumn("day", F.dayofmonth(col(date_column))) \
    .withColumn("hour", F.hour(col(date_column)))

df.write.partitionBy("year", "month", "day", "hour").mode("append").format("csv").save(destination)

The output gets written to partitions like month=9. How can I make it month=09 instead? The same goes for hours, e.g. hour=04.


Solution

  • You could try

    .withColumn("month", F.date_format(col(date_column), "MM"))
    

    and

    .withColumn("hour", F.date_format(col(date_column), "HH"))
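
    Putting it together, here is a minimal runnable sketch (assuming pyspark is installed locally; the column name event_ts and the sample row are hypothetical). F.date_format returns zero-padded strings, which is what produces month=09 and hour=04 in the partition paths:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.functions import col

    spark = SparkSession.builder.master("local[1]").appName("zero-pad").getOrCreate()

    date_column = "event_ts"  # hypothetical column name for illustration
    df = (spark.createDataFrame([("2023-09-05 04:07:00",)], [date_column])
          .withColumn(date_column, F.to_timestamp(col(date_column))))

    # date_format yields zero-padded string columns (e.g. "09", "04")
    df = (df
          .withColumn("year", F.date_format(col(date_column), "yyyy"))
          .withColumn("month", F.date_format(col(date_column), "MM"))
          .withColumn("day", F.date_format(col(date_column), "dd"))
          .withColumn("hour", F.date_format(col(date_column), "HH")))

    row = df.select("year", "month", "day", "hour").first()
    print(row.year, row.month, row.day, row.hour)  # 2023 09 05 04
    ```

    Note that these partition columns are now strings rather than integers, which partitionBy handles fine, but anything reading the partition values back should expect strings.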