apache-spark, date, pyspark, apache-spark-sql, spark3

Convert date to ISO week date in Spark


Given dates in one column, how can I create a column containing the ISO week date?

An ISO week date is composed of the year, the week number and the weekday.
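
For example, Python's datetime.date.isocalendar() returns exactly this triplet; a quick illustration of the format I am after (just for reference, not part of the Spark job):

from datetime import date

y, w, wd = date(1978, 1, 2).isocalendar()  # (ISO year, ISO week, ISO weekday)
print(f"{y}-W{w:02d}-{wd}")
#1978-W01-1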

Example dataframe:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    ('1977-12-31',),
    ('1978-01-01',),
    ('1978-01-02',),
    ('1978-12-31',),
    ('1979-01-01',),
    ('1979-12-30',),
    ('1979-12-31',),
    ('1980-01-01',)],
    ['my_date']
).select(F.col('my_date').cast('date'))

df.show()
#+----------+
#|   my_date|
#+----------+
#|1977-12-31|
#|1978-01-01|
#|1978-01-02|
#|1978-12-31|
#|1979-01-01|
#|1979-12-30|
#|1979-12-31|
#|1980-01-01|
#+----------+

Desired result:

+----------+-------------+
|   my_date|iso_week_date|
+----------+-------------+
|1977-12-31|   1977-W52-6|
|1978-01-01|   1977-W52-7|
|1978-01-02|   1978-W01-1|
|1978-12-31|   1978-W52-7|
|1979-01-01|   1979-W01-1|
|1979-12-30|   1979-W52-7|
|1979-12-31|   1980-W01-1|
|1980-01-01|   1980-W01-2|
+----------+-------------+

Solution

  • Spark SQL's extract function makes this much easier.

    So, building on the other answers and using concat_ws:

    import pyspark.sql.functions as F
    
    df.withColumn(
        'iso_week_date',
        F.concat_ws(
            "-",
            F.expr("EXTRACT(YEAROFWEEK FROM my_date)"),
            F.lpad(F.weekofyear('my_date'), 3, "W0"),
            F.expr("EXTRACT(DAYOFWEEK_ISO FROM my_date)")
        )
    ).show()
    
    #+----------+-------------+
    #|   my_date|iso_week_date|
    #+----------+-------------+
    #|1977-12-31|   1977-W52-6|
    #|1978-01-01|   1977-W52-7|
    #|1978-01-02|   1978-W01-1|
    #|1978-12-31|   1978-W52-7|
    #|1979-01-01|   1979-W01-1|
    #|1979-12-30|   1979-W52-7|
    #|1979-12-31|   1980-W01-1|
    #|1980-01-01|   1980-W01-2|
    #+----------+-------------+
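    
    If you prefer a single formatting call, the same three fields can be
    combined with format_string instead of lpad + concat_ws (a minimal sketch
    under the same Spark 3.x assumption, i.e. that the YEAROFWEEK and
    DAYOFWEEK_ISO extract fields are available; the output should match the
    table above):
    
    df.withColumn(
        'iso_week_date',
        F.format_string(
            '%d-W%02d-%d',
            F.expr("EXTRACT(YEAROFWEEK FROM my_date)"),
            F.weekofyear('my_date'),
            F.expr("EXTRACT(DAYOFWEEK_ISO FROM my_date)")
        )
    ).show()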