dataframe, apache-spark, date, pyspark, dayofweek

Get weekday name from date in PySpark


I use this code to return the day name from a date stored as a string:

import pandas as pd
df = pd.Timestamp("2019-04-10")
print(df.weekday_name)

So when I have "2019-04-10", the code returns "Wednesday".
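
(Side note: Timestamp.weekday_name was removed in pandas 1.0; on newer pandas versions the equivalent is the day_name() method:)

import pandas as pd
print(pd.Timestamp("2019-04-10").day_name())  # Wednesday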

I would like to apply it to a column in a PySpark DataFrame to get the day name as text, but it doesn't seem to work.

+-------------+
|Reported Date|
+-------------+
|    1/07/2010|
|    1/07/2010|
|    1/07/2010|
|    1/07/2010|
|    1/07/2010|
|    1/07/2010|
|    1/07/2010|    
+-------------+

I tried to do this:

sparkDF.withColumn("day", weekday_name(pd.Timestamp('Reported Date')))

But I get an error message:

NameError: name 'weekday_name' is not defined


Solution

  • The PySpark documentation is a bit unclear on this topic, but date_format internally uses Java date format patterns.

    You can use it like this:

    df.show()
    +----------+
    |      date|
    +----------+
    |2010-01-07|
    +----------+
    
    df.printSchema()
    root
     |-- date: date (nullable = true)
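
    (If you want to reproduce this, here is a minimal sketch, assuming an active Spark session; the variable names are illustrative:)

    from pyspark.sql import SparkSession
    import datetime

    spark = SparkSession.builder.getOrCreate()
    # a one-row DataFrame whose 'date' column is a proper DateType
    df = spark.createDataFrame([(datetime.date(2010, 1, 7),)], ['date'])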
    

    Now, to get the abbreviated weekday name we can use E, EE, or EEE; if you want the full name, you need to give more than three Es, e.g. EEEE.

    Short form:

    import pyspark.sql.functions as f
    
    df.withColumn('Day', f.date_format('date', 'E')).show()
    +----------+---+
    |      date|Day|
    +----------+---+
    |2010-01-07|Thu|
    +----------+---+
    

    Full form:

    df.withColumn('Day', f.date_format('date', 'EEEE')).show()
    +----------+--------+
    |      date|     Day|
    +----------+--------+
    |2010-01-07|Thursday|
    +----------+--------+
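
    To apply this to the question's Reported Date string column, parse it with to_date first (assuming the values are day-first, i.e. d/MM/yyyy):

    import pyspark.sql.functions as f

    # parse the string to a DateType, then format the weekday name
    sparkDF = sparkDF.withColumn(
        'day',
        f.date_format(f.to_date('Reported Date', 'd/MM/yyyy'), 'EEEE')
    )

    This stays entirely in built-in Spark SQL functions, so no Python UDF is needed.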