I use this code to return the day name from a date of type string:
import Pandas as pd
df = pd.Timestamp("2019-04-10")
print(df.weekday_name)
so when I have "2019-04-10" the code returns "Wednesday"
I would like to apply it a column in PySpark DataFrame to get the day name in text. But it doesn't seem to work.
+-------------+
|Reported Date|
+-------------+
| 1/07/2010|
| 1/07/2010|
| 1/07/2010|
| 1/07/2010|
| 1/07/2010|
| 1/07/2010|
| 1/07/2010|
+-------------+
I tried to do this:
sparkDF.withColumn("day", weekday_name(pd.Timestamp('Reported Date')))
But I get an Error massage:
NameError: name 'weekday_name' is not defined
PySpark documentation is a bit unclear on this topic but it internally uses Java Date formats.
you can use like this:
df.show()
+----------+
| date|
+----------+
|2010-01-07|
+----------+
df.printSchema()
root
|-- date: date (nullable = true)
Now, To get the short name of weekday we can use E/EE/EEE
and if you want the full name of then to need to give more than 3Es like EEEE
Short form:
import pyspark.sql.functions as f
df.withColumn('Day', f.date_format('date', 'E')).show()
+----------+---+
| date|Day|
+----------+---+
|2010-01-07|Thu|
+----------+---+
Full:
df.withColumn('Day', f.date_format('date', 'EEEE')).show()
+----------+--------+
| date| Day|
+----------+--------+
|2010-01-07|Thursday|
+----------+--------+