I have a DataFrame with a column orddate stored as a string. I want to extract the month from orddate and add a new column containing the month name to a new DataFrame.
|orddate|
|12/1/10 9:37|
|20/3/10 10:37|
|09/8/14 4:56|
|30/12/11 12:13|
|24/5/10 7:27|
should be converted to:
|orddate|month|
|12/1/10 9:37|january|
|20/3/10 10:37|march|
|09/8/14 4:56|august|
|30/12/11 12:13|december|
|24/5/10 7:27|may|
1) Use unix_timestamp with the format dd/MM/yy hh:mm to convert the column to a timestamp.
2) Use from_unixtime with the format MMMMM to convert the timestamp to the month name.
You can see more about the format here.
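import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}
import spark.implicits._ // provides the $"..." column syntax (already in scope in spark-shell)
// Parse the string into epoch seconds, then format those seconds as the full month name
df.withColumn("month", from_unixtime(unix_timestamp($"orddate", "dd/MM/yy hh:mm"), "MMMMM")).show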
+--------------+--------+
| orddate| month|
+--------------+--------+
| 12/1/10 9:37| January|
| 20/3/10 10:37| March|
| 09/8/14 4:56| August|
|30/12/11 12:13|December|
| 24/5/10 7:27| May|
+--------------+--------+
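
If you want the lowercase names from the question, or if the patterns above are rejected on a newer Spark version, here is a rough variant. It is only a sketch under a few assumptions: Spark 2.2 or later (for to_timestamp with a format argument), a local SparkSession built just for the example, and the sample rows from the question re-created by hand. As far as I understand, the datetime parser in Spark 3 is stricter, so it uses d/M/yy H:mm (single-letter fields to accept values like 1 or 8, and H for a 24-hour clock) and MMMM for the full month name. The name withMonth is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, lower, to_timestamp}

// Assumption: a local session created only for this example
val spark = SparkSession.builder().master("local[*]").appName("month-name").getOrCreate()
import spark.implicits._

// Sample rows copied from the question
val df = Seq(
  "12/1/10 9:37",
  "20/3/10 10:37",
  "09/8/14 4:56",
  "30/12/11 12:13",
  "24/5/10 7:27"
).toDF("orddate")

// Parse the string as a timestamp, format it as the full month name,
// then lowercase it to match the output requested in the question
val withMonth = df.withColumn(
  "month",
  lower(date_format(to_timestamp(col("orddate"), "d/M/yy H:mm"), "MMMM"))
)

withMonth.show()
```

Dropping the lower call should give capitalized names like the output above. Rows that do not match the pattern end up with a null month rather than an error in the default configuration, so it is worth checking for nulls afterwards.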