Tags: dataframe, scala, apache-spark, date, apache-spark-sql

Spark dataframe string to month


I have a dataframe where one column, orddate, is a string. I want to extract the month from orddate and add a new column with the month name in a new df.

|orddate|
|---|
|12/1/10 9:37|
|20/3/10 10:37|
|09/8/14 4:56|
|30/12/11 12:13|
|24/5/10 7:27|

Desired output:

|orddate|month|
|---|---|
|12/1/10 9:37|january|
|20/3/10 10:37|march|
|09/8/14 4:56|august|
|30/12/11 12:13|december|
|24/5/10 7:27|may|

Solution

  • 1) Use unix_timestamp with the format dd/MM/yy hh:mm to convert the column to a timestamp; 2) use from_unixtime with the format MMMMM to convert the timestamp to a month name.

    You can see more about the format patterns in the Java SimpleDateFormat documentation, which these functions follow.

    import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}
    import spark.implicits._ // needed for the $"colName" syntax
    
    df.withColumn("month", from_unixtime(unix_timestamp($"orddate", "dd/MM/yy hh:mm"), "MMMMM")).show
    
    +--------------+--------+
    |       orddate|   month|
    +--------------+--------+
    |  12/1/10 9:37| January|
    | 20/3/10 10:37|   March|
    |  09/8/14 4:56|  August|
    |30/12/11 12:13|December|
    |  24/5/10 7:27|     May|
    +--------------+--------+
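
  • On Spark 2.2+ you can skip the unix epoch round-trip and use to_timestamp with date_format instead. One caveat as a sketch, assuming Spark 3.x: the newer versions use java.time.DateTimeFormatter patterns, where MMMM (four Ms) is the full month name, MMMMM is the narrow one-letter form, and H is the 24-hour clock, so the pattern strings differ slightly from the answer above.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{to_timestamp, date_format, col}
    
    val spark = SparkSession.builder().appName("month-demo").master("local[*]").getOrCreate()
    import spark.implicits._
    
    // Sample data matching the question.
    val df = Seq("12/1/10 9:37", "20/3/10 10:37", "09/8/14 4:56",
                 "30/12/11 12:13", "24/5/10 7:27").toDF("orddate")
    
    // to_timestamp parses the string; date_format renders the month name.
    // "d/M/yy" accepts one- or two-digit day and month under DateTimeFormatter.
    val withMonth = df.withColumn(
      "month",
      date_format(to_timestamp(col("orddate"), "d/M/yy H:mm"), "MMMM"))
    
    withMonth.show(false)

  If the old answer's dd/MM/yy hh:mm pattern throws parse errors on Spark 3.x, setting spark.sql.legacy.timeParserPolicy to LEGACY restores the SimpleDateFormat behavior.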