Tags: pyspark, palantir-foundry, foundry-code-workbooks, foundry-code-repositories

PySpark: getting the last day of the previous quarter based on today's date


In a Code Repository, using PySpark, I'm trying to take today's date and from it derive the last day of the prior quarter. This date would then be used to filter a DataFrame. Creating a DataFrame inside the Code Repository wasn't working, although my code works in Code Workbook. This is my Code Workbook code:

import datetime as dt
import pyspark.sql.functions as F


def unnamed():
    # Single-row DataFrame holding today's date
    date_df = spark.createDataFrame([(dt.date.today(),)], ['date'])
    date_df = date_df \
        .withColumn('qtr_start_date', F.date_trunc('quarter', F.col('date'))) \
        .withColumn('qtr_date', F.date_sub(F.col('qtr_start_date'), 1))  # day before the quarter start

    return date_df

Any help would be appreciated.


Solution

  • I got the following code to run successfully in a Code Repository:

    from transforms.api import transform_df, Input, Output
    import datetime as dt
    import pyspark.sql.functions as F
    
    
    @transform_df(
        Output("/my/output/dataset"),
    )
    def my_compute_function(ctx):
        # Build a single-row DataFrame with today's date using the transform's Spark session
        date_df = ctx.spark_session.createDataFrame([(dt.date.today(),)], ['date'])
        date_df = date_df \
            .withColumn('qtr_start_date', F.date_trunc('quarter', F.col('date'))) \
            .withColumn('qtr_date', F.date_sub(F.col('qtr_start_date'), 1))  # last day of the prior quarter
    
        return date_df
    

    You'll need to pass the ctx argument into your transform; it exposes the underlying spark_session, which you can use to create the pyspark.sql.DataFrame directly.

    If you already have the date column available in your input, you'll just need to make sure it's the Date type so that the F.date_trunc call works on the correct type.
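
    As an illustration of the filtering step mentioned in the question, here is a minimal sketch that applies the same quarter logic directly to an input dataset; the input/output paths and the event_date column name are hypothetical and would need to match your own data:

    from transforms.api import transform_df, Input, Output
    import pyspark.sql.functions as F


    @transform_df(
        Output("/my/output/filtered"),          # hypothetical output path
        source_df=Input("/my/input/dataset"),   # hypothetical input path
    )
    def filter_to_prior_quarter(source_df):
        # Last day of the prior quarter: truncate today's date to the start of the
        # quarter, then step back one day.
        qtr_date = F.date_sub(F.date_trunc('quarter', F.current_date()), 1)
        # Cast the (hypothetical) event_date column to Date so the comparison is
        # type-correct, and keep only rows on or before the prior quarter's last day.
        return source_df.filter(F.col('event_date').cast('date') <= qtr_date)

    Using F.current_date() keeps the whole calculation in Spark, but computing the date in Python with dt.date.today(), as in the transform above, works equally well.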