python pandas pyspark great-expectations

How do you convert a dataframe to a great_expectations dataset?


I have a pandas or pyspark dataframe df, already in memory, that I want to run an expectation against. How can I convert it to a great_expectations dataset?

So that I can do, for example:

df.expect_column_to_exist("my_column")

Solution

  • import great_expectations as ge
    

    for pandas:

    df_ge = ge.from_pandas(df)
    

    or

    df_ge = ge.dataset.PandasDataset(df)
    

    for pyspark:

    df_ge = ge.dataset.SparkDFDataset(df)
    

    Now you can run your expectation on the converted object:

    df_ge.expect_column_to_exist("my_column")
    

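    Note that the great_expectations SparkDFDataset wraps the pyspark DataFrame rather than inheriting from it (unlike PandasDataset, which subclasses pandas.DataFrame), so pyspark DataFrame methods are not available on df_ge directly. You can access the original pyspark DataFrame via df_ge.spark_df.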