I have a pandas or PySpark DataFrame df that I want to run an expectation against.
The DataFrame is already in memory. How can I convert it to a great_expectations dataset
so that I can call, for example:
df.expect_column_to_exist("my_column")
import great_expectations as ge
For pandas:
df_ge = ge.from_pandas(df)
or
df_ge = ge.dataset.PandasDataset(df)
For PySpark:
df_ge = ge.dataset.SparkDFDataset(df)
Now you can run your expectation on the converted dataset:
df_ge.expect_column_to_exist("my_column")
Note that the great_expectations SparkDFDataset does not inherit the methods of the PySpark DataFrame, so calls such as select or filter are not available on df_ge itself. You can access the original PySpark DataFrame via:
df_ge.spark_df
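To make the difference concrete, here is a minimal toy sketch of the two patterns, written in plain Python so it runs without pandas, Spark, or great_expectations installed. All class names here (FakeDataFrame, SubclassDataset, WrapperDataset) and the count_columns method are hypothetical stand-ins, not the library's code. The assumption is that PandasDataset behaves like the subclass case (it is documented as a subclass of pandas.DataFrame), while SparkDFDataset behaves like the wrapper case, keeping the original frame in its spark_df attribute.

```python
# Toy stand-ins only: these are NOT the great_expectations classes.


class FakeDataFrame:
    """Hypothetical minimal DataFrame with a single native method."""

    def __init__(self, columns):
        self.columns = list(columns)

    def count_columns(self):
        return len(self.columns)


class SubclassDataset(FakeDataFrame):
    """Subclasses the frame, so native methods stay available (pandas-style)."""

    def expect_column_to_exist(self, column):
        return {"success": column in self.columns}


class WrapperDataset:
    """Stores the frame as an attribute, so native methods are not exposed (Spark-style)."""

    def __init__(self, spark_df):
        self.spark_df = spark_df

    def expect_column_to_exist(self, column):
        return {"success": column in self.spark_df.columns}


frame = FakeDataFrame(["my_column", "other"])
sub = SubclassDataset(frame.columns)
wrapped = WrapperDataset(frame)

# The subclass offers both the expectation and the native method.
print(sub.expect_column_to_exist("my_column"))
print(sub.count_columns())

# The wrapper offers the expectation, but native methods need .spark_df.
print(hasattr(wrapped, "count_columns"))
print(wrapped.spark_df.count_columns())
```

In practice this means that with the Spark variant you would keep using df_ge.spark_df (or your original df) for transformations and use df_ge only for the expect_* calls.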