So I have a prebuilt and predeployed set of Python UDFs that I would like to use from within a Snowpark program. I defined the UDFs in SQL:
CREATE OR REPLACE FUNCTION dg_utility__field_contains_phone_number(str STRING, pre_clean BOOLEAN)
RETURNS BOOLEAN
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'dg_utility__field_contains_phone_number'
AS
$$
# ... UDF body ...
$$;
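For context, a minimal hypothetical handler of the kind that might sit inside the $$ ... $$ body. The regex and the pre_clean behavior here are illustrative assumptions, not the real implementation:

```python
import re

# Illustrative only: a loose pattern for digit runs that look like phone numbers.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def dg_utility__field_contains_phone_number(s, pre_clean):
    # Hypothetical handler matching the SQL signature (str STRING, pre_clean BOOLEAN).
    if s is None:
        return False
    if pre_clean:
        s = s.strip()  # assumed "pre-clean" step; the real UDF may do more
    return bool(PHONE_RE.search(s))
```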
Now I want to leverage those UDFs from the Snowpark Python DSL, something like:
df = df.withColumn(col, dg_utility__field_contains_phone_number(col))
Is there a way to do this, or am I required to redefine dg_utility__field_contains_phone_number within my Snowpark program?
I want to be able to share UDFs between SQL and Python. I think I could fall back to session.sql, but that would require generating dynamic SQL against the DataFrame I want to apply these functions to, and I'm looking to avoid that.
From the documentation for snowflake.snowpark.functions.call_udf (note that if the UDF already exists, the session.udf.register call is not needed):
from snowflake.snowpark.types import IntegerType
from snowflake.snowpark.functions import call_udf, col

# Not needed when the UDF is already deployed in the account:
udf_def = session.udf.register(
    lambda x, y: x + y,
    name="add_columns",
    input_types=[IntegerType(), IntegerType()],
    return_type=IntegerType(),
    replace=True,
)

df = session.create_dataframe([[1, 2]], schema=["a", "b"])
df.select(call_udf("add_columns", col("a"), col("b"))).show()
ADD_COLUMNS("A", "B")
---------------------
3
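Applied to the UDF from the question, the same pattern should work without redefining anything in Python: call_udf resolves the function name server-side when the query runs. A sketch, assuming an open session, a DataFrame df with a string column (here named "my_field", a placeholder), and an output column name of my choosing:

```python
from snowflake.snowpark.functions import call_udf, col, lit

# Call the pre-deployed, SQL-defined UDF by name; no Python re-registration needed.
df = df.with_column(
    "contains_phone",
    call_udf("dg_utility__field_contains_phone_number", col("my_field"), lit(True)),
)
```

Since the UDF takes a BOOLEAN second argument, the constant has to be wrapped in lit() so Snowpark passes it as a literal expression rather than a Python value.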