I have a piece of code that I want to translate into a Pandas UDF in PySpark but I'm having a bit of trouble understanding whether or not you can use conditional statements.
def is_pass_in(df):
x = list(df["string"])
result = []
for i in x:
if "pass" in i:
result.append("YES")
else:
result.append("NO")
df["result"] = result
return df
The code is super simple all I'm trying to do is iterate through a column and in each row contains a sentence. I want to check if the word pass
is in that sentence and if so append that to a list that will later become a column right next to the df["string"]
column. Ive tried to do this using Pandas UDF but the error messages I'm getting are something that I don't understand because I'm new to spark. Could someone point me in the correct direction?
There is no need to use a UDF. This can be done in pyspark as follows. Even in pandas, I would advice you dont do what you have done. use np.where()
df.withColumn('result', when(col('store')=='target','YES').otherwise('NO')).show()