I have coded a pyspark script to execute a SQL file, it worked perfectly fine on the spark latest version, but the target machine has 2.3.1, and it throws exception:
pyspark.sql.utils.AnalysisException: u"Undefined function: 'array_remove'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
It seems these are not present in the older versions :( can anyone suggest something, i have searched alot but in vain.
my sql piece which is failing is
SELECT NIEDC.*, array_join(array_remove(split(groupedNIEDC.appearedIn,'-'), StudyCode),'-') AS subjects_assigned_to_other_studies
array_remove and array_join functions were added on spark version 2.4. You can make an UDF and register it to use in a query using this method.