python, python-3.x, pyspark

How to write try except for loading data


I'm pretty new to coding, so I apologize if this is a stupid question. I'm writing a Spark function that takes in a file path and file type and creates a DataFrame. If the input is invalid, I want to just print some sort of error message and return an empty DataFrame. Would I use try/except?

def rdf(name, type):
   try:
      df=spark.read.format(type).load(name)
      return df
   except ____ as error:
      print(error)
      return "" #I want to return an empty RDD here, but I can't figure out how to make one

How do I know what goes in the ____? I tried org.apache.spark.SparkException, because that's the error I get when I pass in a .csv file as parquet and it breaks, but that isn't working.


Solution

  • You can catch multiple exceptions in a try-except block. Note that org.apache.spark.SparkException is a JVM-side class, so you can't name it directly in Python; PySpark surfaces load failures as pyspark.sql.utils.AnalysisException (for problems like a bad path or format) or as py4j.protocol.Py4JJavaError for errors raised on the JVM. For instance:

    from pyspark.sql.types import StructType
    from pyspark.sql.utils import AnalysisException
    from py4j.protocol import Py4JJavaError

    # file_type avoids shadowing the built-in type()
    def rdf(name, file_type):
        try:
            return spark.read.format(file_type).load(name)
        except (AnalysisException, Py4JJavaError) as error:
            print(error)
            # An empty DataFrame with an empty schema
            return spark.createDataFrame([], StructType([]))
    

    You could replace these, or add other exception types to that tuple.

    Catching the broad Exception class will potentially silence errors that are unrelated to your code (like a networking issue if name is an S3 path). Those are errors you probably don't want your program to swallow.
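
    If you're unsure which concrete exception class a failing call raises, one way to find out is to trigger the failure once under a broad catch and print the exception's fully qualified type. A minimal sketch (the `describe_exception` helper is hypothetical, not part of PySpark):

    ```python
    def describe_exception(fn):
        """Call fn() and return the fully qualified name of the exception
        it raises, or None if it succeeds. For one-off diagnosis only."""
        try:
            fn()
            return None
        except Exception as error:  # broad catch is acceptable for diagnosis
            cls = type(error)
            return f"{cls.__module__}.{cls.__qualname__}"

    # Example with a deliberately failing call:
    print(describe_exception(lambda: int("not a number")))
    # builtins.ValueError
    ```

    Running something like `describe_exception(lambda: spark.read.format("parquet").load(name))` once against your bad input tells you the class name to put in the except clause; you can then import that class and catch it specifically.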