I'm reading data with the same options multiple times. Is there a way to avoid duplicating common DataFrameReader options and somehow initialize them separately to use them on each read later?
metrics_df = spark.read.format("jdbc") \
    .option("driver", self.driver) \
    .option("url", self.url) \
    .option("user", self.username) \
    .option("password", self.password) \
    .option("dbtable", "<table_name>") \
    .load()
Chained .option() calls return the DataFrameReader itself (a pyspark.sql.readwriter.DataFrameReader), so you can stop the chain before .load(), keep the partially configured reader in a variable, and then add only the dbtable option each time you reuse it.
Example:
metrics_df_options = spark.read.format("jdbc") \
    .option("driver", self.driver) \
    .option("url", self.url) \
    .option("user", self.username) \
    .option("password", self.password)

type(metrics_df_options)
# <class 'pyspark.sql.readwriter.DataFrameReader'>

# Configure dbtable and pull data from the RDBMS table
metrics_df_options.option("dbtable", "<table_name>").load().show()
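If you prefer not to hold on to a reader object, another option is to keep the shared settings in a plain dict and apply them in one call with DataFrameReader.options(**kwargs). A minimal sketch (the driver class, URL, credentials, and table names below are placeholders, not values from your setup):

```python
# Shared JDBC connection settings, kept in one place.
# All values here are hypothetical; substitute your own.
common_options = {
    "driver": "org.postgresql.Driver",
    "url": "jdbc:postgresql://host:5432/db",
    "user": "username",
    "password": "password",
}

def jdbc_reader(spark):
    """Return a DataFrameReader preconfigured with the shared JDBC options."""
    # DataFrameReader.options(**kwargs) sets multiple options at once.
    return spark.read.format("jdbc").options(**common_options)

# Reuse the helper for several tables (table names are placeholders):
# orders_df  = jdbc_reader(spark).option("dbtable", "orders").load()
# metrics_df = jdbc_reader(spark).option("dbtable", "metrics").load()
```

Note that in PySpark, .option() mutates the reader and returns it, so calling .option("dbtable", ...) again on a stored reader simply overwrites the previous table name; that is why reusing the same reader for different tables works.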