Is there any way to find all the options when reading/writing with Spark for a specific format? I think they must be in the source code somewhere but I can't find it.
Below is my code that uses Spark to read data from HBase. It works fine, but I want to know where the options hbase.columns.mapping and hbase.table come from. Are there any other options?
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "vftsandbox-namenode,vftsandbox-snamenode,vftsandbox-node03")
new HBaseContext(spark.sparkContext, hbaseConf)
val hbaseTable = "mytable"
val columnMapping =
"""id STRING :key,
mycfColumn1 STRING mycf:column1,
mycfColumn2 STRING mycf:column2,
mycfCol1 STRING mycf:col1,
mycfCol3 STRING mycf:col3
"""
val hbaseSource = "org.apache.hadoop.hbase.spark"
val hbaseDF = spark.read.format(hbaseSource)
.option("hbase.columns.mapping", columnMapping)
.option("hbase.table", hbaseTable)
.load()
hbaseDF.show()
I mean, if it's format("csv") or format("json"), then there are docs on the internet listing all the options, but for this specific format (org.apache.hadoop.hbase.spark) I've had no luck. Even in the case of csv or json, all the options documented on the internet must come from the code somewhere, right? They can't just be imagined.
Now I think the problem is really "how to find all the Spark options in the source code in general". I tried using the IntelliJ IDEA search tool to search everywhere (even inside the library sources), but no luck so far: I can't find anything related to hbase.columns.mapping or hbase.table at all (I already tried hbase_columns_mapping too), and there is nothing related in org.apache.hadoop.hbase.spark either; the only occurrences are in my own code.
I also found these lines in the console after running the code, but the HBaseRelation class the IDE shows is a "decompiled" stub full of ???:
17:53:51.205 [main] DEBUG org.apache.spark.util.ClosureCleaner - HBaseRelation(Map(hbase.columns.mapping -> id STRING :key,
mycfColumn1 STRING mycf:column1,
mycfColumn2 STRING mycf:column2,
mycfCol1 STRING mycf:col1,
mycfCol3 STRING mycf:col3
, hbase.table -> mytable),None)
I think it's possible that these options only show up at runtime/compile time, but I'm not sure.
Because non-built-in formats are implemented in arbitrary code, there is unfortunately no sure way of finding their options other than going through the available documentation and source code.
For example, to find the HBase Connector options, take an option you already know, such as hbase.spark.pushdown.columnfilter, and find out where it is defined in the repository. In this case it's defined in the HBaseSparkConf object. Also, please note that writing and reading operations may have different sets of options.