In spark-shell, how do I load an existing Hive table, but only one of its partitions?
val df = spark.read.format("orc").load("mytable")
I am looking for a way to load only one particular partition of this table.
Thanks!
There is no direct way to do this in spark.read.format, but you can use a where condition on the partition column:

val df = spark.read.format("orc").load("mytable").where("your_partition_column = 'value'")
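For example, assuming a hypothetical partition column named `load_date` (substitute your table's actual partition column), a sketch of this in spark-shell might look like:

```scala
// In spark-shell, spark.implicits._ is already imported, which
// enables the $"..." column syntax used below.

// Filter on the (hypothetical) partition column so Spark can prune
// the scan down to the matching partition directory.
val df = spark.read
  .format("orc")
  .load("mytable")
  .where($"load_date" === "2021-01-01")
```

Because the filter is on a partition column, Spark can skip the non-matching partition directories entirely rather than reading and then discarding their rows.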
Nothing is loaded until you perform an action, since load (pointing to your ORC file location) is just a method on DataFrameReader, as shown below; it doesn't read any data until an action is triggered. See DataFrameReader:
def load(paths: String*): DataFrame = {
...
}
In the above code, i.e. spark.read....where, the where is just a filter condition; when you specify it, the data still won't be loaded immediately :-) Only when you call an action such as df.count will the partition filter be applied to the data paths of the ORC files.
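To see this laziness and the partition pruning for yourself before triggering an action, you can inspect the physical plan with explain(); for file-based sources such as ORC, a predicate on a partition column shows up as a partition filter rather than a data filter (the column name `load_date` below is a placeholder for your real partition column):

```scala
val df = spark.read
  .format("orc")
  .load("mytable")
  .where($"load_date" === "2021-01-01")

// No data has been read yet; explain() only prints the physical plan,
// where the partition predicate appears under PartitionFilters.
df.explain()

// Only now, when an action runs, does Spark actually scan the
// files of the matching partition.
df.count()
```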