apache-spark, hadoop, apache-spark-sql, data-transfer

Unable to load hdfs file path having -ext-10000 sub directory from spark


I am trying to load data from Hive using Spark. Spark is able to read the data recursively under the dt=2022-10-11 directory, but it is not able to read from the -ext-10000 subdirectory. It also does not show any error.

hadoop fs -ls /user/warehouse/dbA/tableA/dt=2022-10-11/
hadoop fs -ls /user/warehouse/dbA/tableA/dt=2022-10-12/-ext-10000

I have used all of the Spark settings below to read the data from HDFS, on Spark 2.3:

--conf hive.exec.dynamic.partition=true
--conf hive.exec.dynamic.partition.mode=nonstrict
--conf mapreduce.input.fileinputformat.input.dir.recursive=true
--conf spark.hive.mapred.supports.subdirectories=true
--conf spark.hadoop.hive.supports.subdirectories=true
--conf spark.hadoop.hive.mapred.supports.subdirectories=true
--conf spark.hadoop.hive.input.dir.recursive=true
--conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true
--conf hive.exec.compress.output=true
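Taken together, the flags above would all be passed on a single spark-submit invocation. A minimal sketch, where the jar name `my-job.jar` and the class `com.example.ReadTable` are placeholders, not from the original question:

```shell
# Sketch only: my-job.jar and com.example.ReadTable are hypothetical.
spark-submit \
  --class com.example.ReadTable \
  --conf hive.exec.dynamic.partition=true \
  --conf hive.exec.dynamic.partition.mode=nonstrict \
  --conf spark.hadoop.hive.mapred.supports.subdirectories=true \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  my-job.jar
```

Note that plain `hive.*` keys are generally only picked up by Spark when prefixed with `spark.hadoop.`, which is why several of the flags above appear in both forms.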

Solution

  • I added the config below, and Spark is now able to read the files under the subdirectories: --conf spark.sql.hive.convertMetastoreOrc=false
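The likely reason this works: spark.sql.hive.convertMetastoreOrc (default true) makes Spark replace the Hive ORC SerDe with its built-in ORC data source, which appears not to descend into staging-style subdirectories such as -ext-10000. Setting it to false routes the read back through the Hive SerDe, which honors the recursive-input settings. A minimal sketch of starting a session with the fix (a sketch, not the asker's exact command line):

```shell
# Disable the native ORC conversion so the Hive SerDe path is used,
# and keep recursive directory listing enabled for the input format.
spark-shell \
  --conf spark.sql.hive.convertMetastoreOrc=false \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true
```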