hiveparquethadoop-partitioninghive-partitions

Querying based on Partition and non-partition column in Hive


I have an external Hive table as follows :-

CREATE external TABLE sales (
ItemNbr STRING,
itemShippedQty INT,
itemDeptNbr SMALLINT,
gateOutUserId STRING,
code VARCHAR(3),
trackingId STRING,
baseDivCode STRING
)
PARTITIONED BY (countryCode STRING, sourceNbr INT, date STRING)
STORED AS PARQUET
LOCATION '/user/sales/';

where table is partitioned by 3 columns ( countryCode, sourceNbr , date). I know that if i query based on these 3 partition columns, my query would be faster.

I have some queries on other query pattern :-

  1. If i add non-partitioned column along with partitioned column like countryCode, sourceNbr , date , ItemNbr as part of where condition when executing sql query , will it scan the full table or it will scan only inside the folder based on countryCode, sourceNbr , date and look for itemNbr attribute value specified in where condition?

  2. Giving all columns is necessary to filter the record or sub-filter also works like if i give only first 2 columns (countryCode, sourceNbr ) as part of where condition. In this case it would scan the full table or it would search only inside folders based on 2 columns condition (countryCode, sourceNbr ) ?


Solution

  • Partition pruning works in all your cases, no matter all partition columns are in WHERE or only partial, other filters do not affect partition pruning.

    To check it use EXPLAIN EXTENDED command, see https://stackoverflow.com/a/50859735/2700344