hadoop hive hiveql apache-hive

Cannot query a Hive table partition after deleting the HDFS files for that partition


My Hadoop cluster runs a batch job every day at 11:00.

The job creates a Hive table partition (e.g. p_date=201702,p_domain=0) and imports RDBMS data into that partition, like an ETL process. (The Hive table is not an external table.)

But the job failed, and I removed some HDFS files (the partition location => p_date=20170228,p_domain=0) so that I could reprocess the data.

That was my mistake; I should have just run a drop partition query in beeline...

Now the query hangs when I run "select * from table_name where p_date=20170228,p_domain=0", but "select * from table_name where p_date=20170228,p_domain=6" succeeds.

I cannot find any error log, and no message appears on the console.

How can I solve this problem?

Please excuse my limited English.


Solution

  • You should not delete partitions of a Hive table that way. There is a dedicated command for this:

    ALTER TABLE table_name DROP IF EXISTS PARTITION(partitioncolumn= 'somevalue');

    Deleting the files from HDFS is not sufficient; you also need to remove the partition's metadata from the metastore. To do that, connect to your relational database and delete the rows for this partition from the partition-related tables in the metastore database.

    mysql
    
    mysql> use hive;
    
    mysql> SELECT * FROM PARTITIONS WHERE PART_NAME LIKE '%p_date=20170228/p_domain=0%';
    
    +---------+-------------+------------------+--------------------+-------+--------+
    | PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME          | SD_ID | TBL_ID |
    +---------+-------------+------------------+--------------------+-------+--------+
    |       7 |  1487237959 |                0 | partition name     |   336 |    329 |
    +---------+-------------+------------------+--------------------+-------+--------+
    
    
    mysql> DELETE FROM PARTITION_KEY_VALS WHERE PART_ID=7;
    
    mysql> DELETE FROM PARTITION_PARAMS WHERE PART_ID=7;
    
    mysql> DELETE FROM PARTITIONS WHERE PART_ID=7;
    

    After this, Hive should stop using this partition in your queries.
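
    As a rough sketch, the DROP PARTITION command applied to the partition from the question might look like the following in beeline. It assumes the table is literally named table_name and that p_date and p_domain are its two partition columns, as the question suggests. If the partition columns are strings, the values may need quotes. Whether the drop succeeds once the HDFS directory is already gone can depend on the Hive version, so treat this as something to try before editing the metastore by hand.

    ```sql
    -- Assumed names: table_name, p_date and p_domain are taken from the question.
    -- Drop the partition's metadata (for a managed table this also removes its data, if any is left).
    ALTER TABLE table_name DROP IF EXISTS PARTITION (p_date=20170228, p_domain=0);

    -- List the remaining partitions to confirm the dropped one is no longer registered.
    SHOW PARTITIONS table_name;
    ```

    If the partition no longer appears in the SHOW PARTITIONS output, the batch job can recreate it and reload the data.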