hivehdfshiveql

External Table data not getting Purged in Hive


I created 2 external tables Hive. In first table specified data location with create statement. In second table loaded data after creating it. I can see data file created for second table in /hive/warehouse/ directory. Then I set "external.table.purge"="true" for both tables. And DROP both tables. But data files of both tables remains as is.

  1. What is the behaviour of 'external.table.purge'='true'. Shouldn't it delete data files as well on issuing Drop command?
  2. If Hive does not take any ownership over data files of external table, why is there even an option as 'external.table.purge'='true'.

I read in one of the threads, where someone mentioned it is possible to delete data as well for external tables by ALTER TABLE ... SET TBLPROPERTIES('external.table.purge'='true'), but unable to find that post again.

Syntax used to Create External Table.

Code used to set purge option.


Solution

  • You can not drop the data in external table but you can do it for internal(managed) tables. So convert the table to internal and then drop it.
    First change eternal property to false.
    hive> ALTER TABLE nyse_external SET TBLPROPERTIES('EXTERNAL'='False');
    and then you can easily drop it.
    hive> drop table nyse_external;

    TBLPROPERTIES ("external.table.purge"="true") should work for hive version 4.x+.