hiveprestotrino

How to delete data physically with Presto/Trino?


In my installation of Presto (358) I've got two working hive connectors:

Everything works fine but when I call DROP (TABLE/SCHEMA) or DELETE FROM, the deletion happens only in metastore and no data is being physically deleted. Applies on both S3 and ABFS.

This becomes quite problematic in case of replacing data:

> DROP TABLE hive.abc; 
-- ok

> CREATE TABLE hive.abc AS (...) 
-- ERROR: Target directory 'abc' already exists.

The same applies on deleting partitions, etc.

Is there any way to really delete the data?


Solution

  • Found out the solution. The main difference was in specifying external_location vs location for the schema and its tables.

    CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
    CREATE TABLE hive.xyz.test AS SELECT (...);
    
    DELETE FROM hive.xyz.test WHERE TRUE;
    
    -- Data ARE physically deleted
    
    
    CREATE SCHEMA hive.xyz;
    CREATE TABLE hive.xyz.test 
        WITH (external_location = 'abfs://...') 
        AS SELECT (...);
    
    DELETE FROM hive.xyz.test WHERE TRUE;
    
    -- Data ARE NOT physically deleted.
    

    Conclusion: external_location for a table will prevent data deletion.