In my installation of Presto (358) I have two working Hive connectors, one backed by S3 and one backed by ABFS.
Everything works fine, but when I call DROP TABLE, DROP SCHEMA, or DELETE FROM, the deletion happens only in the metastore and no data is physically deleted, on both S3 and ABFS.
This becomes quite problematic when replacing data:
> DROP TABLE hive.abc;
-- ok
> CREATE TABLE hive.abc AS (...)
-- ERROR: Target directory 'abc' already exists.
The same applies to deleting partitions, etc.
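For example, with a hypothetical table partitioned by a ds column (just a sketch of the same behavior), dropping a whole partition via DELETE only updates the metastore:

-- hypothetical table, partitioned by ds
DELETE FROM hive.abc_partitioned WHERE ds = '2021-01-01';
-- The partition is removed from the metastore, but its files remain on S3/ABFS.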
Is there any way to really delete the data?
I found the solution. The difference comes down to specifying external_location vs. location for the schema and its tables.
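-- Scenario 1: managed table (location set on the schema, no external_location on the table)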
CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
CREATE TABLE hive.xyz.test AS SELECT (...);
DELETE FROM hive.xyz.test WHERE TRUE;
-- Data ARE physically deleted.
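
-- Scenario 2: external table (external_location set on the table itself)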
CREATE SCHEMA hive.xyz;
CREATE TABLE hive.xyz.test
WITH (external_location = 'abfs://...')
AS SELECT (...);
DELETE FROM hive.xyz.test WHERE TRUE;
-- Data ARE NOT physically deleted.
Conclusion: external_location on a table prevents the data from being physically deleted.
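Applied to the original drop-and-recreate problem, this means the table needs to be a managed one, created under the schema's location rather than with external_location. A minimal sketch (table name and column are placeholders, path elided as above):

CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
CREATE TABLE hive.xyz.abc AS SELECT 1 AS id;
DROP TABLE hive.xyz.abc;
-- The table's directory is removed together with the metastore entry,
-- so recreating the table no longer fails with "Target directory already exists".
CREATE TABLE hive.xyz.abc AS SELECT 2 AS id;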