hivedatabricksdelta-lakebloom-filter

Bloom Filter Index on Delta Table


Is is possible to create a Bloom filter index in Databricks on a Delta table using the filepath and not on the hive table referencing to that file location?

I tried the following:

CREATE BLOOMFILTER INDEX 
ON TABLE delta.'gs://GCS_Bucket/Delta_Folder_Path'
FOR COLUMNS(colname OPTIONS(fpp=0.1, numItems=100))

But it doesn't work. I get the following error:

ParseException: 
no viable alternative at input 'CREATE BLOOMFILTER'(line 1, pos 7)

== SQL ==
CREATE BLOOMFILTER INDEX 
-------^^^
ON TABLE delta.'gs://GCS_Bucket/Delta_Folder_Path'
FOR COLUMNS(LOT_W OPTIONS(fpp=0.1, numItems=100))

Replacing the delta.'gs://GCS_Bucket/Delta_Folder_Path' with a hive external table that references the file works as expected.

All the examples that I found first create a table of it, and then create the bloom filter index. But this is not what we want.

We only want to have tables that are in the gold layer and some in silver available in hive. The table that I want to add a bloom filter index on should not be in hive.

Edit: This is on Databricks runtime 10.4 LTS


Solution

  • Most probably error is arising because you're using ordinary quotes instead of backquotes for path (doc). Try:

    CREATE BLOOMFILTER INDEX 
    ON TABLE delta.`gs://GCS_Bucket/Delta_Folder_Path`
    FOR COLUMNS(colname OPTIONS(fpp=0.1, numItems=100))
    

    P.S. Error message points to the incorrect place, I think that it's known issue