amazon-web-services amazon-s3 presto trino backblaze

Can't set catalog to backblaze in Trino


I'm trying to run a POC of Trino querying data stored in Backblaze B2.

I followed this example: https://github.com/bitsondatadev/trino-getting-started/blob/main/hive/trino-b2/README.md and ran into the following error:

SQL Error [16777216]: Query failed (#20240531_002231_00032_e9wd4): Got exception: org.apache.hadoop.fs.s3a.AWSBadRequestException getFileStatus on s3://temporalTests/data/data.csv: com.amazonaws.services.s3.model.AmazonS3Exception:  (Service: Amazon S3; Status Code: 400; Error Code: 400 ; Request ID: f1a8d38c4e5afd88; S3 Extended Request ID: adUFuAGsobtRvT3evbss=; Proxy: null), S3 Extended Request ID: adUFuAGsobtRvT3evbss=:400 :  (Service: Amazon S3; Status Code: 400; Error Code: 400 ; Request ID: f1a8d38c4e5afd88; S3 Extended Request ID: adUFuAGsobtRvT3evbss=; Proxy: null)

I can create the catalog and the schema through the SQL console, but creating the table throws the error above.

CREATE CATALOG backblaze_catalog USING hive
WITH (
    "hive.metastore.uri" = 'thrift://hive-metastore:9083', -- hive metastore created in other container
    "hive.s3.aws-access-key" = 'KeyID',
    "hive.s3.aws-secret-key" = 'AppKeyId',
    "hive.s3.endpoint" = 'https://s3.us-west-123.backblazeb2.com',
    "hive.s3.path-style-access" = 'true',
    "hive.s3.region" = 'us-west-000',
    "hive.non-managed-table-writes-enabled" = 'true',
    "hive.storage-format" = 'CSV'
);

CREATE SCHEMA backblaze_catalog.raw_data
WITH (
    "location" = 's3a://temporalTests/'
);

CREATE TABLE backblaze_catalog.raw_data.sample_data (
    domain VARCHAR
)
WITH (
    format = 'CSV',
    external_location = 's3a://temporalTests/data/data.csv',
    skip_header_line_count = 1
);

I've managed to test:

Am I doing something wrong?

Thanks.


Solution

  • The external_location in your table definition references a file in B2, but it should reference a 'directory' (technically, a prefix, since directories don't exist in cloud object storage). Remove data.csv from external_location and it should work:

    CREATE TABLE backblaze_catalog.raw_data.sample_data (
        domain VARCHAR
    )
    WITH (
        format = 'CSV',
        external_location = 's3a://temporalTests/data/',
        skip_header_line_count = 1
    );
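
Once the table is created, a query like the following can serve as a quick sanity check that Trino is reading the objects under the prefix. This is a minimal sketch that assumes the catalog, schema, and table names from the question; "$path" is a hidden column of the Trino Hive connector that reports the source object of each row. Keep in mind that every object under s3a://temporalTests/data/ is treated as table data, so the prefix should contain only CSV files with the same layout.

```sql
-- Assumes backblaze_catalog.raw_data.sample_data was created successfully.
-- "$path" is a hidden Hive connector column showing which object each row came from.
SELECT "$path" AS source_file, domain
FROM backblaze_catalog.raw_data.sample_data
LIMIT 10;
```

If the query returns rows whose source_file ends in data.csv, the table is pointed at the right prefix.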