amazon-s3openrowsetsql-server-2022

Query Amazon S3 file via OPENROWSET with SQL Server 2022


I'm trying to query a parquet file uploaded to Amazon S3 from SQL Server 2022 using OPENROWSET.

I can confirm that I can access the S3 bucket with those access keys because I ran a python script to generate and upload the parquet file to S3 successfully.

I can verify the file is present in S3 by logging into AWS:

enter image description here

When I try to query the file from SQL, I get this error:

File 's3://sqlexternaltabletest/parquet_folder/1936a792b4f94a9283e8f22354171fd7-0.parquet' cannot be opened because it does not exist or it is used by another process

S3 bucket name = "sqlexternaltabletest"

Here's my SQL code:

CREATE DATABASE SCOPED CREDENTIAL s3_dc
WITH IDENTITY = 'S3 Access Key',
SECRET = 'AccessKey:SecretKey';

CREATE EXTERNAL DATA SOURCE s3_ds
WITH
(
    LOCATION = 's3://s3.amazonaws.com/sqlexternaltabletest/'
,   CREDENTIAL = s3_dc
);
GO

SELECT *
FROM OPENROWSET
(
    BULK '/parquet_folder/1936a792b4f94a9283e8f22354171fd7-0.parquet'
,   FORMAT       = 'PARQUET'
,   DATA_SOURCE  = 's3_ds'
) AS [cc];

Any idea what's wrong?


Solution

  • I was having the same issue, but did not have the region in the URL. I notice your URL has the same issue.

    Make sure you have the region in your s3 location like this:

    CREATE EXTERNAL DATA SOURCE s3_ds
    WITH
    (    LOCATION = 's3://some-bucket.s3.us-west-2.amazonaws.com/prefix',
    ,    CREDENTIAL = s3_dc
    )