node.js amazon-s3 parquet nodejs-polars

Reading a Parquet file from an S3 bucket using nodejs-polars


I’m trying to read a Parquet file using the nodejs-polars library, but I’m encountering a 403 Forbidden response when attempting to load the file from an S3 bucket.

Most of the examples I’ve found are in Python, and I’m looking for guidance on how to achieve this using Node.js. Specifically, I’d like to know:

  1. How to properly read a Parquet file from S3 using nodejs-polars, including how to handle AWS credentials.

  2. Whether it’s possible to use partitioning with nodejs-polars similar to what’s available in the Python implementation.

  3. If hive partitioning is supported, could you provide a valid URL pattern or example configuration for it?

This is an example of my code:

import pl from 'nodejs-polars';


const cloudOptions = new Map();
cloudOptions.set('aws_region', 'eu-west-1');

// This call fails with a 403 Forbidden response
const df = pl
  .scanParquet(
    'https://my-bucket-name.s3.eu-west-1.amazonaws.com/test_folder/test_file.parquet',
    {
      cloudOptions: cloudOptions,
    },
  )
  .collectSync();

Could someone please provide a working example or any pointers to resolve the issue? Additionally, if partitioning support is available, a brief overview of how to implement it would be greatly appreciated.


Solution

  • I was able to read a Parquet file from an S3 bucket with the nodejs-polars library, starting from version 0.16.0. Note that the path is an s3:// URI rather than the HTTPS endpoint used in the question.

    Here is a working example:

    import pl from 'nodejs-polars';
    
    // Define your AWS cloud options
    const cloudOptions = {
      aws_region: 'relevant-region',           // Replace with your AWS region
      aws_session_token: 'your-session-token', // Replace with your AWS session token
    };
    
    const df = pl
      .scanParquet(
        's3://your-bucket-name/some-dir/**/**/**/*.parquet', // Update with your actual S3 path
        {
          cloudOptions: cloudOptions,  // Specify AWS options
          hivePartitioning: true,      // Enable hive partitioning if applicable
        }
      )
      .collectSync();
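
    If you would rather not hard-code credentials, the options object can be built from the standard AWS environment variables. This is only a sketch: buildCloudOptions is a hypothetical helper, not part of nodejs-polars, and the key names aws_access_key_id and aws_secret_access_key are an assumption based on the names Python Polars accepts in storage_options. I have not confirmed them against the nodejs-polars documentation.

    ```javascript
    // Hypothetical helper (not part of nodejs-polars): collects AWS settings
    // from an environment-like object into a plain cloudOptions object.
    // Key names are assumed to match those used by Python Polars storage_options.
    function buildCloudOptions(env, region) {
      const options = {};

      // An explicit region argument wins over the AWS_REGION variable.
      const resolvedRegion = region ?? env.AWS_REGION;
      if (resolvedRegion) options.aws_region = resolvedRegion;

      if (env.AWS_ACCESS_KEY_ID) options.aws_access_key_id = env.AWS_ACCESS_KEY_ID;
      if (env.AWS_SECRET_ACCESS_KEY) options.aws_secret_access_key = env.AWS_SECRET_ACCESS_KEY;

      // Only present when using temporary (STS) credentials.
      if (env.AWS_SESSION_TOKEN) options.aws_session_token = env.AWS_SESSION_TOKEN;

      return options;
    }

    // Values that are not set in the environment are simply left out.
    const envCloudOptions = buildCloudOptions(process.env, 'eu-west-1');
    ```

    The resulting object can then be passed as cloudOptions to pl.scanParquet in the same way as the hard-coded object in the example above.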
    

    Important Notes: