javascript angular parquet duckdb

DuckDB with Angular: unable to load and query a Parquet file


I'm trying to use duckdb-wasm with Angular to load a Parquet file and query it. I was able to create a connection, but loading the Parquet file fails with this error:

Error: Uncaught (in promise): Error: IO Error: No files found that match the pattern "test.parquet" Error: IO Error: No files found that match the pattern "test.parquet" at O.onMessage (duckdb-browser.mjs:1:11169) at _ZoneDelegate.invokeTask (zone.js:402:31) at core.mjs:10757:55 at AsyncStackTaggingZoneSpec.onInvokeTask (core.mjs:10757:36) at _ZoneDelegate.invokeTask (zone.js:401:60) at Object.onInvokeTask (core.mjs:11070:33) at _ZoneDelegate.invokeTask (zone.js:401:60) at Zone.runTask (zone.js:173:47) at ZoneTask.invokeTask [as invoke] (zone.js:483:34) at invokeTask (zone.js:1631:18) at resolvePromise (zone.js:1193:31) at zone.js:1100:17 at zone.js:1116:33 at asyncGeneratorStep (asyncToGenerator.js:6:1) at _throw (asyncToGenerator.js:25:1) at _ZoneDelegate.invoke (zone.js:368:26) at Object.onInvoke (core.mjs:11083:33) at _ZoneDelegate.invoke (zone.js:367:52) at Zone.run (zone.js:129:43) at zone.js:1257:36

I have written the code in a service:

import { Injectable } from '@angular/core';
import * as duckdb from '@duckdb/duckdb-wasm';

@Injectable({
  providedIn: 'root',
})
export class DuckdbService {
  constructor() {
    this.makeConnection();
  }

  async makeConnection() {
    const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
    // Select a bundle based on browser checks
    const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);

    const worker_url = URL.createObjectURL(
      new Blob([`importScripts("${bundle.mainWorker}");`], {
        type: 'text/javascript',
      })
    );

    // Instantiate the asynchronous version of DuckDB-Wasm
    const worker = new Worker(worker_url);
    const logger = new duckdb.ConsoleLogger();
    const db = new duckdb.AsyncDuckDB(logger, worker);
    await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
    console.log(db);
    const conn = await db.connect();
    // query() returns a Promise, so await it before logging the result
    const results = await conn.query(`SELECT * FROM read_parquet('test.parquet')`);
    console.log(results);
    URL.revokeObjectURL(worker_url);
  }
}

Please help me resolve this issue so that I can load and query the Parquet file.


Solution

  • The call read_parquet('test.parquet') reads a virtual file registered in DuckDB under the name "test.parquet". DuckDB-Wasm runs inside the browser sandbox and cannot see files in your project directory, so you have to register a physical file in the DuckDB virtual file system before you can query it.

    There are a few ways to register a file, shown below (copied from https://duckdb.org/docs/api/wasm/data_ingestion.html#parquet):

    // from Parquet files
    // ...Local
    const pickedFile: File = letUserPickFile();
    await db.registerFileHandle('local.parquet', pickedFile, DuckDBDataProtocol.BROWSER_FILEREADER, true);
    // ...Remote
    await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', DuckDBDataProtocol.HTTP, false);
    // ... Using Fetch
    const res = await fetch('https://origin/remote.parquet');
    await db.registerFileBuffer('buffer.parquet', new Uint8Array(await res.arrayBuffer()));
    
    // ..., by specifying URLs in the SQL text
    await c.query(`
        CREATE TABLE direct AS
            SELECT * FROM 'https://origin/remote.parquet'
    `);
    // ..., or by executing raw insert statements
    await c.query(`
        INSERT INTO existing_table
        VALUES (1, 'foo'), (2, 'bar')`);
    

    As an alternative, you can query a remote file with the httpfs extension without registering it, as shown at https://duckdb.org/docs/api/wasm/data_ingestion.html#parquet:

    const results = await conn.query(`SELECT * FROM read_parquet('https://origin/test.parquet')`);
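
Applied to the service in the question, the fetch-and-register approach might look roughly like the sketch below. It is only an illustration under some assumptions: queryParquetFromUrl, ParquetDb, ParquetConn and FetchLike are hypothetical names introduced here, the two interfaces describe just the calls the helper uses (an AsyncDuckDB instance should satisfy them structurally), and the path 'assets/test.parquet' assumes the file is served from the Angular assets folder.

```typescript
// Hypothetical structural types covering only the DuckDB-Wasm calls used below.
interface ParquetConn {
  query(sql: string): Promise<unknown>;
  close(): Promise<void>;
}
interface ParquetDb {
  registerFileBuffer(name: string, buffer: Uint8Array): Promise<void>;
  connect(): Promise<ParquetConn>;
}

// Narrow fetch signature so a stub can be swapped in for testing.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Downloads a Parquet file, registers it under `alias` in DuckDB's
// virtual file system, then queries it through read_parquet().
async function queryParquetFromUrl(
  db: ParquetDb,
  url: string,
  alias = 'test.parquet',
  fetchFn: FetchLike = (u) => fetch(u),
): Promise<unknown> {
  const res = await fetchFn(url);
  if (!res.ok) {
    throw new Error(`Could not download ${url}: HTTP ${res.status}`);
  }

  // Register the bytes first; read_parquet() only sees registered names.
  await db.registerFileBuffer(alias, new Uint8Array(await res.arrayBuffer()));

  const conn = await db.connect();
  try {
    // Double any single quotes so the alias is a valid SQL string literal.
    const safeAlias = alias.replace(/'/g, "''");
    return await conn.query(`SELECT * FROM read_parquet('${safeAlias}')`);
  } finally {
    await conn.close();
  }
}

// Possible use inside makeConnection(), after db.instantiate(...):
//   const table = await queryParquetFromUrl(db, 'assets/test.parquet');
//   console.log(table);
```

Passing the database in as a parameter keeps the helper independent of how the worker and bundle were set up, so the existing makeConnection() code can stay as it is.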