Tags: sql, out-of-memory, duckdb

DuckDB batch Exit on OOM


I have a batch DuckDB script which sets a custom memory_limit. But when one statement hits the out-of-memory limit, instead of exiting, DuckDB continues and tries the remaining statements in the batch (which all fail or make no sense because of the failed statement). This leaves very confusing messages in my log file and could cause real damage if, say, it failed on an UPDATE but then continued with the rest of the file.

Is there a way to force DuckDB to exit immediately when a statement fails (e.g. due to OOM, as above)?

Example script:

SET memory_limit = '100GB';
SET threads TO 64;

CREATE TABLE summary AS
  SELECT
    point_id,
    COUNT(pred_occ) AS ensemble_support,
    AVG(pred_range) AS pat_mean,
    QUANTILE_CONT(pred_occ, [0.1, 0.5, 0.9]) AS occurrence_quantiles,
    QUANTILE_CONT(pred_count, [0.1, 0.5, 0.9]) AS count_quantiles,
    QUANTILE_CONT(pred_occ * pred_count, [0.1, 0.5, 0.9]) AS abundance_quantiles
  FROM predictions
  GROUP BY point_id;

CREATE TABLE erd AS
  SELECT
    CAST(SPLIT_PART(point_id, '-', 2) AS BIGINT) AS checklist_id,
    SPLIT_PART(point_id, '-', 1) AS type,
    ensemble_support, pat_mean,
    occurrence_quantiles[2] AS occurrence_median,
    occurrence_quantiles[1] AS occurrence_lower,
    occurrence_quantiles[3] AS occurrence_upper,
    count_quantiles[2] AS count_median,
    count_quantiles[1] AS count_lower,
    count_quantiles[3] AS count_upper,
    abundance_quantiles[2] AS abundance_median,
    abundance_quantiles[1] AS abundance_lower,
    abundance_quantiles[3] AS abundance_upper
  FROM summary
  WHERE point_id LIKE 'test-%';

COPY (
  SELECT
    s.*,
    e.latitude, e.longitude, e.year, e.day_of_year, e.observer_id
  FROM erd as s
  INNER JOIN '{input_erd_pq}' as e
    ON s.checklist_id = e.checklist_id
) TO '{predictions_erd_pq}' (FORMAT 'parquet');

COPY (
  SELECT
    CAST(SPLIT_PART(point_id, '-', 2) AS BIGINT) AS srd_id,
    CAST(NULLIF(SPLIT_PART(point_id, '-', 3), '') AS INTEGER) AS day_of_year,
    ensemble_support, pat_mean,
    occurrence_quantiles[2] AS occurrence_median,
    occurrence_quantiles[1] AS occurrence_lower,
    occurrence_quantiles[3] AS occurrence_upper,
    count_quantiles[2] AS count_median,
    count_quantiles[1] AS count_lower,
    count_quantiles[3] AS count_upper,
    abundance_quantiles[2] AS abundance_median,
    abundance_quantiles[1] AS abundance_lower,
    abundance_quantiles[3] AS abundance_upper
  FROM summary
  WHERE point_id LIKE 'srd-%') TO '{predictions_srd_pq}' (FORMAT 'parquet');

Error log:

Out of Memory Error: Failed to allocate block of 2048 bytes (bad allocation)
Catalog Error: Table with name summary does not exist!
Did you mean "temp.information_schema.schemata"?
LINE 15:   FROM summary
                ^
Catalog Error: Table with name erd does not exist!
Did you mean "temp.information_schema.tables"?
LINE 5:   FROM erd as s
               ^
Catalog Error: Table with name summary does not exist!
Did you mean "temp.information_schema.schemata"?
LINE 15:   FROM summary
                ^

Which looks to me like it has tried to run all four statements, even though every preceding statement had already failed!


Solution

  • Use the -bail flag in the duckdb CLI.

    $ duckdb -help 2>&1 | grep bail
       -bail                stop after hitting an error
    
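    If changing the invocation isn't convenient, the same setting also appears to be available as a dot command from inside the script itself (the DuckDB shell inherits these from the sqlite3 shell) — worth verifying against your DuckDB version:

    ```sql
    -- Stop at the first failing statement; equivalent to the -bail flag.
    .bail on
    SET memory_limit = '100GB';
    -- ... rest of the batch
    ```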

    Tested on DuckDB 1.0.0 with a.sql:

    set memory_limit = '1mb';
    
    select 1;
    
    select * from range(1000) a, range(1000) b order by a.range + b.range;
    
    select 2;
    

    (using -csv only for terse output)

    $ duckdb -csv < a.sql
    1
    1
    Out of Memory Error: could not allocate block of size 256.0 KiB (784.0 KiB/976.5 KiB used)
    2
    2
    $ duckdb -csv -bail < a.sql
    1
    1
    Out of Memory Error: could not allocate block of size 256.0 KiB (784.0 KiB/976.5 KiB used)
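    For batch pipelines it also helps to make the caller notice the failure, not just stop the SQL. Below is a minimal sketch of a wrapper; `run_batch`, the database name, and the file paths are hypothetical — the intended command string is something like `duckdb -bail mydb.duckdb`:

    ```shell
    #!/bin/sh
    # run_batch CMD FILE: feed FILE to CMD on stdin; return nonzero on failure.
    # CMD is deliberately word-split so it can carry flags, e.g.
    #   run_batch "duckdb -bail mydb.duckdb" batch.sql
    run_batch() {
      cmd=$1
      sql_file=$2
      if ! $cmd < "$sql_file"; then
        echo "batch failed: $sql_file" >&2
        return 1
      fi
    }
    ```

    With -bail the duckdb process should exit with a nonzero status on the first error, so the wrapper (or a plain `set -e` in the calling script) can abort the rest of the pipeline.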