node.jsoracle-databasesql-optimizationnode-oracledb

How to update or insert millions of rows via node oracle-db?


I'm struggling with a question - how can I insert or update a lot of data (thousands or millions of rows) using node oracle-db driver?

The point is that I can select a lot of data with the help of resultSet (handling result set)... but then I have to make some actions with a row and later update or insert a new row. And here is the problem - I don't know how to do it as fast as possible.

Can anybody help me with a piece of advice? Thanks.


Solution

  • I can assure you that these actions can't be done in db.

    Actually, there are lots of different ways this can be done in the DB via SQL and PL/SQL when needed. Folks often want to use the language they are comfortable with, maybe JavaScript in this case, but performance will be much better if the data doesn't have to fly around between tiers.

    Here's an example in just SQL alone... Granted, this could have been done via virtual columns, but it should illustrate the point.

    Imagine we have the following tables:

    create table things (
      id   number not null, 
      val1 number not null, 
      val2 number not null, 
      constraint things_pk primary key (id)
    );
    
    insert into things (id, val1, val2) values (1, 1, 2);
    insert into things (id, val1, val2) values (2, 2, 2);
    insert into things (id, val1, val2) values (3, 5, 5);
    
    -- Will hold the sum of things val1 and val2
    create table thing_sums (
      thing_id number,
      sum      number
    );
    
    alter table thing_sums
    add constraint thing_sums_fk1 
    foreign key (thing_id)
    references things (id);
    

    Now, the easiest and most performant way to do this would be via SQL:

    insert into thing_sums (
      thing_id,
      sum
    )
    select id, 
      val1 + val2
    from things
    where id not in (
      select thing_id
      from thing_sums
    );
    

    Here's another example that does the same thing only via PL/SQL which can provide more control.

    begin
    
      -- This cursor for loop will bulk collect (reduces context switching between 
      -- SQL and PL/SQL engines) implictly. 
      for thing_rec in (
        select *
        from things
        where id not in(
          select thing_id
          from thing_sums
        )
      )
      loop
        -- Logic in this loop could be endlessly complex. I'm inserting the values
        -- within the loop but this logic could be modified to store data in arrays
        -- and then insert with forall (another bulk operation) after the loop.
        insert into thing_sums(
          thing_id,
          sum
        ) values (
          thing_rec.id,
          thing_rec.val1 + thing_rec.val2
        );
      end loop;
    
    end;
    

    Either of those could be called from the Node.js driver. However, let's say you need to do this from the driver (maybe you're ingesting data that's not already in the database). Here's an example the demonstrates calling PL/SQL from the driver that uses bulk processing rather than row by row operations. This is much faster due to reduced round trips.

    I pulled this from a blog post I'm working on so the table definition is a little different:

    create table things (
      id   number not null,
      name varchar2(50),
      constraint things_pk primary key (id)
    );
    

    And here's the JavaScript:

    var oracledb = require('oracledb');
    var async = require('async');
    var config = require('./dbconfig');
    var things = [];
    var idx;
    
    function getThings(count) {
      var things = [];
    
      for (idx = 0; idx < count; idx += 1) {
        things[idx] = {
          id: idx,
          name: "Thing number " + idx
        };
      }
    
      return things;
    }
    
    things = getThings(500);
    
    oracledb.getConnection(config, function(err, conn) {
      var ids = [];
      var names = [];
      var start = Date.now();
    
      if (err) {throw err;}
    
      // We need to break up the array of JavaScript objects into arrays that
      // work with node-oracledb bindings.
      for (idx = 0; idx < things.length; idx += 1) {
        ids.push(things[idx].id);
        names.push(things[idx].name);
      }
    
      conn.execute(
        ` declare
            type number_aat is table of number
              index by pls_integer;
            type varchar2_aat is table of varchar2(50)
              index by pls_integer;
    
            l_ids   number_aat := :ids;
            l_names varchar2_aat := :names;
          begin
            forall x in l_ids.first .. l_ids.last
              insert into things (id, name) values (l_ids(x), l_names(x));
          end;`,
        {
          ids: {
            type: oracledb.NUMBER,
            dir: oracledb.BIND_IN,
            val: ids
          }, 
          names: {
            type: oracledb.STRING,
            dir: oracledb.BIND_IN,
            val: names
          }
        },
        {
          autoCommit: true
        },
        function(err) {
          if (err) {console.log(err); return;}
    
          console.log('Success. Inserted ' + things.length + ' rows in ' + (Date.now() - start) + ' ms.');
        }
      );
    });
    

    I hope that helps! :)