Tags: oracle-database, blob, oracle-call-interface

Oracle SQL: most efficient way (in terms of network roundtrips) to do random little and scattered writes to a LOB column


I have a table with a BLOB column. Each BLOB can be up to 1GB in size. Updates of existing persistent BLOBs usually consist of thousands (~10k) of micro-writes of around 100 bytes each at random offsets, so the probability of these writes being contiguous is very low.

We have to devise a way of grouping all of our writes and sending them to Oracle that is efficient for both the server (I/O) and the client (number of network round-trips). I think it can be done with OCILobWrite2, but I'm not sure how.

NOTE: I'm aware OCI has a buffering mode for LOBs, but it's deprecated/unmaintained.


Solution

  • Securefile LOBs do have some performance enhancements over the old basicfile LOBs, principally the Write Gather Cache (WGC), which buffers writes and flushes them periodically to reduce write I/O to disk. So it is not nearly as bad as writing one chunk (a multiple of your block size) for every write operation, even though the chunk remains the basic unit of I/O to disk for LOB segments.
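
    The test table below relies on the database default LOB storage, which is SecureFile on recent versions (db_securefile = PREFERRED) in an ASSM tablespace. As a minimal sketch, the storage can be requested explicitly like this (the options shown are assumptions matching my test conditions, not recommendations):

     create table lobtest (myblob blob)
       lob (myblob) store as securefile (
         nocache   -- LOB default: direct path reads/writes, bypassing the buffer cache
         logging   -- full redo generation, as measured in the test below
       );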

    I did some experimenting to compare the performance of a BLOB against a child table of RAW(100) rows: 10,000 random 100-byte writes (1MB total) to a 16MB BLOB versus 10,000 random row updates of a RAW(100) column.

    LOB setup:

     drop table lobtest;
     create table lobtest (myblob blob);
    
    /*
    ** Initial population
    */
    DECLARE
      var_myblob blob;
      var_mybuffer raw(100) := HEXTORAW('7785069247a730153313c7c8244db9228b0ce07a2aa8646cf1bc908bc7a275913c16a0081594f1b3979a515b7c37b61993e17caaccbe1401bd52ca34bf21d7a99b0c8220eae182a0f7ec71db249da8d96a585cf8fa81af6d09129d82afaafd7d60d77e4c');
      var_offset pls_integer;
      var_lob_size integer;
    BEGIN
      INSERT INTO lobtest VALUES (EMPTY_BLOB()) RETURNING myblob INTO var_myblob;
      
      FOR i IN 1..100000
      LOOP
        var_offset := ((i-1)*100)+1;
        dbms_lob.write(var_myblob,100,var_offset,var_mybuffer);
      END LOOP;
    END;
    /
    /*
    ** Show LOB segment size
    */
    SELECT ROUND(s.bytes/(1024*1024)) blob_mb,l.*
      FROM user_lobs l,
           user_segments s
     WHERE l.column_name = 'MYBLOB'
       AND l.segment_name = s.segment_name;  
    

    Result: 16MB

    LOB update test

    /*
    ** Capture session stats before
    */
    drop table stat$before;
    create table stat$before as 
    SELECT sn.name,ss.value
      FROM v$mystat ss,
           v$statname sn
     WHERE ss.statistic# = sn.statistic#       
       AND (sn.name IN ('physical writes direct (lob)',
                       'physical write bytes',
                       'physical read bytes',
                       'physical write total bytes',
                       'redo size',
                       'undo change vector size')
            OR sn.name LIKE 'securefile%')                   
     ORDER BY ss.value DESC               
     /
    /*
    ** Do 10,000 random writes to the BLOB created above in a single transaction
    */
    DECLARE
      var_myblob blob;
      var_mybuffer raw(100) := HEXTORAW('7785069247a730153313c7c8244db9228b0ce07a2aa8646cf1bc908bc7a275913c16a0081594f1b3979a515b7c37b61993e17caaccbe1401bd52ca34bf21d7a99b0c8220eae182a0f7ec71db249da8d96a585cf8fa81af6d09129d82afaafd7d60d77e4c');
      var_offset pls_integer;
      var_lob_size integer;
      var_start_ts timestamp with time zone;
    BEGIN
      var_start_ts := SYSTIMESTAMP;
      
      SELECT myblob INTO var_myblob FROM lobtest WHERE ROWNUM = 1 FOR UPDATE;
        
      FOR i IN 1..10000
      LOOP
        var_offset := LEAST(FLOOR(ABS(dbms_random.random())/10000)*100,16000000)+1;
        dbms_lob.write(var_myblob,100,var_offset,var_mybuffer);
      END LOOP;
      
      COMMIT;
      
      dbms_output.put_line(SYSTIMESTAMP - var_start_ts);
    END;
    /
    
    /*
    ** Capture session stats after and calculate diffs
    */
    SELECT sn.name,ss.value - NVL(b.value,0) diff
      FROM v$mystat ss,
           v$statname sn,
           stat$before b
     WHERE ss.statistic# = sn.statistic#       
       AND (sn.name IN ('physical writes direct (lob)',
                       'physical write bytes',
                       'physical read bytes',
                       'physical write total bytes',
                       'redo size',
                       'undo change vector size')
            OR sn.name LIKE 'securefile%')  
       AND sn.name = b.name(+)
     ORDER BY diff DESC
     /
    

    Result:

    (Screenshot of the session statistics diff after the BLOB update test; the key figures are discussed below.)

    This was after running the above update test multiple times. Each time I ran it, the numbers dropped, showing the effect of the Write Gather Cache; the figures above are where they bottomed out, at their most efficient.

    The test did 1MB of updates (securefile bytes non-transformed = 1000000) using 10,000 separate write operations (securefile number of non-transformed flushes = 10000). It read 15MB (physical read bytes), generated 10MB of redo to the log (redo size), and 4MB of undo (undo change vector size). The Write Gather Cache was flushed 3,565 times (securefile number of flushes), or about once for every 3 writes.

    The counters physical write bytes/physical write total bytes appear not to be instrumented properly: 8192 is exactly one block/chunk in my case, and there's no way it wrote only 1 block when it did 3,565 flushes. It appears the Write Gather Cache masks the real amount of disk writing happening. If we assume each flush writes one chunk, we might suspect a total I/O of 29MB (3565 * 8192). v$segstat gives a figure of 894 physical write operations (894 * 8192 = 7MB). Either way, we're doing a lot more than 1MB of writes, but the caching has saved us from having to write 81MB like we presumably would with basicfile (10000 * 8192). Total time for the test: 3.3 seconds.
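
    For reference, the 894 figure came from per-segment statistics. A sketch of the kind of query that shows them, using v$segment_statistics and looking the LOB segment name up from user_lobs (statistic names may vary by version):

     SELECT st.statistic_name, st.value
       FROM v$segment_statistics st
      WHERE st.object_name = (SELECT segment_name
                                FROM user_lobs
                               WHERE table_name = 'LOBTEST'
                                 AND column_name = 'MYBLOB')
        AND st.statistic_name LIKE 'physical%';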

    RAW(100) setup

     drop table lobtest2;
     create table lobtest2 (offset integer, blob_data raw(100));
     
     
    DECLARE
      var_mybuffer raw(100) := HEXTORAW('7785069247a730153313c7c8244db9228b0ce07a2aa8646cf1bc908bc7a275913c16a0081594f1b3979a515b7c37b61993e17caaccbe1401bd52ca34bf21d7a99b0c8220eae182a0f7ec71db249da8d96a585cf8fa81af6d09129d82afaafd7d60d77e4c');
      var_offset pls_integer;
    BEGIN
      FOR i IN 1..100000
      LOOP
        var_offset := ((i-1)*100)+1;
        INSERT INTO lobtest2 VALUES (var_offset,var_mybuffer);
      END LOOP;
      COMMIT;
    END;
    / 
    create index i_lobtest2 on lobtest2(offset);
    

    RAW(100) update test

    drop table stat$before;
    create table stat$before as 
    SELECT sn.name,ss.value
      FROM v$mystat ss,
           v$statname sn
     WHERE ss.statistic# = sn.statistic#       
       AND (sn.name IN ('physical writes direct (lob)',
                       'physical write bytes',
                       'physical read bytes',
                       'physical write total bytes',
                       'redo size',
                       'undo change vector size'))                   
     ORDER BY ss.value DESC               
     /
      
    DECLARE
      var_mybuffer raw(100) := HEXTORAW('7785069247a730153313c7c8244db9228b0ce07a2aa8646cf1bc908bc7a275913c16a0081594f1b3979a515b7c37b61993e17caaccbe1401bd52ca34bf21d7a99b0c8220eae182a0f7ec71db249da8d96a585cf8fa81af6d09129d82afaafd7d60d77e4c');
      var_offset pls_integer;
      var_start_ts timestamp with time zone;
    BEGIN
      var_start_ts := SYSTIMESTAMP;
      
      FOR rec_offset IN (SELECT offset FROM lobtest2 ORDER BY REVERSE(TO_CHAR(offset)) FETCH FIRST 10000 ROWS ONLY)
      LOOP
        UPDATE lobtest2 SET blob_data = var_mybuffer WHERE offset = rec_offset.offset;
      END LOOP;
      
      COMMIT;
      
      dbms_output.put_line(SYSTIMESTAMP - var_start_ts);  
    END;
    / 
    SELECT sn.name,ss.value - NVL(b.value,0) diff
      FROM v$mystat ss,
           v$statname sn,
           stat$before b
     WHERE ss.statistic# = sn.statistic#       
       AND (sn.name IN ('physical writes direct (lob)',
                       'physical write bytes',
                       'physical read bytes',
                       'physical write total bytes',
                       'redo size',
                       'undo change vector size'))  
       AND sn.name = b.name(+)
     ORDER BY diff DESC
     /
    

    Result:

    (Screenshot of the session statistics diff after the RAW(100) update test; the key figures are discussed below.)

    We still did 1MB of updates involving 10,000 random 100-byte changes, the same as the previous test. The physical read/write metrics are meaningless here because these were buffered reads/writes through the cache, like normal table operations. The foreground process doesn't have to do any of its own writes (other than to the redo log); DBWR will perform that task later when it pleases. Total redo size was 3MB, undo size was 1MB, and total time for the test: 0.18 seconds.

    Comparing the two tests, we can see that writing to the BLOB must read the BLOB in from disk (non-cached; you can change this with the CACHE setting, but a 1GB BLOB is much too big to cache), so there's a lot of direct-path physical read that isn't needed in the RAW(100) scenario, which does no reads at all once everything is in the buffer cache. The BLOB method generated three times as much redo, four times as much undo, and performed an unquantifiable amount of writes (somewhere between 7x and 20x the actual amount of logical change). From a pure timing perspective, the RAW(100) method was 17x faster than the BLOB method.

    So while securefile LOBs have some good performance improvements over basicfile, they still can't compare with conventional in-row datatypes. If you need the speed, the concurrency and locking mechanisms of normal row operations, and the full benefit of caching, you're better off with a non-LOB approach such as the RAW(100) method suggested above. You can write a PL/SQL function to reassemble the rows in the correct order and return a temporary BLOB if your consuming program needs it in that format, as sketched below.
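
    For illustration, a minimal sketch of such a function against the lobtest2 layout above (the function name is mine; each row's offset records the 1-based position of its piece within the logical BLOB):

     CREATE OR REPLACE FUNCTION reassemble_blob RETURN BLOB
     IS
       var_result blob;
     BEGIN
       /* Temporary LOB, freed automatically at the end of the call */
       dbms_lob.createtemporary(var_result, TRUE, dbms_lob.call);

       /* Write each piece at its recorded offset, in order */
       FOR rec IN (SELECT offset, blob_data FROM lobtest2 ORDER BY offset)
       LOOP
         dbms_lob.write(var_result, utl_raw.length(rec.blob_data), rec.offset, rec.blob_data);
       END LOOP;

       RETURN var_result;
     END;
     /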

    Obviously this does not address network latency, which you can only reduce by buffering on the client side so that fewer changes are submitted to the database. But even then, fewer network round-trips are required for an UPDATE of a conventional in-row datatype than for anything involving a LOB, since LOB operations require multiple calls per row (create or fetch the locator, open the LOB, write, close, etc.). So getting away from a BLOB will also help with your concern about network round-trips; see the sketch after this paragraph.
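
    To sketch what that client-side buffering could look like with the RAW(100) table: collect the scattered changes into two arrays and submit them as one array-bound statement, so everything travels in a single round-trip. The block below is an illustration only (a real client would bind the collections from OCI, e.g. with OCIBindArrayOfStruct, rather than populate them in a loop):

     DECLARE
       TYPE t_offset_tab IS TABLE OF lobtest2.offset%TYPE INDEX BY PLS_INTEGER;
       TYPE t_buffer_tab IS TABLE OF lobtest2.blob_data%TYPE INDEX BY PLS_INTEGER;
       var_offsets t_offset_tab;
       var_buffers t_buffer_tab;
     BEGIN
       /* Placeholder data; a real client binds these arrays instead */
       FOR i IN 1..10000
       LOOP
         var_offsets(i) := ((i-1)*100)+1;
         var_buffers(i) := HEXTORAW('DEADBEEF');  -- illustrative value only
       END LOOP;

       /* One statement execution covers all 10,000 scattered updates */
       FORALL i IN 1..var_offsets.COUNT
         UPDATE lobtest2 SET blob_data = var_buffers(i) WHERE offset = var_offsets(i);

       COMMIT;
     END;
     /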