postgresql, jsonb

How to update a huge amount of jsonb values in the DB


I have to update >100 GB of jsonb values in a Postgres DB that have the following structure:

{
  "attributes": [
    {
      "attribute": "foobar",
      "rules": [
        {
          "label": "rule1"
        },
        {
          "label": "rule2"
        }
      ]
    },
    {
      "attribute": "foobar2",
      "rules": [
        {
          "label": "rule1"
        },
        {
          "label": "rule2"
        }
      ]
    }
  ]
}

I have to iterate through all attributes and, from each rules array, remove the object whose label has the value rule2.

I have created a PL/pgSQL script that iterates through each attribute and removes the matching rule with

jsonb_set(attributes, attributes #> attribute_id, attributes #> attribute_id - attribute_rule_id) 

but it is taking too long. How would you approach this problem? Thanks!

EDIT: Here is the script:

do $$
declare
  entry_record RECORD;
  new_jsonb jsonb;
  attributes_arr_length int;
  attribute_object_id int;
  target_rule_id_arr int[];
  rule_id int;
begin
  for entry_record in (
    select
      mt.id,
      mt.attributes
    from
      my_table mt
  )
  loop
    new_jsonb := entry_record.attributes;
    attributes_arr_length := jsonb_array_length(new_jsonb #> '{attributes}'::text[]);
    continue when attributes_arr_length is null;

    for attribute_object_id in 0..(attributes_arr_length - 1)
    loop
      -- collect the 0-based indexes of rules labelled 'rule2', in descending
      -- order so that deleting one does not shift the indexes still to be deleted
      select array(select arr.position - 1
                   from jsonb_array_elements(new_jsonb #> CONCAT('{attributes,', attribute_object_id, ',rules}')::text[])
                        with ordinality arr(item_object, position)
                   where item_object ->> 'label' = 'rule2'
                   order by arr.position desc) into target_rule_id_arr;

      -- remove the matching rules one index at a time
      foreach rule_id in array target_rule_id_arr
      loop
        new_jsonb := jsonb_set(new_jsonb, CONCAT('{attributes,', attribute_object_id, ',rules}')::text[],
                               (new_jsonb #> CONCAT('{attributes,', attribute_object_id, ',rules}')::text[]) - rule_id);
      end loop;

      -- write the modified document back (runs once per attribute object)
      update my_table
      set attributes = new_jsonb
      where id = entry_record.id;
    end loop;
  end loop;
end$$;

Solution

  • Don't use an imperative loop; write declarative SQL instead. Use jsonb_array_elements on the data, filter for the objects that you want to keep, then jsonb_agg them back into an array. You'll need two subqueries for the nested structure you have.

    UPDATE my_table
    SET attributes = jsonb_set(attributes, '{attributes}', (
      SELECT jsonb_agg(jsonb_set(attribute, '{rules}', (
        SELECT jsonb_agg(rule)
        FROM jsonb_array_elements(attribute -> 'rules') r(rule)
        WHERE rule ->> 'label' <> 'rule2'
      )))
      FROM jsonb_array_elements(attributes -> 'attributes') a(attribute)
    ));
    

    You may need a COALESCE(…, '[]') around the jsonb_agg calls if you're dealing with empty arrays, since jsonb_agg returns NULL otherwise.
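
    For instance, a minimal sketch of the inner subquery with that guard (assuming an empty rules array should stay '[]' rather than become NULL):

    COALESCE((
      SELECT jsonb_agg(rule)
      FROM jsonb_array_elements(attribute -> 'rules') r(rule)
      WHERE rule ->> 'label' <> 'rule2'
    ), '[]'::jsonb)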

    If not all rows in your table need to be updated (i.e. if there are some that don't have any attribute with a rule2), then it'll be more efficient to add a condition to the UPDATE:

    WHERE attributes @> '{"attributes": [{"rules": [{"label": "rule2"}]}]}'
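
    Putting it together, the full statement with the COALESCE guard and the containment filter would look roughly like this (a sketch only; test it on a copy of the data first):

    UPDATE my_table
    SET attributes = jsonb_set(attributes, '{attributes}', (
      SELECT jsonb_agg(jsonb_set(attribute, '{rules}', COALESCE((
        SELECT jsonb_agg(rule)
        FROM jsonb_array_elements(attribute -> 'rules') r(rule)
        WHERE rule ->> 'label' <> 'rule2'
      ), '[]'::jsonb)))
      FROM jsonb_array_elements(attributes -> 'attributes') a(attribute)
    ))
    WHERE attributes @> '{"attributes": [{"rules": [{"label": "rule2"}]}]}';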