I have to update >100GB of data of jsonb values in a postgres DB that have the following structure:
{
"attributes": [
{
"attribute": "foobar",
"rules":[
{
"label": "rule1"
},
{
"label": "rule2"
}
]
},
{
"attribute": "foobar2",
"rules":[
{
"label": "rule1"
},
{
"label": "rule2"
}
]
}
]
}
I have to iterate through all attributes and remove from the rules array the object that has label with value rule2.
I have created a plsql script that iterates through each attribute and removes it with
jsonb_set(attributes, attributes #> attribute_id, attributes #> attribute_id - attribute_rule_id)
but it is taking too long. How would you approach this problem? Thanks!
EDIT: Here is the script:
do $$
declare
entry_record RECORD;
new_jsonb jsonb;
attributes_arr_length int;
attribute_object_id int;
target_rule_id_arr int[];
rule_id int;
begin
for entry_record in (
select
mt.id,
mt.attributes
from
my_table mt
)
loop
new_jsonb := mt.attributes;
attributes_arr_length = jsonb_array_length(new_jsonb #> ('{attributes}')::text[]);
continue when attributes_arr_length is null;
for attribute_object_id in 0..(attributes_arr_length -1)
loop
select array(select arr.position - 1 from jsonb_array_elements(new_jsonb #> CONCAT('{attributes,', attribute_object_id, ',rules}')::text[])
with ordinality arr(item_object, position)
where item_object ->> 'label' = 'rule2'
order by arr.position desc) into target_rule_id_arr;
foreach rule_id in array target_rule_id_arr
loop
new_jsonb := jsonb_set(new_jsonb, CONCAT('{attributes,', attribute_object_id, ',rules}')::text[],
(new_jsonb #> CONCAT('{attributes,', attribute_object_id, ',rules}')::text[]) - rule_id);
end loop;
update my_table
set attributes = new_jsonb
where id = entry_record.id;
end loop;
end loop;
end$$;
Don't use an imperative loop but write declarative SQL. Use jsonb_array_elements
on the data, filter for the objects that you want, then jsonb_agg
regate them back to an array. You'll need two subqueries for the nested structure you have.
UPDATE my_table
SET attributes = jsonb_set(attributes, 'attributes', (
SELECT jsonb_agg(jsonb_set(attribute, 'rules', (
SELECT jsonb_agg(rule)
FROM jsonb_array_elements(attribute -> 'rules') r(rule)
WHERE rule ->> 'label' <> 'rule2'
))
FROM jsonb_array_elements(attributes -> 'attributes') a(attribute)
));
You may need a COALESCE(…, '[]')
around the jsonb_agg
calls if you're dealing with empty arrays, they return NULL
otherwise.
If not all rows in your table need to be updated (i.e. if there's some that don't have any attribute with a rule2), then it'll be more efficient to add a condition to the UPDATE
:
WHERE attributes @> '{"attributes": [{"rules": [{"label": "rule2"}]}]}'