0

I have a table, let's call it my_table with the following structure

ID   | data
____________
uuid | jsonb

The data in the jsonb field is an object structured in the following way:

{
  "A":[
    {"index":"1", "index2":"50"},
    {"index":"2"},
    {"index":"3"},
    {"index":"4", "index2":"10"},
  ],
  "B": "something",
  "C": "something_else",
}

I want to add an "index2" key to each object within "A" with a value of null, but only if that key does not already exist.

This is the result I want:

{
  "A":[
    {"index":"1", "index2":"50"},
    {"index":"2", "index2":null},
    {"index":"3", "index2":null},
    {"index":"4", "index2":"10"},
  ],
  "B": "something",
  "C": "something_else",
}

I have the following query that works but is extraordinarily slow. After running EXPLAIN on my query, it shows that for every row, it is scanning the entire table (essentially n^2 complexity).

Here's the current query:

UPDATE my_table
SET data = JSONB_SET(my_table.data, '{A}', (
    SELECT JSONB_AGG( element || '{"index2": null}' )
    FROM JSONB_ARRAY_ELEMENTS(my_table.data -> 'A') element
))
FROM my_table as my_table_2, JSONB_ARRAY_ELEMENTS(
    my_table_2.data -> 'A'
) as element
WHERE jsonb_array_length(my_table.data -> 'A') > 0
AND element#>'{index2}' IS NULL;

How can I speed this up to get to a linear run time?

1 Answer 1

1

You don't need the FROM which causes a CROSS JOIN and the quadratic behavior.

I simulated it locally and after simplification, this works fast:

explain analyse
UPDATE my_table
SET data = JSONB_SET(data, '{A}',
                     (SELECT JSONB_AGG('{"index2": null}' || element)
                      FROM JSONB_ARRAY_ELEMENTS(data -> 'A') element))
WHERE data @? '$.A[*] ? (!exists(@.index2))';

With a linear plan:

Update on my_table  (cost=0.00..31308.89 rows=8801 width=42) (actual time=1156.675..1156.677 rows=0 loops=1)
  ->  Seq Scan on my_table  (cost=0.00..31308.89 rows=8801 width=42) (actual time=0.128..865.131 rows=100000 loops=1)
"        Filter: (data @? '$.""A""[*]?(!(exists (@.""index2"")))'::jsonpath)"
        Rows Removed by Filter: 100000
        SubPlan 1
          ->  Aggregate  (cost=1.51..1.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
                ->  Function Scan on jsonb_array_elements element  (cost=0.01..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=4 loops=100000)
Planning Time: 0.268 ms
Execution Time: 1156.836 ms

As you can see I tested it with 200000 rows out of which half needed an update.

Note that you had the operands of || switched. Also, my solution works with Postgres 12+, version 11 did not support jsonpath. Which version do you use?


Reply to comment about Postgres 10:

For older versions, I did not manage to express the filter directly in the condition. It looks like you can define a function instead:

create or replace function misses_index2(data jsonb) returns boolean
    language sql
as
$$
select count(*) filter ( where element #> '{index2}' IS NULL ) > 0
from JSONB_ARRAY_ELEMENTS(data -> 'A') element
$$;

(Tested on Postgres 13 but should work even on Postgres 10.)

Then the where condition is simply:

WHERE misses_index2(data);

The query is about twice as slow but that is expected.

Update on my_table  (cost=0.00..161007.00 rows=66990 width=42) (actual time=2413.235..2413.236 rows=0 loops=1)
  ->  Seq Scan on my_table  (cost=0.00..161007.00 rows=66990 width=42) (actual time=0.351..2123.757 rows=100000 loops=1)
        Filter: misses_index2(data)
        Rows Removed by Filter: 100000
        SubPlan 1
          ->  Aggregate  (cost=1.51..1.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
                ->  Function Scan on jsonb_array_elements element  (cost=0.01..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=4 loops=100000)
Planning Time: 0.303 ms
Execution Time: 2413.347 ms
Sign up to request clarification or add additional context in comments.

2 Comments

Ahhh this makes sense. Unfortunately I'm on postgres 10. Do you know if there's a solution that works with that version?
Edited with an idea of using a function in the where condition.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.