I have a large table with a jsonb column. One of the fields in the jsonb is an array of objects. I need to get a unique list of values from within that array. Here is a simplified equivalent example.

create table objects(
  id serial4 not null,
  object_data jsonb not null
);

Say it has entries like these:

insert into objects (object_data) values 
(  '{"id": 1,  "color": "green", "alias" : [{"name":"abc", "order":"1"}, {"name":"abcd", "order":"2"}, {"name":"bbc", "order":"3"}]}'::jsonb),
(  '{"id": 2,  "color": "blue", "alias" : [{"name":"bbc", "order":"1"}, {"name":"abcd", "order":"2"}, {"name":"bbcnn", "order":"3"}]}'::jsonb)
;

I want to get a unique list of values for name in the alias array. The query below works, but it's slow.

SELECT DISTINCT jsonb_array_elements(object_data -> 'alias')->>'name'
FROM objects ;

There are only about 20 unique values for name, and the table has around 350k entries. We are using Postgres 12. Is there any indexing option that can speed up the query? I tried a btree index and a GIN index on object_data -> 'alias', but the query always does a seq scan.

  • This is a classic example of misuse of the jsonb type. You should store the aliases in a regular table. Commented Jan 31, 2022 at 23:11
  • The query has to scan the full table because there is no WHERE clause. In this case a seq scan will be faster than an index scan, so the planner selects the seq scan. Commented Feb 1, 2022 at 6:14
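
A minimal sketch of what the first comment suggests, assuming hypothetical names (object_aliases, object_id, alias_order) that do not appear in the question. It copies the alias entries into a regular table, so the distinct query reads a narrow table instead of unpacking every jsonb document:

```sql
-- Hypothetical side table; all names here are illustrative assumptions.
-- No foreign key is declared because objects.id has no primary key or
-- unique constraint in the example schema.
CREATE TABLE object_aliases (
  object_id   integer NOT NULL,  -- copy of objects.id
  name        text    NOT NULL,
  alias_order integer NOT NULL
);

-- One row per element of the "alias" array.
INSERT INTO object_aliases (object_id, name, alias_order)
SELECT o.id,
       elem ->> 'name',
       (elem ->> 'order')::integer
FROM objects AS o
CROSS JOIN LATERAL jsonb_array_elements(o.object_data -> 'alias') AS elem;

CREATE INDEX object_aliases_name_idx ON object_aliases (name);

SELECT DISTINCT name FROM object_aliases;
```

The final query still has to visit every alias row, so whether this is fast enough depends on the data; the sketch only shows the shape of the normalized layout.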

1 Answer

Classical indexing is not going to help you here as you need to inspect the entire table. Something like a materialized view would be more suitable.
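
A materialized view along these lines might look like the sketch below. The view and index names (alias_names, alias_names_name_idx) are made up for illustration, and how often to refresh is left to you:

```sql
-- Hypothetical view name; stores only the distinct alias names.
CREATE MATERIALIZED VIEW alias_names AS
SELECT DISTINCT elem ->> 'name' AS name
FROM objects
CROSS JOIN LATERAL jsonb_array_elements(object_data -> 'alias') AS elem
WHERE elem ->> 'name' IS NOT NULL;

-- A unique index is required for REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX alias_names_name_idx ON alias_names (name);

-- Re-run whenever the list may have changed; this still scans objects,
-- but readers of the view are not blocked while it runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY alias_names;

-- Reading the list is now a scan over a handful of rows.
SELECT name FROM alias_names ORDER BY name;
```

With the two sample rows from the question, the last query returns abc, abcd, bbc and bbcnn.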

Alternatively, you could write some C code that inspects the structure of a GIN index and returns just the distinct values stored in it. Indexes don't carry visibility information, though, so you would probably find values that no longer have any live rows and would then need to filter those out (which the GIN index should at least make efficient). I am not aware of any existing code that does this, and it would not be a trivial exercise. You would also have to deal with the consequences of the fastupdate option for the index, if that is turned on.

Maybe this would fit in as an addition to the pageinspect extension.
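
For context, this is roughly what pageinspect offers for GIN indexes today. The index name objects_alias_gin_idx is an assumption for illustration, and the extension's functions require superuser privileges. As far as I know, the existing GIN functions (gin_metapage_info, gin_page_opaque_info, gin_leafpage_items) expose page-level details rather than the list of distinct keys:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Hypothetical GIN index on the alias array.
CREATE INDEX objects_alias_gin_idx
  ON objects USING gin ((object_data -> 'alias'));

-- Block 0 of a GIN index is its metapage. The n_pending_pages and
-- n_pending_tuples columns describe the fastupdate pending list.
SELECT *
FROM gin_metapage_info(get_raw_page('objects_alias_gin_idx', 0));
```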

1 Comment

Thanks, a materialized view that gets updated as needed might be the simpler option.