Optimize select query along with where clause to use index scan in postgres

Question

I have a table "offer_facts" with many columns, including product_dimension_id (a foreign key to the product dimension table) and source_name (varchar). Both of these columns are indexed. The table currently holds approximately 120K rows and is constantly growing (around 20K rows per day).

Below are the queries and the output I get.

SELECT version();
"PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit"

EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT product_dimension_id from offer_facts WHERE source_name='customer_conti'

And the output is

HashAggregate  (cost=36619.24..36621.37 rows=213 width=4) (actual time=2654.272..2655.064 rows=399 loops=1)
  Group Key: product_dimension_id
  Buffers: shared hit=2697 read=17687
  ->  Seq Scan on offer_facts  (cost=0.00..35425.82 rows=477367 width=4) (actual time=0.021..1525.361 rows=479880 loops=1)
        Filter: ((source_name)::text = 'customer_conti'::text)
        Rows Removed by Filter: 723466
        Buffers: shared hit=2697 read=17687
Planning time: 0.201 ms
Execution time: 2655.778 ms

I am not sure why it is doing a Seq Scan and not an Index Scan.

I have created the index with

CREATE INDEX idx_offer_facts_dimensions ON offer_facts USING btree (source_name COLLATE pg_catalog."default", shop_dimension_id, time_dimension_id, date_dimension_id, source_dimension_id, product_dimension_id);

and I have vacuumed and analyzed the table.

Why DISTINCT(product_dimension_id), with parentheses? Is that some special DISTINCT function, instead of SELECT DISTINCT product_dimension_id?
– jarlh, Commented Sep 17, 2015 at 13:18
@jarlh No, there is nothing special about this. It is a normal distinct function. I think the () doesn't matter.
– Succeed Stha, Commented Sep 17, 2015 at 13:26
Unrelated, but: distinct is NOT a function. Use distinct product_dimension_id
– user330315, Commented Sep 17, 2015 at 13:26
@a_horse_with_no_name Removing the brackets didn't affect the query. I will edit the question to remove the brackets anyway.
– Succeed Stha, Commented Sep 17, 2015 at 13:28
I said it was unrelated. Putting columns in parentheses is not going to do what you think. It doesn't matter for a single column, but col1, col2 and (col1, col2) are two different things in Postgres.
– user330315, Commented Sep 17, 2015 at 13:40

Gordon Linoff · Accepted Answer · 2015-09-17 13:44:10Z

This is your query:

SELECT DISTINCT product_dimension_id 
FROM offer_facts
WHERE source_name = 'customer_conti' ;

The optimal index is a composite index: offer_facts(source_name, product_dimension_id). Individual indexes on each column are not as useful. This query can make use of an index scan; Postgres should be smart enough to find that execution path.
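
As a minimal sketch of that suggestion (the index name idx_offer_facts_source_product is made up for illustration; the table and column names come from the question), the index could be created and the plan re-checked like this. Whether the planner actually switches to an index or index-only scan depends on its statistics and on how many rows match the filter, so treat the EXPLAIN output as the thing to verify rather than a guaranteed result.

```sql
-- Hypothetical index name; the column order follows the answer:
-- the filter column first, then the column being selected.
CREATE INDEX idx_offer_facts_source_product
    ON offer_facts (source_name, product_dimension_id);

-- Refresh planner statistics and the visibility map, which an
-- index-only scan relies on to avoid visiting the table.
VACUUM ANALYZE offer_facts;

-- Re-run the original query to see which plan is chosen now.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT product_dimension_id
FROM offer_facts
WHERE source_name = 'customer_conti';
```

Because both columns the query touches are in the index, Postgres can in principle answer it from the index alone, which is much smaller than the full fact table.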


3 Comments

David Aldridge Over a year ago

And I'd add that if this is a common use case for the system then it would not hurt to use a materialised view, or other summary logic, to avoid the need to run this query against a growing fact table.
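
One way this could look, as a rough sketch only: the view name offer_source_products and its index name are invented for illustration, and how often to refresh is an assumption that depends on how stale the result is allowed to be.

```sql
-- Hypothetical summary of the distinct (source, product) pairs.
CREATE MATERIALIZED VIEW offer_source_products AS
SELECT DISTINCT source_name, product_dimension_id
FROM offer_facts;

-- Index the summary so lookups by source_name stay cheap.
CREATE INDEX idx_offer_source_products
    ON offer_source_products (source_name, product_dimension_id);

-- Run periodically (for example after each load) to pick up new rows.
REFRESH MATERIALIZED VIEW offer_source_products;

-- The original question then becomes a lookup on the small summary.
SELECT product_dimension_id
FROM offer_source_products
WHERE source_name = 'customer_conti';
```

The trade-off is that results are only as fresh as the last refresh, and each refresh still scans the fact table.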

Succeed Stha Over a year ago

@DavidAldridge I do have the composite index: CREATE INDEX idx_offer_facts_dimensions ON offer_facts USING btree (source_name COLLATE pg_catalog."default", shop_dimension_id, time_dimension_id, date_dimension_id, source_dimension_id, product_dimension_id);

Gordon Linoff Over a year ago

@SucceedStha ... The first two keys in the index need to be source_name and product_dimension_id, in that order.
