0

I have a table "offer_facts" with many columns including product_dimension_id ( foreign key for product dimension table) and source_name ( varchar ). Both of these columns are indexed . At the moment approximately there are 120K rows in this table. This table is constantly growing ( around 20K per day).

Below is the query and the output I get .

SELECT version() "PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit"

EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT product_dimension_id from offer_facts WHERE source_name='customer_conti' 

And the output is

HashAggregate  (cost=36619.24..36621.37 rows=213 width=4) (actual time=2654.272..2655.064 rows=399 loops=1)
  Group Key: product_dimension_id
  Buffers: shared hit=2697 read=17687
  ->  Seq Scan on offer_facts  (cost=0.00..35425.82 rows=477367 width=4) (actual time=0.021..1525.361 rows=479880 loops=1)
        Filter: ((source_name)::text = 'customer_conti'::text)
        Rows Removed by Filter: 723466
        Buffers: shared hit=2697 read=17687
Planning time: 0.201 ms
Execution time: 2655.778 ms

I am not sure why it is doing Seq Scan and not Index Scan .

I have created index with

CREATE INDEX idx_offer_facts_dimensions ON offer_facts USING btree (source_name COLLATE pg_catalog."default", shop_dimension_id, time_dimension_id, date_dimension_id, source_dimension_id, product_dimension_id);

and I have vaccuumed and analyzed the table.

5
  • Why DISTINCT(product_dimension_id), with ( )'s? Is that some special DISTINCT function, instead of SELECT DISTINCT product_dimension_id? Commented Sep 17, 2015 at 13:18
  • @jarlh no there is nothing special about this. It is a normal distinct function . I think () doesn't matter .. Commented Sep 17, 2015 at 13:26
  • 1
    Unrelated but: distinct is NOT a function. Use distinct product_dimension_id Commented Sep 17, 2015 at 13:26
  • @a_horse_with_no_name removing the brackets didn't effect the query. I will edit the the question to remove the brackets anyways .. Commented Sep 17, 2015 at 13:28
  • 1
    I said it was unrelated. Putting columns in parentheses is not going to do what you think. It doesn't matter for a single column but col1, col2 and (col1, col2) are two different things in Postgres Commented Sep 17, 2015 at 13:40

1 Answer 1

1

This is your query:

SELECT DISTINCT product_dimension_id 
FROM offer_facts
WHERE source_name = 'customer_conti' ;

The optimal index is a composite index: offer_facts(source_name, product_dimension_id). Individual indexes on each column are not as useful. This query can make use of an index scan; Postgres should be smart enough to find that execution path.

Sign up to request clarification or add additional context in comments.

3 Comments

And I'd add that if this is a common use case for the system then it would not hurt to use a materialised view, or other summary logic, to avoid the need to run this query against a growing fact table.
@DavidAldridge I do have the composite index .. CREATE INDEX idx_offer_facts_dimensions ON offer_facts USING btree (source_name COLLATE pg_catalog."default", shop_dimension_id, time_dimension_id, date_dimension_id, source_dimension_id, product_dimension_id); ..
@SucceedStha ... The first two keys in the index need to be source_name and product_dimension_id, in that order.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.