
I have two tables: one with 150M records, the other with 1 million. The subset of rows I am asking for is only about 400 rows. Despite that, I am getting a full scan in EXPLAIN ANALYZE. The query is:

with txs as (
  select 
       tx_id 
  from pol_tok_id_ops 
  where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
) 
select 
       e20t.time_stamp,e20t.amount 
from erc20_transf e20t 
cross join txs 
where txs.tx_id=e20t.tx_id;

EXPLAIN reports:

 Merge Join  (cost=38689523.80..103962768.64 rows=4325899145 width=40)
   Merge Cond: (txs.tx_id = e20t.tx_id)
   CTE txs
     ->  Bitmap Heap Scan on pol_tok_id_ops  (cost=316.12..15969.27 rows=5622 width=8)
           Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
           ->  Bitmap Index Scan on pol_tokidops_cond_idx  (cost=0.00..314.72 rows=5622 width=0)
                 Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
   ->  Sort  (cost=462.60..476.66 rows=5622 width=8)
         Sort Key: txs.tx_id
         ->  CTE Scan on txs  (cost=0.00..112.44 rows=5622 width=8)
   ->  Materialize  (cost=38673091.92..39442551.04 rows=153891823 width=48)
         ->  Sort  (cost=38673091.92..39057821.48 rows=153891823 width=48)
               Sort Key: e20t.tx_id
               ->  Seq Scan on erc20_transf e20t  (cost=0.00..3543917.23 rows=153891823 width=48)

Tables are defined as:

\d erc20_transf
                                        Table "public.erc20_transf"
    Column    |           Type           | Collation | Nullable |                 Default                  
--------------+--------------------------+-----------+----------+------------------------------------------
 id           | bigint                   |           | not null | nextval('erc20_transf_id_seq'::regclass)
 evtlog_id    | bigint                   |           | not null | 
 block_num    | bigint                   |           | not null | 
 time_stamp   | timestamp with time zone |           |          | 
 tx_id        | bigint                   |           | not null | 
 contract_aid | bigint                   |           | not null | 
 from_aid     | bigint                   |           |          | 0
 to_aid       | bigint                   |           |          | 0
 amount       | numeric                  |           |          | 0.0
Indexes:
    "erc20_transf_pkey" PRIMARY KEY, btree (id)
    "erc20_transf_evtlog_id_key" UNIQUE CONSTRAINT, btree (evtlog_id)
    "erc20_tr_ctrct_idx" btree (contract_aid)
    "erc20_transf_from_idx" btree (from_aid)
    "erc20_transf_to_idx" btree (to_aid)
    "erc20_tx_idx" btree (tx_id)
Foreign-key constraints:
    "erc20_transf_evtlog_id_fkey" FOREIGN KEY (evtlog_id) REFERENCES evt_log(id) ON DELETE CASCADE
Referenced by:
    TABLE "erc20_bal" CONSTRAINT "erc20_bal_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES erc20_transf(id) ON DELETE CASCADE
Triggers:
    erc20_transf_delete AFTER DELETE ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_delete()
    erc20_transf_insert AFTER INSERT ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_insert()


 \d pol_tok_id_ops
                                 Table "public.pol_tok_id_ops"
      Column      |  Type   | Collation | Nullable |                  Default                   
------------------+---------+-----------+----------+--------------------------------------------
 id               | bigint  |           | not null | nextval('pol_tok_id_ops_id_seq'::regclass)
 evtlog_id        | bigint  |           |          | 
 tx_id            | bigint  |           |          | 
 parent_split_id  | bigint  |           |          | 
 parent_merge_id  | bigint  |           |          | 
 parent_redeem_id | bigint  |           |          | 
 contract_aid     | bigint  |           |          | 
 outcome_idx      | integer |           | not null | 
 condition_id     | text    |           | not null | 
 token_id_hex     | text    |           | not null | 
 token_from       | text    |           | not null | 
 token_to         | text    |           | not null | 
 token_amount     | numeric |           | not null | 
Indexes:
    "pol_tok_id_ops_pkey" PRIMARY KEY, btree (id)
    "pol_tokidops_cond_idx" btree (condition_id)
    "pol_tokidops_tx_idx" btree (tx_id)
Foreign-key constraints:
    "pol_tok_id_ops_parent_merge_id_fkey" FOREIGN KEY (parent_merge_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
    "pol_tok_id_ops_parent_redeem_id_fkey" FOREIGN KEY (parent_redeem_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
    "pol_tok_id_ops_parent_split_id_fkey" FOREIGN KEY (parent_split_id) REFERENCES pol_pos_split(id) ON DELETE CASCADE

So why is it doing a full scan on 153 million records when the CTE scan reports only 5622 rows? Or is there a way to force an index scan somehow?

I am using Postgres 10.12 (I can't upgrade; it's a production machine).

EDIT:

Some benchmarks:

psql=> explain analyze select * from erc20_transf WHERE tx_id  in (1111111,2222222,333333,555555,666666);
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on erc20_transf  (cost=71721.16..2161950.95 rows=3852686 width=96) (actual time=1.420..1.482 rows=3 loops=1)
   Recheck Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on erc20_tx_idx  (cost=0.00..70757.99 rows=3852686 width=0) (actual time=1.386..1.386 rows=3 loops=1)
         Index Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
 Planning time: 1.490 ms
 Execution time: 1.557 ms
(7 rows)

psql=> explain analyze select tx_id from pol_tok_id_ops where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pol_tok_id_ops  (cost=431.34..21552.04 rows=7586 width=8) (actual time=1.431..5.851 rows=340 loops=1)
   Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
   Heap Blocks: exact=133
   ->  Bitmap Index Scan on pol_tokidops_cond_idx  (cost=0.00..429.45 rows=7586 width=0) (actual time=1.355..1.355 rows=340 loops=1)
         Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
 Planning time: 1.585 ms
 Execution time: 5.926 ms
(7 rows)

  • have you done a recent vacuum analyze on both tables? Commented Sep 2, 2021 at 23:22
  • @melcher no, but these tables are insert-only; if there is a DELETE, it will be a once-per-day event, if ever. Commented Sep 2, 2021 at 23:23
  • Also, can you check whether erc20_tx_idx is usable? (e.g. what's the explain of select * from erc20_transf WHERE tx_id = ?) Commented Sep 2, 2021 at 23:24
  • Or see if you have any invalid indexes w/ SELECT * FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;, medium.com/carwow-product-engineering/… Commented Sep 2, 2021 at 23:25
  • then just analyze; updated statistics are still essential for the query planner Commented Sep 2, 2021 at 23:26

2 Answers


Run vacuum analyze [table-name] (or just analyze [table-name]) to update the query planner statistics.

For large or frequently updated tables, PostgreSQL recommends tuning the auto-analyze (and more generally, auto-vacuum) settings to fit your needs. If Postgres is using out-of-date table statistics (ones that don't reflect the current content and distribution of the table), it can come up with bad-to-terrible query plans.
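For insert-heavy tables like these, the default trigger (analyze after ~10% of rows change) fires rarely once the table is large. A per-table override is one way to tighten that; the 0.01 scale factor below is illustrative, not a recommendation:

```sql
-- Illustrative sketch: auto-analyze after ~1% of rows change
-- instead of the 10% default (autovacuum_analyze_scale_factor = 0.1)
ALTER TABLE erc20_transf   SET (autovacuum_analyze_scale_factor = 0.01);
ALTER TABLE pol_tok_id_ops SET (autovacuum_analyze_scale_factor = 0.01);
```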

Also, for unevenly distributed data in a large table (where the sampling Postgres uses won't always accurately reflect the underlying data), you may need to raise the statistics target. See Analyzing Extreme Distributions in Postgres and Analyze Strategy for big tables in Postgres.
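Concretely, a one-off ANALYZE plus a larger per-column sample on the join key might look like this (the value 1000 is illustrative; the default statistics target is 100):

```sql
-- One-off: refresh planner statistics now
ANALYZE erc20_transf;
ANALYZE pol_tok_id_ops;

-- If estimates for tx_id remain poor, raise the per-column sample size
-- and re-analyze so the new target takes effect
ALTER TABLE erc20_transf ALTER COLUMN tx_id SET STATISTICS 1000;
ANALYZE erc20_transf;
```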




It may be the JOIN. You have:

    ... 
    select 
           e20t.time_stamp,e20t.amount 
    from erc20_transf e20t 
    cross join txs 
    where txs.tx_id=e20t.tx_id;

JOINs are logically performed before the WHERE clause, and a CROSS JOIN combines every row of erc20_transf with every row of txs before the filter is applied (though the planner will often produce the same plan for both forms). Try an INNER JOIN:

    ... 
    select 
           e20t.time_stamp,e20t.amount 
    from erc20_transf e20t 
    join txs on (txs.tx_id=e20t.tx_id);

