postgresql not using trigram index on text column but uses it on varchar column

Question

So basically I set up a very simple test table to test trigram and fulltext indexing capabilities in postgresql 9.1 (stock Debian stable).

Here are the table and index definitions:

-- Table: fulltextproba
-- DROP TABLE fulltextproba;
CREATE TABLE fulltextproba
(
  id integer NOT NULL,
  text text,
  varchar600 character varying(600) COLLATE pg_catalog."C.UTF-8",
  CONSTRAINT id PRIMARY KEY (id )
)
WITH (
  OIDS=FALSE
);

-- Index: id_index
-- DROP INDEX id_index;
CREATE UNIQUE INDEX id_index
  ON fulltextproba
  USING btree
  (id );

-- Index: text_gin_fulltext_hun
-- DROP INDEX text_gin_fulltext_hun;
CREATE INDEX text_gin_fulltext_hun
  ON fulltextproba
  USING gin
  (to_tsvector('hungarian'::text, text) );

-- Index: text_gin_trgm
-- DROP INDEX text_gin_trgm;
CREATE INDEX text_gin_trgm
  ON fulltextproba
  USING gin
  (text COLLATE pg_catalog."C.UTF-8" gin_trgm_ops);

-- Index: varchar600
-- DROP INDEX varchar600;
CREATE INDEX varchar600
  ON fulltextproba
  USING btree
  (varchar600 COLLATE pg_catalog."C.UTF-8" varchar_pattern_ops);

-- Index: varchar600_gin_trgm
-- DROP INDEX varchar600_gin_trgm;
CREATE INDEX varchar600_gin_trgm
  ON fulltextproba
  USING gin
  (varchar600 COLLATE pg_catalog."C.UTF-8" gin_trgm_ops);

My problem is, if I do a %foo% search that should use the trigram index, If I search on the text column, it doesn't:

SELECT COUNT(id) FROM public.fulltextproba WHERE text LIKE '%almáv%'
 count 
-------
   396
(1 row)

real    0m7.215s
user    0m0.020s
sys 0m0.004s
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Aggregate  (cost=657056.11..657056.12 rows=1 width=4)
   ->  Seq Scan on fulltextproba  (cost=0.00..657052.72 rows=1355 width=4)
         Filter: (text ~~ '%almáv%'::text)
(3 rows)

But if I search in the varchar600 column, it does use the trigram index, and is - not suprisingly - much faster:

SELECT COUNT(id) FROM public.fulltextproba WHERE varchar600 LIKE '%almáv%'
 count 
-------
   373
(1 row)

real    0m0.184s
user    0m0.052s
sys 0m0.004s
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Aggregate  (cost=5283.11..5283.12 rows=1 width=4)
   ->  Bitmap Heap Scan on fulltextproba  (cost=62.50..5279.73 rows=1355 width=4)
         Recheck Cond: ((varchar600)::text ~~ '%almáv%'::text)
         ->  Bitmap Index Scan on varchar600_gin_trgm  (cost=0.00..62.16 rows=1355 width=0)
               Index Cond: ((varchar600)::text ~~ '%almáv%'::text)
(5 rows)

So the ultimate questions are:

Why postgres doesn't use the trigram index on the text column.
How can postgres be made to use the index? Should I define it some other way?

Community · Accepted Answer · 2017-05-23 10:29:44Z

3

text is perfectly fine. The best option even, as you can see in the EXPLAIN output:

Index Cond: ((varchar600)::text ~~ '%almáv%'::text)

Collation mismatch

The immediate reason is probably a collation mismatch. Your table is defined:

text text,   -- default collation is ???
varchar600 character varying(600) COLLATE pg_catalog."C.UTF-8"

While both indexes use COLLATE pg_catalog."C.UTF-8". What is your default collation? The output from:

SHOW LC_COLLATE;

You may be mixing different collations. Retest with:

SELECT COUNT(id) FROM public.fulltextproba
WHERE text COLLATE pg_catalog."C.UTF-8" LIKE '%almáv%'

Read about collation support in Postgres.

General problems in your test

You obviously have different values in either of the columns. Repeat the test with identical values.

To force Postgres into using the index, you can (for debugging in your session only!):

SET enable_seqscan = off;

Then try again. Details:

Left Join Lateral and array aggregates

Outlook for GIN index in Postgres 9.4

The upcoming Postgres 9.4 is shipping with a number of substantial improvements for GIN indexes. In particular, they are going to be much smaller and faster.

edited May 23, 2017 at 10:29

CommunityBot

11 silver badge

answered Aug 19, 2014 at 17:27

Erwin Brandstetter

668k159 gold badges1.2k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

P.Péter Over a year ago

Thank you, you were right! My locale is hu_HU.UTF-8, and by specifying the collation, the trigram index has begun to be used right away (my reason for using C.UTF-8 was that this column contains multilingual text).

P.Péter Over a year ago

As for the size of the table, you are wrong: the table contains 14155098 (~14M) rows, so the index scan is rightly faster.

P.Péter Over a year ago

To the other point: I know there are different values in each column (the varchar600 column is truncated to 600 chars; it only matters for about 10% of the rows), so I have no problem with the difference in the result. I am not testing whether postgres conveys accurate or consistent results (I am assuming that, I may be wrong :) ), I am trying to assess general performance for some basic queries in a certain environment to test the feasibility of using postgres. So far it fares well. :)

P.Péter Over a year ago

Also, changing the default collation for test made the original query use the index. Setting the collation took two hours... :-P Moral: Choose your collations wisely, and in advance!

Erwin Brandstetter Over a year ago

@P.Péter: I mistook the rowcount for the hits as the number of rows in the ANALYZE output. Removed that part. Still confused though, why we see rows=1355, but get a much smaller number from count(). I also added an outlook for GIN indexes in Postgres 9.4, which you may like.

|

Collectives™ on Stack Overflow

postgresql not using trigram index on text column but uses it on varchar column

1 Answer 1

Collation mismatch

General problems in your test

Outlook for GIN index in Postgres 9.4

6 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Collation mismatch

General problems in your test

Outlook for GIN index in Postgres 9.4

6 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related