Improve Case-Insensitive Queries in PostgreSQL Using Smarter Indexes

February 2015 · PostgreSQL, Indexes, SQL, Performance

One of the unique features of Postgres is it’s powerful index engine. Using Postgres you can get much more fine grained control over how your data is indexed. For example indexes on expressions which allow you to create indexes on fields that have UPPER or LOWER functions applied to them.

Typically a VARCHAR field is case-sensitive in Postgres. Meaning that fieldname = 'FOO' and fieldname = 'foo' won’t match when the actual value is “FOO”. A common technique is to normalize the field value like lower(fieldname) = lower('FOO') to coerce the fields to match using their lower-case form. The problem is that this will ignore indexes and cause Postgres to use a Sequential Scan to find the match evaluating each row looking for a match. This is expensive and non-ideal for large and frequently queried tables.

explain select username from users where lower(username) = lower('just3ws');
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on users  (cost=0.00..13327.36 rows=558 width=9)
   Filter: (lower((username)::text) = 'just3ws'::text)
(2 rows)

Compare that to the query with no expression and a simple index on the username field.

explain select username from users where username = lower('just3ws');
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Scan using index_users_on_username on users  (cost=0.00..4.07 rows=1 width=9)
   Index Cond: ((username)::text = 'just3ws'::text)
(2 rows)

There’s a significant cost to pay when querying against expressions. But all is not lost. Postgres will let you build indexes against expressions.

In this case we’ll apply a index that will take the lower() expression into account.

create index ix_users_username_lower on users (lower(username) varchar_pattern_ops);

Now when we query for lower(usernames) we’ll be able to take advantage of the index to avoid full table scanning.

explain select username from users where lower(username) = lower('just3ws');
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on users  (cost=4.92..988.43 rows=558 width=9)
   Recheck Cond: (lower((username)::text) = 'just3ws'::text)
   ->  Bitmap Index Scan on ix_users_username_lower  (cost=0.00..4.90 rows=558 width=0)
         Index Cond: (lower((username)::text) = 'just3ws'::text)
(4 rows)

It’s much better than the Sequential Scan now although still a little bit slower than just querying against the simple index.

The Bitmap Heap Scan is useful when there is a lot of variety in the data and Postgres is able to intelligently segregate the data. Basically the data is chunked into smaller sets that Postgres can filter though and more intelligently decide which sets to scan and which to skip. Where the Sequential Scan will touch every single row in the table.

Read more about the Postgres Index on Expressions in the official documentation and this explanation by the author of the Postgres Bitmap Heap Scan algorithm Tom Lane.

For a full explanation of what the cost, rows, and width values mean please check out Using Explain from the Postgres documentation.

varchar_pattern_ops is explained in Postgres Index documentation.