Laurenz Albe: PostgreSQL hash index performance

© Laurenz Albe 2023 Among the many index types in PostgreSQL, the hash index is the most widely ignored. This came home to me when somebody asked me a question about hash index performance recently. High time to explore that little-known corner of PostgreSQL and run some benchmarks! The history of the hash index PostgreSQL has had hash indexes since the dawn of time (I looked at the source of version 4.2), but they were not crash safe before v10. Consequently, you could not really use them in older versions. For that very reason, hash indexes have received little attention by the PostgreSQL developers. But even after they became first-class citizens, they have led a niche existence. Time to call them on stage and let them show what they can do. Hash index implementation details You can find a detailed description in the hash README file. A hash index has two or more bucket pages that store the result of a system-defined hash function of the indexed values. Whenever the index exceeds a certain average number of entries per page that depends on the fillfactor, PostgreSQL doubles the number of bucket pages by splitting each of them. For large indexes, PostgreSQL performs this doubling in four batches to spread the work across several DML operations. Consequently, an INSERT into a table with a hash index will occasionally be unpleasantly slow, similar to the effect of cleaning up the pending list in a GIN index. If a hash does not fit in the bucket page where it belongs (but it is not yet time to double the index), PostgreSQL puts it into an overflow pages that it creates for that purpose. This should not happen frequently, unless some of the the indexed values occur very often. Potential use cases for a hash index Hash indexes can only have a single column, they only support equality search, and they cannot enforce uniqueness. So they really cannot do anything that B-tree indexes cannot do just as well, with one exception: while the length limit for entries in a B-tree index is [...]

PostgreSQL的哈希索引在某些情况下比B树索引更高效，适用于SELECT和UPDATE频繁的工作负载。它可以减少大型表的等值扫描的索引访问时间，特别在插入和查询长文本列时表现出色。然而，在常见情况下，B树索引通常优于哈希索引。哈希索引的性能可能会有所改善，如果得到更多关注的话。

观测云	eolink
Dify.AI	LigaAI

Laurenz Albe: PostgreSQL hash index performance

验证