Jeremy Schneider: Postgres Indexes, Partitioning and LWLock:LockManager Scalability

Original article in English, about 3,900 words, roughly a 14-minute read. Published:

I have decided that, in Postgres circles, I shall henceforth refer to 2023 as THE YEAR OF THE LOCK MANAGER’S REVENGE. Let me explain.

Let’s start with Bruce Momjian. He has an in-depth presentation about locking in general with PostgreSQL called “Unlocking the Postgres Lock Manager”. I see online that he’s been giving this talk at least as far back as the Postgres Open 2011 conference, and the slide deck says it was last updated in Feb 2023. There is a video recording available (Bruce’s site above has the YouTube link).

Bruce’s talk is not about the LWLock:LockManager wait event. That wait event represents an in-memory 16-partition (aka tranche) “lightweight lock” that protects the Lock Manager during concurrent access. Bruce’s talk is important because it describes what the Lock Manager does, and that is the thing this lightweight tranche lock is protecting. Without that protection, concurrent memory access would cause corruption. So the talk gives a beginning sense of what factors might lead to contention on this in-memory lightweight tranche lock. The Lock Manager is in charge of managing heavyweight locks, that is, the locks taken explicitly by SQL or by the application, such as table locks and row locks. The word “lock” means a bunch of things, which is really confusing in this context!

In 2017, just before the Postgres version of Aurora was launched, I joined RDS and plunged myself into the PostgreSQL world. Having been interested in Oracle performance for many years (but not an expert by any means), one of the first areas I dove into was PostgreSQL Wait Events. This was brand new and hot off the press at that time. Meanwhile, Kyle Hailey had been working with the RDS Performance Insights team and brought years of industry experience building DB performance tools. As a result, RDS PostgreSQL and Aurora PostgreSQL provided exceptional visibility into database performance very early on. Aurora even backported the v10 Postgres wait code into its v9.6-compatible launch product, so that it could have the Average Active Sessions dashboard with query drilldown[...]

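PostgreSQL users have run into contention in the Lock Manager that degrades performance. The problem has been observed at several companies, including GitLab and Midjourney. It is tied to lightweight locks and is made worse by factors such as high transaction rates and large numbers of partitions. Mitigation strategies include adding capacity, caching at the application layer, dropping unnecessary indexes, and making sure queries against partitioned tables are properly optimized. Despite these problems, PostgreSQL remains a popular and scalable database, and discussions about improving Lock Manager scalability are under way.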
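
To make the mitigations in the summary above more concrete, here is a rough Python sketch of why partition counts and index counts matter. It is a toy model, not PostgreSQL's actual accounting: it assumes a query takes one relation lock on the parent table, one on each partition it touches, and one on each index of those partitions. The function name `relation_locks_needed` and the sample numbers are hypothetical and chosen only for illustration.

```python
# Toy model (an assumption, not PostgreSQL internals): estimate how many
# relation locks one query asks the Lock Manager for.

def relation_locks_needed(partitions_touched: int, indexes_per_partition: int) -> int:
    """Count one lock for the parent table, plus one per touched partition
    and one per index on each touched partition."""
    if partitions_touched < 0 or indexes_per_partition < 0:
        raise ValueError("counts must be non-negative")
    return 1 + partitions_touched * (1 + indexes_per_partition)


# Hypothetical scenarios: a table with 100 partitions and 5 indexes each.
scenarios = {
    "no partition pruning": relation_locks_needed(100, 5),
    "pruned to one partition": relation_locks_needed(1, 5),
    "pruned, 3 indexes dropped": relation_locks_needed(1, 2),
}

for name, locks in scenarios.items():
    print(f"{name}: about {locks} relation locks per query")
```

Under these assumptions, a query that cannot prune partitions requests far more locks than one that targets a single partition, and each dropped index removes one more lock per touched partition. Multiplied by a high transaction rate, that difference is the kind of load the summary describes.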
