cary huang: A Deeper Look Inside PostgreSQL Visibility Check Mechanism

原文英文,约1600词,阅读约需6分钟。发表于:

What is Visibility? Simply put, the visibility refers to whether a row of data (Heap Tuple by default) should be displayed to the user in certain states, backend processes, or transactions. For example, A user changes a data record from ‘A’ to ‘B’ via UPDATE PostgreSQL handles this by INSERTing a new record call ‘B’ while marking record ‘A’ as ‘invisible’ Both records exist in the system, but ‘B’ is visible, while ‘A’ is not. A deleted or invisible record is also referred as a ‘dead’ tuple One of the responsibilities of VACUUM process is to clear these ‘dead’ tuples to free up space. MVCC (Multi-Version Concurrency Control) is a method in which each write operation creates a “new version” of the data while retaining the “old version”. This allows concurrent read and write operations without blocking each other. PostgreSQL uses a variant of MVCC, also called Snapshot Isolation to isolate concurrent transactions. So, it is possible that a single piece of data could have multiple “versions” of it, and it is PostgreSQL’s responsibility to determine which ‘version’ shall be presented to the user based on multiple factors. This act is also known as the “visibility check” or “visibility control” In this blog, we will dive into PostgreSQL’s visibility check mechanism to understand how it works. Visibility Indicators To my knowledge today, PostgreSQL determines a data’s visibility based on the following indicators: Transaction ID xmin xmax cid transaction snapshot CLOG (Commit Log) hintbit what exactly are these? Let’s find out. transaction ID This should be self-explanatory. All transactions in PostgreSQL are associated with a ID number called transaction ID. This is mainly used to check if a data in question is inserted or deleted in the current transaction. We will explore more on this in the later part of the blog. xmin, xmax, cid and hintbit These indicators are categorized together because they are all [...]

可见性是指在某些状态、后端进程或事务中是否应该向用户显示一行数据。PostgreSQL使用MVCC的变体来隔离并发事务。事务快照存储在共享内存中,用于隔离并发事务。了解事务和可见性的基本原理有助于解决数据一致性问题。

cary huang: A Deeper Look Inside PostgreSQL Visibility Check Mechanism
相关推荐 去reddit讨论