Planet PostgreSQL

Laurenz Albe: Bulk load performance in PostgreSQL

© Laurenz Albe 2023

There are several techniques to bulk load data into PostgreSQL. I decided to compare their performance in a simple test case. I'll also add some recommendations for parameter settings to improve performance even more.

An example table to bulk load data

The table is simple enough:

CREATE TABLE instest (
    id bigint PRIMARY KEY,
    value text NOT NULL
);

It is a narrow table (only two columns), but it has a primary key index. Loading data would be much faster without the index, but in real life you cannot always drop all indexes and constraints before loading data. Moreover, dropping indexes before the bulk load and re-creating them afterwards can be slower if the table already contains data.

For the performance test, I'm going to load 10 million rows that look like (1, '1'), (2, '2') and so on, counting up to 10000000. The test will be performed on PostgreSQL v16 with the default configuration. I am aware that the default configuration is not perfect for bulk loading, but that does not bother me, since I am only interested in comparing the different methods.

Different methods for bulk loading

I'll try the following six techniques:

Single INSERTs in autocommit mode (the fool's way)

This will be terribly slow, since each statement runs in its own transaction. For each transaction, PostgreSQL has to flush the WAL (the transaction log) to disk, which leads to 10 million I/O requests. Naturally, this is not the correct thing to do for a bulk load. But it is the way a transactional application loads data into the database: many clients insert small amounts of data, with no way to bundle the individual requests into bigger transactions. There are some remedies available to boost performance in such a case:

- Set synchronous_commit to off. That will boost performance amazingly, but an operating system crash could lead to some committed transactions getting lost.
- Set commit_delay to a value greater than 0 and tune commit_sibling[...]
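The techniques being compared can be sketched with representative SQL statements like these (an illustrative sketch only; the exact batching and prepared-statement handling depends on the client driver, and only a few rows are shown):

```sql
-- 1. Single INSERTs in autocommit mode: one transaction (one WAL flush) per row
INSERT INTO instest (id, value) VALUES (1, '1');
INSERT INTO instest (id, value) VALUES (2, '2');

-- 2. Single INSERTs in one explicit transaction: one WAL flush at COMMIT
BEGIN;
INSERT INTO instest (id, value) VALUES (1, '1');
INSERT INTO instest (id, value) VALUES (2, '2');
COMMIT;

-- 3. Prepared statement inside a transaction: skip re-parsing each INSERT
PREPARE ins(bigint, text) AS
    INSERT INTO instest (id, value) VALUES ($1, $2);
BEGIN;
EXECUTE ins(1, '1');
EXECUTE ins(2, '2');
COMMIT;

-- 4. Multi-row INSERT: fewer statements and fewer client-server round trips
INSERT INTO instest (id, value) VALUES (1, '1'), (2, '2'), (3, '3');

-- 5. COPY: stream rows in bulk (data follows on standard input)
COPY instest (id, value) FROM STDIN (FORMAT text);
```

The general pattern: the fewer transactions, statements, and round trips per row, the faster the load, which is why COPY usually wins.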

This article compares several techniques for bulk loading data into PostgreSQL and offers some parameter-setting recommendations to improve performance further. The author tries six different methods: single INSERT statements, INSERT statements in a single transaction, INSERT statements with a prepared statement in a single transaction, multi-row INSERT statements, multi-row INSERT statements with a prepared statement, and bulk loading with the COPY statement. The test results show that COPY is the fastest method. The article also covers ways to tune PostgreSQL for faster bulk loading, including adjusting the max_wal_size and checkpoint_timeout parameters. Overall, COPY is the best method for bulk loading.
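The tuning parameters mentioned above would go into postgresql.conf; the values below are illustrative assumptions, not recommendations from the article, and should be sized to your hardware and durability requirements:

```
# postgresql.conf -- illustrative values, not from the article
max_wal_size = 16GB           # allow more WAL between checkpoints during the load
checkpoint_timeout = 30min    # checkpoint less frequently
synchronous_commit = off      # faster commits; an OS crash can lose recently committed transactions
```

Larger max_wal_size and checkpoint_timeout reduce checkpoint frequency during the load; synchronous_commit = off trades crash-safety of the most recent commits for speed.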

COPY PostgreSQL performance parameter settings bulk loading
