3 tips to manage large Postgres databases


elizabeth.chri… Tue, 02/28/2023 - 03:00

The relational database PostgreSQL (also known as Postgres) has grown increasingly popular, and enterprises and public-sector organizations use it across the globe. With this widespread adoption, databases have become larger than ever. At Crunchy Data, we regularly work with databases north of 20TB, and our existing databases continue to grow. My colleague David Christensen and I have gathered some tips about managing a database with huge tables.

Big tables

Production databases commonly consist of many tables with varying data, sizes, and schemas. It's common to end up with a single huge and unruly table, far larger than any other table in your database. This table often stores activity logs or time-stamped events and is necessary for your application or users.

Really large tables cause challenges for many reasons, but a common one is locks. Regular maintenance on a table often requires locks, and a long-held lock on your large table can take down your application or cause a traffic jam and many headaches. I have a few tips for doing basic maintenance, like adding columns or indexes, while avoiding long-running locks.

Adding indexes

Problem: Index creation locks the table against writes for the duration of the creation process. If you have a massive table, this can take hours.

    CREATE INDEX ON customers (last_name);

Solution: Use the CREATE INDEX CONCURRENTLY feature. This approach splits index creation into two parts: first, a brief lock to create the index, which starts tracking changes immediately but minimizes application blockage; then a full build-out of the index, after which queries can start using it.

    CREATE INDEX CONCURRENTLY ON customers (last_name);

Adding columns

Adding a column is a common request during the life of a database, but with a huge table it can be tricky, again due to locking.
Problem: When you add a new column with a default that calls a volatile function, Postgres needs to rewrite the table. For big tables, this can take several hours.

Solution: Split the operation into multiple steps that together have the same effect as the single statement, but let you retain control of the timing of locks.

Add the column:

    ALTER TABLE all_my_exes ADD COLUMN location text;

Add the default:

    ALTER TABLE all_my_exes ALTER COLUMN location SET DEFAULT texas();

Use UPDATE to apply the default to existing rows:

    UPDATE all_my_exes SET location = DEFAULT;

Adding constraints

Problem: You want to add a check constraint for data validation. But the straightforward approach locks the table while it validates all of the existing data in the table. Also, if there's an error at any point in the validation, the statement rolls back.

    ALTER TABLE favorite_bands ADD CONSTRAINT name_check CHECK (name = 'Led Zeppelin');

Solution: Tell Postgres about the constraint but don't validate it. Validate in a second step. The first step takes a short lock and ensures that all new or modified rows fit the constraint; a separate pass then confirms that all existing data passes the constraint.

Tell Postgres about the constraint but do not validate existing rows:

    ALTER TABLE favorite_bands ADD CONSTRAINT name_check CHECK (name = 'Led Zeppelin') NOT VALID;

Then VALIDATE it after it's created:

    ALTER TABLE favorite_bands VALIDATE CONSTRAINT name_check;

Hungry for more? David Christensen and I will be in Pasadena, CA, at SCaLE's Postgres Days, March 9-10. Lots of great folks from the Postgres community will be there too. Join us!

Try these handy solutions to common problems when dealing with huge databases.
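One such solution, sketched with some assumptions: the single UPDATE in the column steps above still touches every row in one statement and holds its row locks until it commits. A possible way to spread that work out is to issue the backfill as a series of smaller UPDATE statements over key ranges. This is not from the original article. It assumes a hypothetical integer key column named id on all_my_exes, and the helper name batched_backfill and the batch size are illustrative only. The Python below only builds SQL strings; it does not connect to a database.

```python
from typing import Iterator


def batched_backfill(
    table: str,
    column: str,
    min_id: int,
    max_id: int,
    batch_size: int = 10_000,
    key: str = "id",  # assumed integer key column; not named in the article
) -> Iterator[str]:
    """Yield UPDATE statements that apply the column default in key ranges.

    Each statement only touches rows where the column is still NULL, so a
    batch that was interrupted can be re-run without redoing finished rows.
    Table and column names are interpolated directly, so they must come
    from trusted input, never from user-supplied text.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield (
            f"UPDATE {table} SET {column} = DEFAULT "
            f"WHERE {key} BETWEEN {start} AND {end} AND {column} IS NULL;"
        )
        start = end + 1


if __name__ == "__main__":
    # Illustrative range only; in practice min/max would come from the table.
    for stmt in batched_backfill("all_my_exes", "location", 1, 25_000):
        print(stmt)
```

Each generated statement would be run and committed on its own (for example, in autocommit mode), so the row locks taken by one batch are released before the next batch starts.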
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

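Summary: PostgreSQL is growing in popularity, and at Crunchy Data we regularly work with databases larger than 20TB. David Christensen and I have gathered some tips for managing large tables: add indexes with CREATE INDEX CONCURRENTLY, add columns in multiple steps, and add constraints by declaring them first and validating them afterward. David and I will be at SCaLE's Postgres Days, and everyone is welcome to join us!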
