Tag

 postgresql 

Related articles:

Planet PostgreSQL -

Ahsan Hadi: Embedding near the edge: pgEdge Distributed PostgreSQL with pgVector

Introduction

We are excited to announce that we now support the increasingly popular pgVector Postgres extension for storing and searching vector embeddings in AI-powered applications. Bringing pgVector and pgEdge’s distributed capabilities together makes for a powerful combination that greatly improves performance for users regardless of their geographic location.

In this blog we'll demonstrate how to configure pgVector with pgEdge to provide similarity search functionality across a pgEdge Distributed PostgreSQL cluster. I will start with a brief summary of the products mentioned in the title of this blog:

pgEdge is fully-distributed PostgreSQL, optimized for the network edge and deployable across multiple cloud regions or data centers. pgEdge is available as pgEdge Platform, self-hosted software available for download from [download link]; or as pgEdge Cloud, a fully managed service. This blog is applicable to both pgEdge Cloud and pgEdge Platform.

pgvector is an open source extension for PostgreSQL that enables efficient similarity search and other vector-based operations. It's often used for applications like recommendation systems and image search. The pgvector extension provides an indexable vector data type that stores vectors in a PostgreSQL database. pgvector supports the  index, which implements the  method of indexing.

Vector Database

A vector database stores data as high-dimensional vectors, which are mathematical representations of features or attributes. The number of dimensions in a vector ranges from tens to thousands, depending on the complexity and granularity of the data. The main advantage of a vector database is that it allows for fast and accurate similarity search and retrieval of data based on vector distance or similarity.

So instead of searching data with conventional methods such as predefined criteria, exact matches or wildcards, one can use the vector database to find similar or relevant data based on semantic or contextual meaning. Vector databases enable accurat[...]

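AI-generated summary: This article describes how to implement similarity search using PostgreSQL's pgvector extension and OpenAI embedding generation, and provides Python code examples. The application can answer customer queries and use OpenAI models to generate the answers. The article also mentions the date and venue of PGDay UK 2023.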
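The idea of ranking items by vector distance, as described in the excerpt above, can be illustrated without a database. The following is a minimal pure-Python sketch, not pgvector's implementation: the three-dimensional vectors, the `documents` dictionary and the helper names `cosine_distance` and `nearest` are all hypothetical, and the search is an exact scan rather than an index lookup.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, items, k=2):
    """Return the k item names closest to query, best match first (exact scan)."""
    ranked = sorted(items, key=lambda name: cosine_distance(query, items[name]))
    return ranked[:k]

# Hypothetical embeddings; real ones typically have hundreds of dimensions.
documents = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.0, 0.0], documents))  # ['cat', 'kitten']
```

A vector index trades some of this exactness for speed by avoiding a comparison against every stored vector.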


Percona Database Performance Blog -

Percona Distribution for PostgreSQL 16 Is Now Available

PostgreSQL Community released PostgreSQL 16 on September 14, 2023.  In years past, we’ve released our Distribution for PostgreSQL a few months later.  We wanted to improve in this regard and establish a new release baseline. Improving quality while maintaining the same resilience towards QA was something we aimed for. It looks like we succeeded, and […]

AI-generated summary: Percona has released its distribution for PostgreSQL 16, shortly after the community released PostgreSQL 16. The release includes new features such as native support for logical replication from standby servers, parallelization for transactions, and improved vacuum freezing performance. Percona is also working on closing gaps for data at rest encryption and plans to make Percona images for PostgreSQL available on Docker Hub soon. They hope to involve the community in discussing the best approaches for transparent data encryption in PostgreSQL.


Planet PostgreSQL -

Hans-Juergen Schoenig: Citus: Row store vs. column store in PostgreSQL

USING columnar

Row store vs. column store – a lot has been written about this topic in the context of PostgreSQL and Citus. What does it really mean and what are the implications? Are column stores “always cool” and “always beneficial”? No, there’s more to it – which requires a closer look.

When trying to understand the benefits of column store over row store or vice versa, it’s important to grasp the basic ideas which lead to the database behavior we’re going to see in this blog.

What does “row store” vs. “column store” mean?

Let’s use a practical example to understand. Consider the following:

    SELECT first_name, last_name FROM person WHERE id = 10;

What will happen behind the scenes: PostgreSQL will not just look for 3 columns (first_name, last_name and id). It will instead communicate with the disk system using 8k blocks. Inside a single block, PostgreSQL stores more than just one row including visibility and so on. In a classical OLTP application (bookkeeping, financial transactions, address search, etc.) we usually need the entire row. Just imagine you are searching for an address – usually all fields are displayed and it is highly beneficial to fetch the entire information from disk in one go. That is where a row store is the best option. However, what about analytical use cases?

Row store or column store?

Suppose we want to sum up our entire sales for the past 20 years. Do we need the name of those products we sold? Do we need the quantity and the place where we sold it? Actually, we do not. All we need is 1 out of, say, 20 columns. If we want to add up 100 billion rows it makes sense to just read information worth 1 column as opposed to the entire data. Reading 20 columns worth of data to throw 19 columns away because they are not needed for our purpose leads to a lot of I/O and adds a ton of overhead (such as finding the column we need in the row and so on). Analytics and BI are the use cases where column stores shine.

However, suppose we store every column of a table in a[...]

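AI-generated summary: This article introduces the concepts of row store and column store and where each applies. In OLTP applications a row store is the better fit because entire rows need to be read. In analytics and BI scenarios a column store is the better fit because only some of the columns need to be read. Comparing the performance of the row store with the Citus column store, the author finds that the column store has advantages in both storage space and query performance. However, some operations can only be implemented with local storage and are hard to implement in a distributed environment, which has to be taken into account when making design decisions.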
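The "1 out of 20 columns" argument in this entry can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a deliberately simplified model, not a description of how PostgreSQL or Citus lay out data: the row count, the fixed 8-byte value width and both function names are assumptions, and block headers, visibility information and compression are ignored.

```python
# Hypothetical table: 1,000 rows with 20 fixed-width columns of 8 bytes each.
ROWS, COLUMNS, BYTES_PER_VALUE = 1_000, 20, 8

def bytes_read_row_store(columns_needed):
    """A row store fetches whole rows, so the number of columns needed is irrelevant."""
    return ROWS * COLUMNS * BYTES_PER_VALUE

def bytes_read_column_store(columns_needed):
    """A column store reads only the columns the query actually touches."""
    return ROWS * columns_needed * BYTES_PER_VALUE

# Summing a single column, as in the sales example:
print(bytes_read_row_store(1))     # 160000
print(bytes_read_column_store(1))  # 8000
```

When a query needs all 20 columns, both functions return the same number, which matches the entry's point that OLTP lookups gain nothing from a column layout.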


Planet PostgreSQL -

Hans-Juergen Schoenig: “hired” vs. “fired” – fuzzy search in PostgreSQL

When dealing with data (and life in general) small things can have a major impact. The difference between “hired” and “fired” is just one simple character but in many cases it does have real world implications. The question naturally arising is: How can we use good old community Open Source PostgreSQL to do fuzzy search to get better results? This blog will give you some insights into how this can be done and which benefits the end user can enjoy when using those techniques. As fuzzy search is a huge topic, I want to limit the post to what is possible using the fuzzystrmatch extension and point to some further reading wherever it makes sense to do so.

Creating sample data

To approach fuzzy string search in PostgreSQL we first have to create some simple sample data. For the sake of simplicity, it does not need much to show how this works:

    postgres=# CREATE TABLE t_sample (x text);
    CREATE TABLE
    postgres=# INSERT INTO t_sample VALUES ('hired'), ('fired'), ('pump'), ('dump'), ('failure');
    INSERT 0 5
    postgres=# TABLE t_sample;
        x
    ---------
     hired
     fired
     pump
     dump
     failure
    (5 rows)

All we need is 5 simple rows and we are able to demonstrate what true Open Source is capable of.

The fuzzystrmatch extension

The first thing people might want to use in PostgreSQL is the fuzzystrmatch extension. It contains basic functionality such as Soundex, Levenshtein and so on. Let us activate the extension which is shipped as part of the PostgreSQL contrib package:

    postgres=# CREATE EXTENSION fuzzystrmatch;
    CREATE EXTENSION

Soundex is basically one of the more popular algorithms. While it is quite outdated and not state of the art anymore, it still does serve its purpose here and there. Let us see and run it:

    postgres=# SELECT x, soundex(x) FROM t_sample;
        x    | soundex
    ---------+---------
     hired   | H630
     fired   | F630
     pump    | P510
     dump    | D510
     failure | F460
    (5 rows)

What we see here is that the soundex function encodes the input string. What we can do now is encode the[...]

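AI-generated summary: This article shows how to do fuzzy search in PostgreSQL using the Soundex, Levenshtein and metaphone algorithms provided by the fuzzystrmatch extension. Input strings can be encoded and then matched, or the Levenshtein algorithm can be used to find strings that differ by no more than a given number of typos. For more advanced similarity search, the article points to further reading.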
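To show what the two algorithms named in this entry compute, here is a minimal pure-Python sketch. It is an illustration only and not the fuzzystrmatch source: the Soundex variant is a simplified one that may differ from the extension on edge cases (for example around the letters H and W), and the function names simply mirror the SQL ones for readability.

```python
# Soundex digit for each letter A-Z; "0" marks letters that produce no digit.
CODES = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "01230120022455012623010202"))

def soundex(word):
    """Simplified four-character Soundex code (first letter plus up to 3 digits)."""
    letters = [c for c in word.upper() if c in CODES]
    if not letters:
        return ""
    result = letters[0]
    for prev, cur in zip(letters, letters[1:]):
        code = CODES[cur]
        # Skip vowel-like letters and repeats of the previous letter's code.
        if code != "0" and code != CODES[prev]:
            result += code
        if len(result) == 4:
            break
    return result.ljust(4, "0")

def levenshtein(a, b):
    """Edit distance: fewest single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(soundex("hired"), soundex("fired"))  # H630 F630
print(levenshtein("hired", "fired"))       # 1
```

For the five sample words, this sketch produces the same codes as the `soundex(x)` output quoted in the excerpt, and "hired" and "fired" end up one edit apart.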


解道jdon.com -

PostgreSQL 16 Released!

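September 14, 2023 - The PostgreSQL Global Development Group today announced the release of PostgreSQL 16, the latest version of the world's most advanced open source database. PostgreSQL 16 raises its performance, with notable improvements to query parallelism, bulk data loading, and logical replication. This release offers many features for developers and administrators, including more SQL/JSON syntax, new monitoring statistics for workloads, and greater flexibility in defining access control rules for managing policies across large fleets. Performance improvements: PostgreSQL 16 improves the performance of existing PostgreSQL functionality through new query planner optimizations. In this latest release, the query planner can parallelize FULL and RIGHT ..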

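AI-generated summary: The PostgreSQL Global Development Group has released PostgreSQL 16, the latest version of the open source database, with improved performance and logical replication. The new version offers more SQL/JSON syntax, new monitoring statistics and greater flexibility. The query planner can parallelize FULL and RIGHT joins, generate better plans for queries with aggregate functions, use incremental sorts, and optimize window functions. Logical replication can now be performed from a standby instance, which gives new options for distributing workloads. Subscribers can apply large transactions using parallel workers, use B-tree indexes to look up rows, and speed up initial table synchronization with the binary format. The new version also adds support for bidirectional logical replication.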


Planet PostgreSQL -

Ernst-Georg Schmid: Static code analysis in PostgreSQL like ORACLE has.

Out of the box, PostgreSQL lacks static code analysis at compile time like ORACLE can do. plpgsql_check provides this capability (and a profiler), but needs to be called manually. So I tried to emulate ORACLE's behavior with an event trigger. The plpgsql_compile_check extension is experimental. You MUST preload plpgsql and plpgsql_check with shared_preload_libraries='plpgsql,plpgsql_check' in postgresql.conf, otherwise strange things happen!


Planet PostgreSQL -

Amit Kapila: Discussing PostgreSQL: What changes in version 16, how we got here, and what to expect in future releases

Earlier this year, I was in Canada for PGConf 2023, where I talked about the evolution of PostgreSQL from a Berkeley research project to its current status as the most advanced open-source database, and discussed the various changes introduced for PostgreSQL 16, particularly regarding logical replication.

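AI-generated summary: PostgreSQL 16 adds a number of features, including improvements to logical replication, text collation, SQL/JSON standard constructors, and parallel hash full joins. It also supports Kerberos credential delegation and certificate verification against the system certificate pool. The new version further adds CPU acceleration, connection load balancing, and LZ4 and Zstandard compression. Future versions may bring further logical replication improvements and the reuse of table synchronization workers.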


Planet PostgreSQL -

Hans-Juergen Schoenig: Data locality: Scaling PostgreSQL with Citus intelligently

While sharding is often advertised as “THE solution to PostgreSQL scalability”, it is necessary to keep some technical aspects in consideration in terms of performance. The rule is: Sharding should not be used without a deeper awareness of what it is you are actually doing to the data. It’s important to keep in mind that sharding has to be applied in a clever and thoughtful manner. One of the most common mistakes is to ignore the concept of “data locality”. It’s important for many IT problems, but crucial in the context of database sharding.

Citus is one of the most sophisticated sharding solutions in the PostgreSQL world. It can help you to achieve maximum scalability and allows for efficient analytics as well as OLTP. Citus is available on-premise or as part of the Microsoft Azure cloud. What is data locality? Let’s take a look together.

Preparing data for sharding

To demonstrate the concept, we first have to create two tables. For the sake of simplicity, we’ll use customers and sales:

    postgres=# CREATE TABLE t_customer ( id int, name text );
    CREATE TABLE
    postgres=# CREATE TABLE t_sales ( id int, customer_id int, whatever text );
    CREATE TABLE

The data model is really straightforward. In this scenario, the typical way to analyse the data is to join the customer with the sales table. Why is this relevant? To understand it, first let’s distribute the table and add some data:

    postgres=# SELECT create_distributed_table('t_customer', 'id');
     create_distributed_table
    --------------------------

    (1 row)

    postgres=# SELECT create_distributed_table('t_sales', 'id');
     create_distributed_table
    --------------------------

    (1 row)

Note that the data is sharded using the “id” which is not the join criteria. In the next step, we can load some data:

    postgres=# INSERT INTO t_customer SELECT *, 'dummy' FROM generate_series(1, 1000000);
    INSERT 0 1000000
    postgres=# INSERT INTO t_sales SELECT id, random()*100000, 'dummy' FROM generate_series(1, 10[...]

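AI-generated summary: This article covers the technical details to consider when using sharding, in particular the concept of data locality. If data locality is ignored, sharding can lead to worse performance. Citus is one of the most sophisticated sharding solutions in the PostgreSQL world; it can help you achieve maximum scalability and allows for efficient analytics as well as OLTP. Citus is available on-premise or as part of the Microsoft Azure cloud.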
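Why the choice of distribution column matters for the customer/sales join in this entry can be simulated in a few lines. This is a toy model under loudly stated assumptions: the modulo placement rule in `shard_for` stands in for a real hash function and is not how Citus assigns shards, and the shard count, the generated data and the `colocated` helper are all hypothetical.

```python
import random

SHARDS = 4

def shard_for(key):
    """Toy placement rule (an assumption): modulo instead of a real hash."""
    return key % SHARDS

# Hypothetical sales rows as (sale id, customer id); seeded for repeatability.
rng = random.Random(42)
sales = [(sale_id, rng.randint(1, 100)) for sale_id in range(1, 1001)]

def colocated(sales, distribution_column):
    """Count sales rows stored on the same shard as their customer row.

    Customers are always placed by their own id; sales are placed by either
    their own "id" or by "customer_id".
    """
    hits = 0
    for sale_id, customer_id in sales:
        key = sale_id if distribution_column == "id" else customer_id
        if shard_for(key) == shard_for(customer_id):
            hits += 1
    return hits

print(colocated(sales, "id"))           # only a fraction of the joins stay local
print(colocated(sales, "customer_id"))  # 1000: every join stays local
```

When sales are placed by their own id, most joins to the customer table have to cross shards. Placing them by the join column keeps every matching pair on the same shard.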


Planet PostgreSQL -

Hans-Juergen Schoenig: Monitoring PostgreSQL replication

PostgreSQL replication is not just a way to scale your database to run ever larger workloads: it’s also a way to make your database infrastructure redundant, more reliable and resilient. There is, however, a potential for replication lag, which needs to be monitored. How can you monitor replication lag in PostgreSQL? What is replication lag? And how can you monitor PostgreSQL replication in general? Let’s dive in and find out.

Checking replication lag while monitoring PostgreSQL replication

Streaming replication

For the sake of this example, I have set up a database server (PostgreSQL 16) and a single replica.

When monitoring replication delay and replication lag, look at the system view called pg_stat_replication. It contains all the information you’ll need to identify and diagnose replication problems. Here’s what the view looks like:

    postgres=# \d pg_stat_replication
                     View "pg_catalog.pg_stat_replication"
          Column      |           Type           | Collation | Nullable | Default
    ------------------+--------------------------+-----------+----------+---------
     pid              | integer                  |           |          |
     usesysid         | oid                      |           |          |
     usename          | name                     |           |          |
     application_name | text                     |           |          |
     client_addr      | inet                     |           |          |
     client_hostname  | text                     |           |          |
     client_port      | integer                  |           |          |
     backend_start    | timestamp with time zone |           |          |
     backend_xmin     | xid                      |           |          |
     state            | text                     |           |          |
     sent_lsn         | pg_lsn                   |           |          |
     write_lsn        | pg_lsn                   |           |          |
     flush_lsn        | pg_lsn                   |           |          |
    [...]

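AI-generated summary: This article explains how to monitor replication lag in PostgreSQL. The system view pg_stat_replication provides information about the WAL senders, including active streams and their state. The *_lsn columns (sent_lsn, write_lsn, flush_lsn and replay_lsn) show how far the data stream has progressed. The article also covers the pg_stat_wal_receiver view and how to monitor replication slots. Finally, it suggests subscribing to the newsletter or following on social media for important PostgreSQL updates.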
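The `*_lsn` columns mentioned in this entry hold positions in the write-ahead log, printed as two hexadecimal numbers separated by a slash (the upper and lower 32 bits of a 64-bit byte position). The sketch below shows, in plain Python, how a lag in bytes follows from two such values. The LSN values and both helper names are hypothetical; inside the database the same subtraction can be done on the `pg_lsn` values directly.

```python
def lsn_to_int(lsn):
    """Convert a pg_lsn string such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lag_bytes(sent_lsn, replay_lsn):
    """Bytes of WAL sent by the primary but not yet replayed on the replica."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)

# Hypothetical values as they might appear in pg_stat_replication:
print(lag_bytes("0/3000148", "0/3000060"))  # 232
```

A lag of 0 means the replica has replayed everything that was sent. A value that keeps growing is the signal worth alerting on.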


Planet PostgreSQL -

Pavel Luzanov: PostgreSQL 17: part 1 or CommitFest 2023-07

We continue to follow the news in the world of PostgreSQL. The PostgreSQL 16 Release Candidate 1 was rolled out on August 31. If all is well, PostgreSQL 16 will officially release on September 14. What has changed in the upcoming release after the April code freeze? What’s getting into PostgreSQL 17 after the first commitfest? Read our latest review to find out!

PostgreSQL 16

For reference, here are our previous reviews of PostgreSQL 16 commitfests: 2022-07, 2022-09, 2022-11, 2023-01, 2023-03. Since April, there have been some notable changes. Let’s start with the losses. The following updates have not made it into the release:

MAINTAIN: a new privilege for table maintenance (commit: 151c22de)
Setting parameter values at the database and user level (commit: b9a7a822)

Some patches have been updated: ...

AI-generated summary: PostgreSQL 16 Release Candidate 1 was released on August 31 and is expected to officially release on September 14. Notable changes include a new command for role administration privileges and updates to debugging parameters and localization providers. PostgreSQL 17 has added two new columns to the pg_stat_progress_vacuum view, enabled incremental sorting for GiST and SP-GiST indexes, and allowed for exclusion constraints on partitioned tables. Other updates include the ability to use non-unique indexes for identifying modified rows and the ability for extension developers to define their own wait events.
