Planet PostgreSQL

Jonathan Katz: Vectors are the new JSON in PostgreSQL

Vectors are the new JSON. That in itself is an interesting statement, given that vectors are a well-studied mathematical structure and JSON is a data interchange format. And yet in the world of data storage and retrieval, both of these data representations have become the lingua franca of their domains and are either essential, or soon-to-be-essential, ingredients in modern application development. And if current trends continue (I think they will), vectors will be as crucial as JSON is for building applications.

Generative AI and all the buzz around it have caused developers to look for convenient ways to store and run queries against the outputs of these systems, with PostgreSQL being a natural choice for a lot of reasons. But even with the hype around generative AI, this is not a new data pattern. Vectors, as a mathematical concept, have been around for hundreds of years. Machine learning has more than half a century of research behind it. The array, the fundamental data structure for a vector, is taught in most introductory computer science classes. Even PostgreSQL has had support for vector operations for over 20 years (more on that later)!

So, what is new? It’s the accessibility of these AI/ML algorithms and how easy it is to represent some “real world” structure (text, images, video) as a vector and store it for some future use by an application. And again, while folks may point out that it’s not new to store the output of these systems (“embeddings”) in data storage systems, the emergent pattern is the accessibility of being able to query and return this data in near real-time in almost any application.

What does this have to do with PostgreSQL? Everything! Efficient storage and retrieval of a data type used in a common pattern greatly simplifies app development, lets people keep their related data in the same place, and can work with existing tooling. We saw this with JSON over 10 years ago, and now we’re seeing this with vector data.
To understand why vectors are the new JSON, let’s rewi[...]

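This article discusses the importance of storing and querying vector data in PostgreSQL, describing PostgreSQL's support for vector operations and the accessibility of vectors as a new data pattern. It also reviews the history of JSON in PostgreSQL and notes that the release of PostgreSQL 9.4 made it a competitive JSON storage system. The author then covers the rise of vectors and the use of the pgvector extension, and looks ahead to better support for vectors in PostgreSQL. Finally, readers are encouraged to provide feedback to help the PostgreSQL community deliver the best possible support for vector queries.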
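
As a rough illustration of the store-and-query pattern described above, the following Python sketch keeps a few labelled vectors in memory and returns the ones closest to a query vector by Euclidean distance. The names (`ROWS`, `l2_distance`, `nearest`) and the three-dimensional sample vectors are hypothetical and chosen only for illustration. This is a brute-force scan, not how PostgreSQL or pgvector performs the search.

```python
import math

# Hypothetical "table" of (label, embedding) rows. Real embeddings usually
# have hundreds or thousands of dimensions; three are used here for brevity.
ROWS = [
    ("cat", [1.0, 0.0, 0.0]),
    ("dog", [0.9, 0.1, 0.0]),
    ("car", [0.0, 0.0, 1.0]),
]


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k=1):
    """Return the k rows whose vectors are closest to the query vector."""
    return sorted(rows, key=lambda row: l2_distance(query, row[1]))[:k]


# Find the two stored items most similar to a query vector.
top = nearest([1.0, 0.05, 0.0], ROWS, k=2)
labels = [label for label, _ in top]
print(labels)
```

Scanning every row like this only works for small data sets. The appeal of doing the same thing inside the database is that the vectors sit next to the rest of the application's data and can be queried with the tooling already in place.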

JSON, PostgreSQL, pgvector, vector data, query support
