Paul Ramsey: PostGIS Clustering with K-Means


Clustering points is a common task for geospatial data analysis, and PostGIS provides several functions for clustering:

- ST_ClusterDBSCAN
- ST_ClusterKMeans
- ST_ClusterIntersectingWin
- ST_ClusterWithinWin

We previously looked at the popular DBSCAN spatial clustering algorithm, which builds clusters based on spatial density. This post explores the features of the PostGIS ST_ClusterKMeans function.

K-means clustering is having a moment as a popular way of grouping very high-dimensional LLM embeddings, but it is also useful in lower dimensions for spatial clustering. ST_ClusterKMeans will cluster 2-dimensional and 3-dimensional data, and will also perform weighted clustering on points when weights are provided in the "measure" dimension of the points.

Some Points to Cluster

To try out K-means clustering we need some points to cluster, in this case the 1:10M populated places from Natural Earth. Download the GIS files and load them into your database, in this example using ogr2ogr.

    ogr2ogr \
      -f PostgreSQL \
      -nln popplaces \
      -lco GEOMETRY_NAME=geom \
      PG:'dbname=postgres' \
      ne_10m_populated_places_simple.shp

Planar Cluster

A simple clustering in 2D space looks like this, using 10 as the number of clusters:

    CREATE TABLE popplaces_geographic AS
    SELECT geom, pop_max, name,
      ST_ClusterKMeans(geom, 10) OVER () AS cluster
    FROM popplaces;

Note that pieces of Russia are clustered with Alaska, and Oceania is split up. This is because we are treating the longitude/latitude coordinates of the points as if they were on a plane, so Alaska is very far away from Siberia.

For data confined to a small area, effects like the split at the dateline do not matter, but for our global example, they do. Fortunately there is a way to work around it.

Geocentric Cluster

We can convert the longitude/latitude coordinates of the original data to a geocentric coordinate system using ST_Transform. A "geocentric" system is one in which the origin is[...]
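
The geocentric approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the query from the truncated original: it assumes the popplaces table was loaded with SRID 4326, that EPSG:4978 (WGS 84 geocentric) is available in spatial_ref_sys, and it uses a hypothetical output table name, popplaces_geocentric.

```sql
-- Sketch only: cluster in 3D geocentric space so that points on either
-- side of the dateline are measured by their true separation.
-- Assumptions: popplaces.geom has SRID 4326; EPSG:4978 is the target
-- geocentric system; popplaces_geocentric is a hypothetical table name.
CREATE TABLE popplaces_geocentric AS
SELECT geom, pop_max, name,
  ST_ClusterKMeans(
    -- Add a Z coordinate (height 0) first, so the transform to the
    -- geocentric system yields full X/Y/Z coordinates.
    ST_Transform(ST_Force3D(geom), 4978),
    10                      -- same number of clusters as the planar example
  ) OVER () AS cluster
FROM popplaces;
```

Because the clustering runs on the transformed coordinates while the original geom column is kept in the output, the result can still be mapped in longitude/latitude.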

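Summary: PostGIS provides ST_ClusterDBSCAN and ST_ClusterKMeans, among other functions, for clustering. ST_ClusterKMeans clusters 2D and 3D data and supports weighted clustering, and transforming the points with ST_Transform works around the dateline problem. The post shows the clustering results on a global dataset.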
