Planet PostgreSQL -

Laurenz Albe: ERROR: invalid byte sequence – Fix bad encoding in PostgreSQL

It’s annoying to get error messages caused by encoding problems. But it is more annoying not to get error messages and end up with bad data in the database. I’ll show you how to fix bad encoding. Some basic facts about encoding in PostgreSQL Each PostgreSQL database has a server encoding. You can inspect (but not change) the PostgreSQL parameter server_encoding to see the encoding of the database you are connected to. You can choose the database encoding when you create a database, but you cannot change the server encoding after creating the database. This restriction is necessary because changing the server encoding will render string data corrupt. Each database session has a client encoding, which you can view and change using the PostgreSQL parameter client_encoding. If your client uses the C client library libpq, you can determine the client encoding with the environment variable PGCLIENTENCODING. In addition, psql has the command \encoding to view and change the client encoding. psql tries to guess a good client encoding from the current locale. If the client encoding isn’t specified anywhere, PostgreSQL will set it to the server encoding. PostgreSQL expects that character data sent by a client are encoded in the client encoding and converts them to the server encoding. Conversely, PostgreSQL converts query results to the client encoding. PostgreSQL supports many encodings. All supported server encodings must be a superset of ASCII. Preparing test data The server encoding of my database is UTF8. We’ll use this simple table: CREATE TABLE texts ( id bigint PRIMARY KEY, t text ); Using Linux’ echo command, I create an SQL script in UTF-8 encoding: echo -e "INSERT INTO texts VALUES (1, 'sch\xc3\xb6n');" \ > insert_utf8.sql Then I create a second script, with the same string in LATIN-1 (= ISO 8859-1) encoding: echo -e "INSERT INTO texts VALUES (2, 'sch\xf6n');" \ > insert_latin1.sql Both strings contain the German letter “ö”. LATIN-1 encodes it as a [...]

AI生成摘要 本文介绍了PostgreSQL中编码问题的解决方法。PostgreSQL支持多种编码方式,每个数据库会有一个服务器编码和客户端编码。服务器编码无法更改,而客户端编码可以通过设置环境变量或使用psql命令进行更改。PostgreSQL非常严格,不允许存储未正确编码的字符串,但是在客户端编码设置错误的情况下,数据可能会出现损坏而不被检测到。本文提供了修复一致和不一致编码数据的方法,包括导出和导入数据库。

相关推荐 去reddit讨论