PostgreSQL DBA(113) - pgAdmin(Don't do this：Don't use char(n))

發(fā)布時(shí)間：2020-08-15 21:07:43 來源：ITPUB博客閱讀：203 作者：husthxd 欄目：關(guān)系型數(shù)據(jù)庫

no zuo no die系列，來自于pg的wiki。
這一節(jié)的內(nèi)容是：不要使用char(n) 。
理由是:

Any string you insert into a char(n) field will be padded with spaces to the declared width. That’s probably not what you actually want.
The manual says:
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT ‘a(chǎn) ‘::CHAR(2) collate “C” < E’a\n’::CHAR(2) returns true, even though C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is LIKE and regular expressions.
That should scare you off it.
The space-padding does waste space, but doesn’t make operations on it any faster; in fact the reverse, thanks to the need to strip spaces in many contexts.
It’s important to note that from a storage point of view char(n) is not a fixed-width type. The actual number of bytes varies since characters may take more than one byte, and the stored values are therefore treated as variable-length anyway (even though the space padding is included in the storage).

原因是期望指定n長，但由于字符編碼（如中文字符，GB2312是2個(gè)字節(jié)，而UTF8是3個(gè)字節(jié)）的原因，實(shí)際跟預(yù)想的不符，而且會(huì)出現(xiàn)影響排序等其他副作用。

testdb=# drop table if exists t_char;
DROP TABLE
testdb=# create table t_char(id int,c1 char(10),c2 varchar(10));
CREATE TABLE
testdb=# 
testdb=# insert into t_char values(1,'測試123','123123');
INSERT 0 1
testdb=# insert into t_char values(2,'abc123','123123');
INSERT 0 1
testdb=# 
testdb=# insert into t_char values(3,'a','a ');
INSERT 0 1
testdb=# insert into t_char values(4,E'a\n',E'a\n');
INSERT 0 1
testdb=#

使用length函數(shù)獲取長度

testdb=# select id,length(c1),length(c2) from t_char order by id;
 id | length | length 
----+--------+--------
  1 |      5 |      6
  2 |      6 |      6
  3 |      1 |      2
  4 |      2 |      2
(4 rows)

如上所述，使用length函數(shù)獲取的實(shí)際是字符個(gè)數(shù)而不是實(shí)際的字節(jié)數(shù)，如“測試123”實(shí)際的字節(jié)數(shù)是9+5=14字節(jié)。

testdb=# select id,length(c1),octet_length(c1),length(c2),octet_length(c2) from t_char order by id;
 id | length | octet_length | length | octet_length 
----+--------+--------------+--------+--------------
  1 |      5 |           14 |      6 |            6
  2 |      6 |           10 |      6 |            6
  3 |      1 |           10 |      2 |            2
  4 |      2 |           10 |      2 |            2
(4 rows)

在字符串比較上面，雖然空格的ascii碼值(0x20)比’\n’(0x0a)要大，但查詢的實(shí)際效果看起來卻是char(10)定義的’a’比’a\n’要小:

testdb=# select E'a\n'::bytea;
 bytea  
--------
 \x610a
(1 row)
testdb=# select E'a '::bytea;
 bytea  
--------
 \x6120
(1 row)
testdb=# select * from t_char where c1 < E'a\n';
 id |     c1     | c2 
----+------------+----
  3 | a          | a 
(1 row)

參考資料
Don’t Do This

向AI問一下細(xì)節(jié)

PostgreSQL DBA(113) - pgAdmin(Don't do this：Don't use char(n))

猜你喜歡

最新資訊

相關(guān)推薦

相關(guān)標(biāo)簽