溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點(diǎn)擊 登錄注冊 即表示同意《億速云用戶服務(wù)條款》

PostgreSQL DBA(113) - pgAdmin(Don't do this:Don't use char(n))

發(fā)布時(shí)間:2020-08-15 21:07:43 來源:ITPUB博客 閱讀:203 作者:husthxd 欄目:關(guān)系型數(shù)據(jù)庫

no zuo no die系列,來自于pg的wiki。
這一節(jié)的內(nèi)容是:不要使用char(n) 。
理由是:

Any string you insert into a char(n) field will be padded with spaces to the declared width. That’s probably not what you actually want.
The manual says:
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT ‘a(chǎn) ‘::CHAR(2) collate “C” < E’a\n’::CHAR(2) returns true, even though C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is LIKE and regular expressions.
That should scare you off it.
The space-padding does waste space, but doesn’t make operations on it any faster; in fact the reverse, thanks to the need to strip spaces in many contexts.
It’s important to note that from a storage point of view char(n) is not a fixed-width type. The actual number of bytes varies since characters may take more than one byte, and the stored values are therefore treated as variable-length anyway (even though the space padding is included in the storage).

原因是期望指定n長,但由于字符編碼(如中文字符,GB2312是2個(gè)字節(jié),而UTF8是3個(gè)字節(jié))的原因,實(shí)際跟預(yù)想的不符,而且會(huì)出現(xiàn)影響排序等其他副作用。

testdb=# drop table if exists t_char;
DROP TABLE
testdb=# create table t_char(id int,c1 char(10),c2 varchar(10));
CREATE TABLE
testdb=# 
testdb=# insert into t_char values(1,'測試123','123123');
INSERT 0 1
testdb=# insert into t_char values(2,'abc123','123123');
INSERT 0 1
testdb=# 
testdb=# insert into t_char values(3,'a','a ');
INSERT 0 1
testdb=# insert into t_char values(4,E'a\n',E'a\n');
INSERT 0 1
testdb=#

使用length函數(shù)獲取長度

testdb=# select id,length(c1),length(c2) from t_char order by id;
 id | length | length 
----+--------+--------
  1 |      5 |      6
  2 |      6 |      6
  3 |      1 |      2
  4 |      2 |      2
(4 rows)

如上所述,使用length函數(shù)獲取的實(shí)際是字符個(gè)數(shù)而不是實(shí)際的字節(jié)數(shù),如“測試123”實(shí)際的字節(jié)數(shù)是9+5=14字節(jié)。

testdb=# select id,length(c1),octet_length(c1),length(c2),octet_length(c2) from t_char order by id;
 id | length | octet_length | length | octet_length 
----+--------+--------------+--------+--------------
  1 |      5 |           14 |      6 |            6
  2 |      6 |           10 |      6 |            6
  3 |      1 |           10 |      2 |            2
  4 |      2 |           10 |      2 |            2
(4 rows)

在字符串比較上面,雖然空格的ascii碼值(0x20)比’\n’(0x0a)要大,但查詢的實(shí)際效果看起來卻是char(10)定義的’a’比’a\n’要小:

testdb=# select E'a\n'::bytea;
 bytea  
--------
 \x610a
(1 row)
testdb=# select E'a '::bytea;
 bytea  
--------
 \x6120
(1 row)
testdb=# select * from t_char where c1 < E'a\n';
 id |     c1     | c2 
----+------------+----
  3 | a          | a 
(1 row)

參考資料
Don’t Do This

向AI問一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI