create da..."/>
溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務(wù)條款》

Hive-1.2.0學習筆記(二)Hive數(shù)據(jù)類型

發(fā)布時間:2020-07-12 16:09:36 來源:網(wǎng)絡(luò) 閱讀:2928 作者:luchunli1985 欄目:大數(shù)據(jù)

魯春利的工作筆記,誰說程序員不能有文藝范?



在hive中創(chuàng)建mywork數(shù)據(jù)庫,以后的測試在該數(shù)據(jù)庫中進行,避免每次都使用default數(shù)據(jù)庫。

hive> create database mywork;
OK
Time taken: 0.487 seconds
hive> show databases;
OK
default
mywork
Time taken: 0.614 seconds, Fetched: 2 row(s)
hive> 
hive> use mywork;
OK
Time taken: 0.064 seconds
hive> create table student(id int, name string);
OK
Time taken: 0.519 seconds
hive>

查看Hive在HDFS上的存儲

[hadoop@dnode1 ~]$ hdfs dfs -ls -R /user/hive
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:37 /user/hive/warehouse
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:36 /user/hive/warehouse/mywork.db
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:36 /user/hive/warehouse/mywork.db/student
[hadoop@dnode1 ~]$


Hive支持的數(shù)據(jù)類型如下:

    原生類型:

TINYINT                1字節(jié)
SMALLINT               2字節(jié)
INT                    4字節(jié)
BIGINT                 8字節(jié)
BOOLEAN                true/false
FLOAT                  4字節(jié)
DOUBLE                 8字節(jié)
STRING                 字符串
BINARY (Hive 0.8.0以上才可用)
TIMESTAMP (Hive 0.8.0以上才可用)

    復合類型:

arrays: ARRAY<data_type>                有序字段,類型必須相同
maps: MAP<primitive_type, data_type>    無序的鍵/值對
structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>    一組命名的字段
union: UNIONTYPE<data_type, data_type, ...>

    說明:ARRAY數(shù)據(jù)類型通過下標來獲取值,如arrays[0],MAP通過["指定域名稱"]訪問, STRUCT類型通過點方式訪問(如structs.col_name)。  


建表示例:

hive> create table employee (
    > eno int comment 'the no of employee',
    > ename string comment 'name of employee',
    > salary float comment 'salary of employee',
    > subordinates array<string> comment 'employees managed by current employee',
    > deductions map<string, float> comment 'deductions',
    > address struct<province : string, city : string, street : string, zip : int> comment 'address'
    > ) comment 'This is table of employee info';
OK
Time taken: 0.33 seconds
hive>

在Hive中各列之間,以及復合類型內(nèi)部使用了不同的分隔符來指定,每行數(shù)據(jù)對應(yīng)一條記錄。

Hive-1.2.0學習筆記(二)Hive數(shù)據(jù)類型


在${HIVE_HOME}/data目錄下創(chuàng)建文件data_default.txt文件,采用默認分隔符,內(nèi)容為:

Hive-1.2.0學習筆記(二)Hive數(shù)據(jù)類型

Hive默認的字段分隔符為ascii碼的控制符\001,建表的時候用fields terminated by '\001'。造數(shù)據(jù)在vi 打開文件里面,用ctrl+v然后再ctrl+a可以輸入這個控制符\001(即^A)。按順序,\002的輸入方式為ctrl+v,ctrl+b。以此類推。

說明:

1000                    員工編號
zhangsan                員工姓名
5000.0                  員工工資
lisi^Bwangwu            下屬員工
ptax^C200^Bpension^C200                工資扣除金額(如稅收等)
shandong^Bheze^Bdingtao^B274106        家庭住址(struct結(jié)構(gòu)只需指定值即可)


加載數(shù)據(jù)

hive> load data local inpath 'data/data_default.txt' into table employee;
Loading data to table mywork.employee
Table mywork.employee stats: [numFiles=1, numRows=0, totalSize=83, rawDataSize=0]
OK
Time taken: 0.426 seconds
hive> select * from employee;
OK
1000    zhangsan        5000.0  ["lisi","wangwu"]       {"ptax":200.0,"pension":200.0}  {"province":"shandong","city":"heze","street":"dingtao","zip":274106}
Time taken: 0.114 seconds, Fetched: 1 row(s)
hive>
# 對于復合類型數(shù)據(jù)查詢方式如下
hive> select eno, ename, salary, subordinates[0], deductions['ptax'], address.province from employee;
OK
1000    zhangsan        5000.0  lisi    200.0   shandong
Time taken: 0.129 seconds, Fetched: 1 row(s)
hive>


查看HDFS數(shù)據(jù)結(jié)構(gòu)

[hadoop@dnode1 ~]$ hdfs dfs -ls -R /user/hive/warehouse/
drwxrw-rw-   - hadoop hadoop          0 2015-12-09 00:00 /user/hive/warehouse/mywork.db
drwxrw-rw-   - hadoop hadoop          0 2015-12-09 00:00 /user/hive/warehouse/mywork.db/employee
-rwxrw-rw-   2 hadoop hadoop         83 2015-12-09 00:00 /user/hive/warehouse/mywork.db/employee/data_default.txt
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 23:03 /user/hive/warehouse/mywork.db/student
[hadoop@dnode1 ~]$ hdfs dfs -text /user/hive/warehouse/mywork.db/employee/data_default.txt
1000zhangsan5000.0lisiwangwuptax200pension200shandonghezedingtao274106
[hadoop@dnode1 ~]$

Hive-1.2.0學習筆記(二)Hive數(shù)據(jù)類型


Hive-1.2.0學習筆記(二)Hive數(shù)據(jù)類型


自定義分隔符:

hive> create table employee_02 (
    > eno int comment 'the no of employee',
    > ename string comment 'name of employee',
    > salary float comment 'salary of employee',
    > subordinates array<string> comment 'employees managed by current employee',
    > deductions map<string, float> comment 'deductions',
    > address struct<province : string, city : string, street : string, zip : int> comment 'address'
    > ) comment 'This is table of employee info'
    > row format delimited fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 0.228 seconds
hive> load data local inpath 'data/data_employee02.txt' into table employee_02;
Loading data to table mywork.employee_02
Table mywork.employee_02 stats: [numFiles=1, totalSize=99]
OK
Time taken: 0.371 seconds
hive> select * from employee_02;
OK
1000    'zhangsan'      5000.0  ["'lisi'","'wangwu'"]   {"'ptax'":200.0,"'pension'":200.0}      {"province":"'shandong'","city":"'heze'","street":"'dingtao'","zip":274106}
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive>


data/employee02.txt文件內(nèi)容為

[hadoop@nnode data]$ pwd
/usr/local/hive1.2.0/data
[hadoop@nnode data]$ cat data_employee02.txt 
1000    'zhangsan'      5000.0  'lisi','wangwu' 'ptax':200,'pension':200        'shandong','heze','dingtao',274106
[hadoop@nnode data]$

說明:由于在文本文件中包含有單引號,在load到hive的表之后表示方式為屬性加雙引號,這里的單引號被認為了是屬性或值的一部分了,需要注意。


查看詳細表定義

# 建表時為默認設(shè)置
hive> describe formatted employee;
OK
# col_name              data_type               comment             
                 
eno                     int                     the no of employee  
ename                   string                  name of employee    
salary                  float                   salary of employee  
subordinates            array<string>           employees managed by current employee
deductions              map<string,float>       deductions          
address                 struct<province:string,city:string,street:string,zip:int>       address             
                 
# Detailed Table Information             
Database:               mywork                   
Owner:                  hadoop                   
CreateTime:             Tue Dec 08 23:10:07 CST 2015     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://cluster/user/hive/warehouse/mywork.db/employee    
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   true                
        comment                 This is table of employee info
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               83                  
        transient_lastDdlTime   1449590423          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        serialization.format    1                   
Time taken: 0.098 seconds, Fetched: 37 row(s)

# 建表時自定義了分隔符
hive> describe formatted employee_02;
OK
# col_name              data_type               comment             
                 
eno                     int                     the no of employee  
ename                   string                  name of employee
salary                  float                   salary of employee  
subordinates            array<string>           employees managed by current employee
deductions              map<string,float>       deductions          
address                 struct<province:string,city:string,street:string,zip:int> address
                 
# Detailed Table Information             
Database:               mywork                   
Owner:                  hadoop                   
CreateTime:             Wed Dec 09 00:12:53 CST 2015     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://cluster/user/hive/warehouse/mywork.db/employee_02   
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   true                
        comment                 This is table of employee info
        numFiles                1                   
        totalSize               99                  
        transient_lastDdlTime   1449591260          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        colelction.delim        ,                   
        field.delim             \t                  
        line.delim              \n                  
        mapkey.delim            :                   
        serialization.format    \t                  
Time taken: 0.116 seconds, Fetched: 39 row(s)
hive>


遺留問題:

hive> delete from student;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive>


注意事項:如果sql語句中含有tab格式的內(nèi)容,則會出現(xiàn)如下問題

Hive-1.2.0學習筆記(二)Hive數(shù)據(jù)類型


向AI問一下細節(jié)

免責聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI