您好,登錄后才能下訂單哦!
一、分區(qū)表概述
分區(qū)表也是內(nèi)部表,創(chuàng)建表時可以同時為表創(chuàng)建一個或多個分區(qū),這樣我們在加載數(shù)據(jù)時為其指定具體的分區(qū),查詢數(shù)據(jù)時可以指定具體的分區(qū)從而提高效率,分區(qū)可以理解為表的一個特殊的列。關(guān)鍵字是partitioned。
分區(qū)表實際上是將表文件分成多個有標(biāo)記的小文件方便查詢。
二、創(chuàng)建分區(qū)表
這里我們將oracle用戶scott下的emp表導(dǎo)出的emp.csv文件在Hive中創(chuàng)建分區(qū)表存放,按照部門編號進行分區(qū),emp表的字段信息值如下:
empno, ename, job, mgr, hiredate, salary, comm, deptno
7499, ALLEN, SALESMAN, 7698, 1981/2/20, 1600, 300, 30
hive> create table part_emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> salary float,
> comm float
> )
> partitioned by (deptno int)
> row format delimited fields terminated by ',';
OK
Time taken: 0.061 seconds
查看分區(qū)表,其中# Partition Information為分區(qū)信息,有兩個分區(qū)year和city
hive> desc extended part_emp;
OK
empno int None
ename string None
job string None
mgr int None
hiredate string None
salary float None
comm float None
deptno int None
# Partition Information
# col_name data_type comment
deptno int None
三、分區(qū)表插入數(shù)據(jù)
1、通過load命令加載數(shù)據(jù)
第一次分區(qū)信息為deptno=10
hive> load data local inpath '/root/emp.csv_10' into table part_emp partition(deptno=10);
Copying data from file:/root/emp.csv_10
Copying file: file:/root/emp.csv_10
Loading data to table default.part_emp partition (deptno=10)
[Warning] could not update stats.
OK
Time taken: 2.267 seconds
第二次分區(qū)信息為deptno=20
hive> load data local inpath '/root/emp.csv_20' into table part_emp partition(deptno=20);
Copying data from file:/root/emp.csv_20
Copying file: file:/root/emp.csv_20
Loading data to table default.part_emp partition (deptno=20)
[Warning] could not update stats.
OK
Time taken: 8.151 seconds
第三次分區(qū)信息為deptno=30,第三次通過insert的方式加載分區(qū)信息
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> load data local inpath '/root/emp.csv_30' into table part_emp partition(deptno=30);
Copying data from file:/root/emp.csv_30
Copying file: file:/root/emp.csv_30
Loading data to table default.part_emp partition (deptno=30)
[Warning] could not update stats.
OK
Time taken: 7.344 seconds
四、根據(jù)分區(qū)查詢,分區(qū)很像是一個特殊的列
hive> select * from part_emp where deptno=10;
7782 CLARK MANAGER 7839 1981/6/9 2450.0 100.0 10
7839 KING PRESIDENT NULL 1981/11/17 5000.0 120.0 10
7934 MILLER CLERK 7782 1982/1/23 1300.0 133.0 10
8129 Abama MANAGER 7839 1981/6/9 2450.0 122.0 10
8131 Jimy PRESIDENT NULL 1981/11/17 5000.0 333.0 10
8136 Goodle CLERK 7782 1982/1/23 1300.0 421.0 10
查看分區(qū)表的分區(qū)信息
hive> show partitions part_emp;
deptno=10
deptno=20
deptno=30
五、分區(qū)表在HDFS上的存儲形式
一個分區(qū)對應(yīng)一個目錄
六、觀察分區(qū)表查詢和普通表查詢的執(zhí)行計劃
普通表
hive> explain select * from emp where deptno=10;
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME emp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL deptno) 10))))
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
emp
TableScan
alias: emp
Filter Operator
predicate:
expr: (deptno = 10)
type: boolean
Select Operator
expressions:
expr: empno
type: int
expr: ename
type: string
expr: job
type: string
expr: mgr
type: int
expr: hiredate
type: string
expr: salary
type: float
expr: comm
type: float
expr: deptno
type: int
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
分區(qū)表:
hive> explain select * from part_emp where deptno=10;
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME part_emp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL deptno) 10))))
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: part_emp
Select Operator
expressions:
expr: empno
type: int
expr: ename
type: string
expr: job
type: string
expr: mgr
type: int
expr: hiredate
type: string
expr: salary
type: float
expr: comm
type: float
expr: deptno
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
ListSink
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。