13. Hive DDL and DML Syntax: Hands-on Examples

Published: 2020-02-14 11:09:52 | Source: web | Views: 524 | Author: victor19901114 | Category: Big Data

1. Hive DDL syntax

1.1 Database DDL operations

(1) Create a database
create database db_hive2;
or
create database if not exists db_hive;
By default, databases are stored on HDFS under /user/hive/warehouse/*.db
(2) List all databases
show databases;
(3) Filter databases by name
show databases like 'db_hive';
(4) Show database details
desc database db_hive;
(5) Show extended database details
desc database extended db_hive;
(6) Switch the current database
use db_hive;
(7) Drop a database
-- Drop an empty database
drop database db_hive;
-- If the database may not exist, add if exists so that the statement does not fail
drop database if exists db_hive;
-- If the database still contains tables, use cascade to force the drop
drop database if exists db_hive cascade;
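
The default path mentioned in (1) can be overridden when the database is created. The following is a minimal sketch, not part of the original walkthrough; the database name db_hive_demo and the path /tmp/hive_demo/db_hive_demo.db are hypothetical and only serve as illustration.

```sql
-- Hypothetical database name and HDFS path, for illustration only.
create database if not exists db_hive_demo
comment 'demo database stored outside the default warehouse directory'
location '/tmp/hive_demo/db_hive_demo.db';

-- The output should list the location given above instead of the default path.
desc database db_hive_demo;

-- Clean up; cascade also drops any tables created in the database.
drop database if exists db_hive_demo cascade;
```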

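1.2 Table DDL operations

1.2.1 CREATE TABLE syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]    -- table description, optional
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]    -- partitioning
[CLUSTERED BY (col_name, col_name, ...)    -- bucketing
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]    -- e.g. row format delimited fields terminated by 'delimiter'
[STORED AS file_format]    -- file format used to store the data
[LOCATION hdfs_path]    -- HDFS directory where the data is stored

Key point: Hive reads a text file one line at a time, and each line has to be split on a delimiter so that its pieces can be matched to the table's columns. This is what ROW FORMAT specifies.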

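Explanation of the clauses:

CREATE TABLE: creates a table with the given name. If the table already exists an exception is raised; use IF NOT EXISTS to avoid it.
EXTERNAL: creates an external table; the path of the existing source data can be given with LOCATION at creation time. For an internal (managed) table, the data is moved into the path that the data warehouse points to; for an external table nothing is moved. When a table is dropped, an internal table loses both its metadata and its data, while an external table keeps its data.
COMMENT: adds a comment to a table or a column
PARTITIONED BY: creates a partitioned table
CLUSTERED BY: creates a bucketed table
SORTED BY: creates a bucketed table whose buckets are sorted (rarely used)
STORED AS: specifies the storage file format: sequencefile (binary sequence file), textfile (plain text), or rcfile (columnar storage format). For plain-text data use STORED AS TEXTFILE; if compression is needed use STORED AS SEQUENCEFILE.
LOCATION: specifies where the table is stored on HDFS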
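
To see how the optional clauses fit together, here is a minimal sketch of one statement that combines most of them. The table name page_view_demo, its columns, and the bucket count are hypothetical and were chosen only for illustration.

```sql
-- Hypothetical table and column names, for illustration only.
create table if not exists page_view_demo (
  user_id int comment 'user identifier',
  url string comment 'visited page'
)
comment 'demo table combining the optional clauses'
partitioned by (dt string)                                   -- one directory per dt value
clustered by (user_id) sorted by (user_id asc) into 4 buckets -- 4 sorted bucket files per partition
row format delimited fields terminated by '\t'               -- columns are tab-separated
stored as textfile;                                          -- plain-text storage
```

LOCATION is left out here, so the table would be created under the warehouse directory of the current database.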

1.2.2 Creating an internal table

1. Using a standard CREATE TABLE statement:

create table if not exists student11(
id int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile;

Sample text file data.txt (the two columns are separated by a tab, matching the table definition):

1 zhang

2 lisi

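2. Create table as select:

The table is created from an AS query: the result of the subquery is stored in the new table, so the new table contains data.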

create table if not exists student1 as select id,name from student;

3. Create table with like:

The table is created from the structure of an existing table; no data is copied.

create table if not exists student2 like student;

4. Check the table type:

desc formatted student;

5. Default location of internal tables:

(this depends on your own configuration)
/user/hive_remote/warehouse/db_hive.db

6. Loading data into a Hive table:

For example, where student11 is a Hive table:

load data local inpath '/opt/bigdata2.7/hivedata/student.txt' into table student11;

1.2.3 Creating an external table

Note: default is the name of the database.

create external table if not exists default.emp(
id int,
name string
)
row format delimited fields terminated by '\t'
location '/opt/bigdata2.7/hivedata';

An external table must be created with the external keyword. The location clause is optional; if it is omitted, the default directory /user/hive/warehouse is used.

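1.2.4 Converting between internal and external tables

1. Convert an internal table to an external table

-- Change the internal table student into an external table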

alter table student set tblproperties('EXTERNAL'='TRUE');

2. Convert an external table to an internal table

alter table student set tblproperties('EXTERNAL'='FALSE');

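1.2.5 Differences between internal and external tables

1. The CREATE TABLE syntax differs:

An external table needs the external keyword when it is created.

2. The data location differs:

For an internal table, the data is moved into the path that the data warehouse points to. For an external table, Hive only records the path of the data and does not change its location.

3. The behavior after dropping the table differs:

Dropping an internal table deletes the metadata and the table's data.

Dropping an external table deletes only the table's metadata; the actual data is still there and can be recovered later.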
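
The difference in drop behavior can be tried out with a sketch like the one below. The table names managed_demo and external_demo and the path /tmp/hive_demo/external_demo are hypothetical; the statements themselves are ordinary HiveQL.

```sql
-- Hypothetical table names and HDFS path, for illustration only.
create table if not exists managed_demo (id int, name string)
row format delimited fields terminated by '\t';

create external table if not exists external_demo (id int, name string)
row format delimited fields terminated by '\t'
location '/tmp/hive_demo/external_demo';

-- The "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE.
desc formatted managed_demo;
desc formatted external_demo;

drop table managed_demo;   -- removes the metadata and the data directory
drop table external_demo;  -- removes the metadata only; the files stay on HDFS

-- Re-creating the external table over the same location makes
-- the remaining files queryable again.
create external table if not exists external_demo (id int, name string)
row format delimited fields terminated by '\t'
location '/tmp/hive_demo/external_demo';
```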

1.3 Classic table DDL examples

1.3.1 Movie data example

1. Data format:

戰狼1,吳京1:吳剛1:小明1,2017-08-01

戰狼2,吳京2:吳剛2:小明2,2017-08-02

戰狼3,吳京4:吳剛4:小明4,2017-08-03

戰狼4,吳京3:吳剛3:小明3,2017-08-04

戰狼5,吳京5:吳剛5:小明5,2017-08-05

2. CREATE TABLE statement:

create table t_movie(movie_name string,actors array<string>,first_date string)
row format delimited fields terminated by ','
collection items terminated by ':';

3. Load the data:

Make sure the hadoop user has read and write permission on this directory.
load data local inpath '/opt/bigdata2.7/hive/movie' into table t_movie;

4. Query the second lead actor of each movie:

select movie_name,actors[1] from t_movie;

5. Count the lead actors of each movie:

select movie_name,size(actors) as num from t_movie;

6. Movies whose cast includes 吳剛5:

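select movie_name,actors from t_movie where array_contains(actors,'吳剛5');

Analysis:

The special part of this data is the lead actors field. The names are all of type string, so the array type is a natural fit, because an array stores elements of the same type. The clause collection items terminated by ':' sets the delimiter between the elements of a complex data type.

Note that collection items terminated by is not limited to arrays; it separates the elements inside any complex data type. The built-in size function returns the number of elements in an array, and array_contains() checks whether an array contains a given element.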
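
As a small extension of the queries above, the sketch below shows zero-based indexing and one way to turn the array into rows. It assumes the t_movie table has been created and loaded as described; lateral view explode is not used in the original article and is shown only as one possible approach, and the aliases first_actor, a, and actor are arbitrary.

```sql
-- Array indexes start at 0, so actors[0] is the first lead actor
-- (the article's actors[1] is the second one).
select movie_name, actors[0] as first_actor from t_movie;

-- One output row per (movie, actor) pair.
select movie_name, actor
from t_movie
lateral view explode(actors) a as actor;
```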

1.3.2 Personal profile data example

1. Data format:

1,張三,18:male:北京

2,李四,19:male:南京

3,王五,20:male:上海

4,哈哈,18:male:北京

5,嘿嘿,12:male:成都

6,嘻嘻,14:male:濟南

7,張麗,17:male:深圳

8,李物,19:male:重慶

2. CREATE TABLE statement:

create table t_user(id int,name string,info struct<age:string,sex:string,addr:string>)
row format delimited fields terminated by ','
collection items terminated by ':';

3. Load the data:

load data local inpath '/opt/bigdata2.7/hive/user' into table t_user;

4. Query each person's id, name, and address:

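select id,name,info.addr from t_user;

Analysis:

The special field here is 18:male:北京, which stands for age:sex:address. Each position has its own meaning. The values do not form key-value pairs, so a map does not fit. An array can only hold elements of one type, while age is an int and address is a string, so an array does not fit either. That leaves struct.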
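
The following sketch shows a few more ways to use the struct fields. It assumes the t_user table has been created and loaded as described above; note that the table declares age as string, so the second query casts it before comparing numerically.

```sql
-- Struct fields are addressed with dot notation and can be used in filters.
select id, name, info.age, info.sex
from t_user
where info.addr = '北京';

-- age is declared as string in t_user, so cast it for a numeric comparison.
select id, name
from t_user
where cast(info.age as int) >= 18;
```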

1.3.3 Family profile data example

1. Data description:

1,小明,father:張三#mother:李麗#brother:小剛,28

2,小鴻,father:李四#mother:王麗#brother:小志,28

3,小鵬,father:張物#mother:李美#brother:小英,28

4,張飛,father:張五#mother:李影#brother:小全,28

2. CREATE TABLE statement:

create table t_family(id int,name string,family_mem map<string,string>,age int)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';

3. Load the data:

load data local inpath '/opt/bigdata2.7/hive/family' into table t_family;

4. Look up each person's father:

select name,family_mem["father"] from t_family;

5. List the kinds of family relations each person has:
select name,map_keys(family_mem),age from t_family;

6. List the names of each person's relatives:

select name,map_values(family_mem) as relations,age from t_family;

7. Count each person's relatives:

select id,name,size(family_mem) as relation_num,age from t_family;
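
Two further map queries, given as a sketch under the assumption that t_family has been created and loaded as described above. The lateral view explode form is not part of the original article, and the aliases m, relation, and member are arbitrary.

```sql
-- Looking up a key that is not in the map yields NULL,
-- so this keeps only people who have a brother entry.
select name, family_mem['brother'] as brother
from t_family
where family_mem['brother'] is not null;

-- One output row per (person, relation, relative) triple.
select name, relation, member
from t_family
lateral view explode(family_mem) m as relation, member;
```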

2. Hive DML syntax

2.1 Altering table structure

2.1.1 Renaming a table

alter table student_partition1 rename to student_partition2;

2.1.2 Viewing table structure

desc student_partition3;

desc formatted student_partition3;

2.1.3 Adding, changing, and replacing columns

Add a column:

alter table student_partition3 add columns(address string);

Change a column (here both the name and the type are changed):

alter table student_partition3 change column address address_id int;

Replace all columns:

alter table student_partition3 replace columns(deptno string,dname string,loc string);

2.1.4 Adding, dropping, and listing partitions

1. Add partitions:

(1) Add a single partition:

alter table student_partition1 add partition(dt='20170601');

(2) Add multiple partitions:

alter table student_partition1 add partition(dt='20170602') partition(dt='20170603');

2. Drop partitions:

alter table student_partition1 drop partition (dt='20170601');

alter table student_partition1 drop partition (dt='20170601'), partition (dt='20170602');

3. List partitions:

show partitions student_partition1;
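
The article does not show how student_partition1 was created, so the sketch below uses a hypothetical table student_partition_demo with assumed columns to run through the partition statements end to end.

```sql
-- Hypothetical table and columns, for illustration only.
create table if not exists student_partition_demo (
  id int,
  name string
)
partitioned by (dt string)
row format delimited fields terminated by '\t';

-- if not exists avoids an error when the partition is already present.
alter table student_partition_demo add if not exists partition (dt='20170601');

show partitions student_partition_demo;

-- if exists avoids an error when the partition is missing.
alter table student_partition_demo drop if exists partition (dt='20170601');
```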

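2.2 Importing data into Hive tables

2.2.1 Loading data into a table

load data [local] inpath 'datapath' [overwrite] into table student [partition (partcol1=val1,...)];

load data: loads data

local: loads the data from the local file system into the Hive table; without it, the data is loaded from HDFS

inpath: the path of the data to load

overwrite: overwrites the data already in the table; without it, the data is appended

into table: the table to load into

Example for a regular table: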

load data local inpath '/opt/bigdata2.7/hive/person.txt' into table person;

Example for a partitioned table:

load data local inpath '/opt/bigdata2.7/hive/person.txt' into table person partition (dt="20190202");
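
Both examples above use local and append to the table. For contrast, here is a minimal sketch of the other combination, loading from HDFS with overwrite. The HDFS path /tmp/hive_demo/person.txt is hypothetical, and person is assumed to be the regular (unpartitioned) table from the first example.

```sql
-- Hypothetical HDFS path, for illustration only.
-- Without local, the file is read from HDFS and moved into the table directory.
-- overwrite replaces the rows already in the table instead of appending.
load data inpath '/tmp/hive_demo/person.txt' overwrite into table person;
```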

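2.2.2 Inserting data with a query

Query the data from a source table and insert the result into the target table. In the template below, target_table and source_table are placeholders.

insert {into | overwrite} table target_table select ... from source_table;

insert into table student_partition1 partition(dt="2019-07-08") select * from tablename;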
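
A slightly more explicit variant is sketched below. It assumes that student_partition1 has the columns (id int, name string) plus the partition column dt, and that a source table student_src(id, name, dt) exists; both assumptions are hypothetical, since the article does not show these definitions.

```sql
-- Assumed schemas (hypothetical):
--   student_partition1(id int, name string) partitioned by (dt string)
--   student_src(id int, name string, dt string)
-- overwrite replaces the contents of the target partition only.
-- The select list must match the non-partition columns of the target.
insert overwrite table student_partition1 partition (dt='2019-07-08')
select id, name
from student_src
where dt = '2019-07-08';
```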

2.2.3 Creating and populating a table from a query (as select)

create table if not exists new_tablename as select id,name from tablename;

2.2.4 Specifying the data path with location at table creation

Create the table and specify its location on HDFS:

create table if not exists student1(
id int,
name string)
row format delimited fields terminated by '\t'
location '/usr/hive_remote/warehouse/student1';

create table if not exists person(
id int,
name string,
age int,
sex string
)
row format delimited fields terminated by ',';

Upload the data file to the corresponding HDFS directory.

Run this in the Linux shell, not in the Hive client:

hdfs dfs -put /opt/bigdata2.7/hive/student1.txt /usr/hive_remote/warehouse/student1

2.2.5 Importing data into a Hive table with import

Note: the data must first be exported with export before it can be imported.

create table student2 like student1;

export table student1 to '/export/student1';

import table student2 from '/export/student1';
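
import can also create the target table itself from the exported metadata, so the like step is not strictly needed. The sketch below assumes student1 exists; the export path /tmp/hive_demo/export/student1 and the table name student1_copy are hypothetical, and the export directory is assumed to be empty or not yet present.

```sql
-- Hypothetical export path and target table name, for illustration only.
export table student1 to '/tmp/hive_demo/export/student1';

-- student1_copy does not exist yet; import creates it from the
-- exported metadata and copies the data.
import table student1_copy from '/tmp/hive_demo/export/student1';
```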

2.3 Exporting data from Hive tables

2.3.1 Exporting with insert

1. Export query results to the local file system

insert overwrite local directory '/opt/bigdata/export/student' select * from student;

2. Export query results to the local file system with formatting

insert overwrite local directory '/opt/bigdata/export/student'
row format delimited fields terminated by ','
select * from student;

3. Export query results to HDFS (without local)

insert overwrite directory '/user/export/student'
row format delimited fields terminated by ','
select * from student;

2.3.2 Exporting to the local file system with Hadoop commands

hdfs dfs -get /user/hive_remote/warehouse/student/student.txt /opt/bigdata2.7/data

2.3.3 Exporting with the Hive shell

hive -e 'select * from default.student' > /opt/bigdata/data/student1.txt

2.3.4 Exporting to HDFS with export

export table default.student to '/user/hive/warehouse/export/student1';
