15. Hive Functions in Detail, with Worked Examples

Published: 2020-02-14 11:09:52  Source: Web  Views: 921  Author: victor19901114  Category: Big Data

1. Built-in Hive functions

1.1 Numeric functions

1. Rounding function: round

Syntax: round(double a)
Return type: BIGINT
Description: returns a rounded to the nearest integer (halves round up).

hive> select round(3.1415926) from tableName;
3
hive> select round(3.5) from tableName;
4
hive> create table tableName as select round(9542.158) from tableName;

2. Rounding to a given precision: round

Syntax: round(double a, int d)
Return type: DOUBLE
Description: returns a rounded to d decimal places.

hive> select round(3.1415926,4) from tableName;
3.1416
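
The rounding rule described above can be modelled in plain Java. This is only a sketch of the arithmetic, assuming half-up rounding; the class name RoundSketch is made up for illustration and this is not Hive's own code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundSketch {
    // Round a to d decimal places, with halves rounded up (HALF_UP).
    static double round(double a, int d) {
        return BigDecimal.valueOf(a).setScale(d, RoundingMode.HALF_UP).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(round(3.1415926, 4)); // same input as the Hive example
        System.out.println(round(3.5, 0));
    }
}
```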

3. Round-down function: floor

Syntax: floor(double a)
Return type: BIGINT
Description: returns the largest integer that is less than or equal to a.

hive> select floor(3.1415926) from tableName;
3
hive> select floor(25) from tableName;
25

4. Round-up function: ceil

Syntax: ceil(double a)
Return type: BIGINT
Description: returns the smallest integer that is greater than or equal to a.

hive> select ceil(3.1415926) from tableName;
4
hive> select ceil(46) from tableName;
46

5. Round-up function: ceiling

Syntax: ceiling(double a)
Return type: BIGINT
Description: same as ceil.

hive> select ceiling(3.1415926) from tableName;
4
hive> select ceiling(46) from tableName;
46

6. Random number function: rand

Syntax: rand(), rand(int seed)
Return type: double
Description: returns a random number between 0 and 1. If a seed is given, the generated sequence is reproducible.

hive> select rand() from tableName;
0.5577432776034763
hive> select rand() from tableName;
0.6638336467363424
hive> select rand(100) from tableName;
0.7220096548596434
hive> select rand(100) from tableName;
0.7220096548596434
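
The effect of a seed can be illustrated with Java's own java.util.Random. This is a sketch of seeded generators in general, not a statement about how Hive implements rand; the class name SeededRandomSketch is hypothetical.

```java
import java.util.Random;

class SeededRandomSketch {
    // First value produced by a generator created from the given seed.
    static double firstValue(long seed) {
        return new Random(seed).nextDouble();
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce the same value.
        System.out.println(firstValue(100L));
        System.out.println(firstValue(100L));
        // An unseeded generator gives a different value on each run.
        System.out.println(new Random().nextDouble());
    }
}
```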

1.2 Date functions

1. UNIX timestamp to date: from_unixtime

Syntax: from_unixtime(bigint unixtime[, string format])
Return type: string
Description: converts a UNIX timestamp (seconds since 1970-01-01 00:00:00 UTC) to a formatted time string in the current time zone.

hive> select from_unixtime(1323308943,'yyyyMMdd') from tableName;
20111208

2. Current UNIX timestamp: unix_timestamp

Syntax: unix_timestamp()
Return type: bigint
Description: returns the current UNIX timestamp in seconds.

hive> select unix_timestamp() from tableName;
1323309615

3. Date to UNIX timestamp: unix_timestamp

Syntax: unix_timestamp(string date)
Return type: bigint
Description: converts a date in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. Returns 0 if the conversion fails.

hive> select unix_timestamp('2011-12-07 13:01:03') from tableName;
1323234063

4. Date in a given format to UNIX timestamp: unix_timestamp

Syntax: unix_timestamp(string date, string pattern)
Return type: bigint
Description: converts a date in the format given by pattern to a UNIX timestamp. Returns 0 if the conversion fails.

hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss') from tableName;
1323234063
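
The two conversions above can be reproduced with java.time. The sketch below assumes the session time zone is UTC+8 (Asia/Shanghai), which is the zone that matches the example values in this article; the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class UnixTimeSketch {
    // Assumed time zone; the article's sample values are consistent with UTC+8.
    static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");

    // Epoch seconds -> formatted local time, like from_unixtime(ts, pattern).
    static String fromUnixtime(long epochSeconds, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).withZone(ZONE)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    // Formatted local time -> epoch seconds, like unix_timestamp(text, pattern).
    static long unixTimestamp(String text, String pattern) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern))
                .atZone(ZONE).toEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(fromUnixtime(1323308943L, "yyyyMMdd"));
        System.out.println(unixTimestamp("20111207 13:01:03", "yyyyMMdd HH:mm:ss"));
    }
}
```

With a different session time zone the epoch values would shift by the zone offset, which is why the zone is spelled out as an explicit constant here.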

5. Datetime to date: to_date

Syntax: to_date(string timestamp)
Return type: string
Description: returns the date part of a datetime value.

hive> select to_date('2011-12-08 10:03:01') from tableName;
2011-12-08

6. Year of a date: year

Syntax: year(string date)
Return type: int
Description: returns the year of the date.

hive> select year('2011-12-08 10:03:01') from tableName;
2011
hive> select year('2012-12-08') from tableName;
2012

7. Month of a date: month

Syntax: month(string date)
Return type: int
Description: returns the month of the date.

hive> select month('2011-12-08 10:03:01') from tableName;
12
hive> select month('2011-08-08') from tableName;
8

8. Day of a date: day

Syntax: day(string date)
Return type: int
Description: returns the day of the month of the date.

hive> select day('2011-12-08 10:03:01') from tableName;
8
hive> select day('2011-12-24') from tableName;
24

9. Hour of a datetime: hour

Syntax: hour(string date)
Return type: int
Description: returns the hour of the datetime.

hive> select hour('2011-12-08 10:03:01') from tableName;
10

10. Minute of a datetime: minute

Syntax: minute(string date)
Return type: int
Description: returns the minute of the datetime.

hive> select minute('2011-12-08 10:03:01') from tableName;
3

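11. Second of a datetime: second

Syntax: second(string date)
Return type: int
Description: returns the second of the datetime.

hive> select second('2011-12-08 10:03:01') from tableName;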
1

12. Week of the year: weekofyear

Syntax: weekofyear(string date)
Return type: int
Description: returns the number of the week of the year that the date falls in.

hive> select weekofyear('2011-12-08 10:03:01') from tableName;
49

13. Date difference: datediff

Syntax: datediff(string enddate, string startdate)
Return type: int
Description: returns the number of days obtained by subtracting startdate from enddate.

hive> select datediff('2012-12-08','2012-05-09') from tableName;
213

14. Date addition: date_add

Syntax: date_add(string startdate, int days)
Return type: string
Description: returns the date that is days days after startdate.

hive> select date_add('2012-12-08',10) from tableName;
2012-12-18

15. Date subtraction: date_sub

Syntax: date_sub(string startdate, int days)
Return type: string
Description: returns the date that is days days before startdate.

hive> select date_sub('2012-12-08',10) from tableName;
2012-11-28
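
The date arithmetic shown by datediff, date_add, date_sub and weekofyear can be checked with java.time. This is a sketch under the assumption that week numbers follow the ISO-8601 rule (weeks start on Monday); the helper names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.time.temporal.IsoFields;

class DateMathSketch {
    // Days from start to end, like datediff(enddate, startdate).
    static long datediff(String end, String start) {
        return ChronoUnit.DAYS.between(LocalDate.parse(start), LocalDate.parse(end));
    }

    // Date shifted by the given number of days; a negative value models date_sub.
    static String dateAdd(String start, int days) {
        return LocalDate.parse(start).plusDays(days).toString();
    }

    // ISO-8601 week number of the date (assumed to match weekofyear).
    static int weekOfYear(String date) {
        return LocalDate.parse(date).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(datediff("2012-12-08", "2012-05-09"));
        System.out.println(dateAdd("2012-12-08", 10));
        System.out.println(dateAdd("2012-12-08", -10));
        System.out.println(weekOfYear("2011-12-08"));
    }
}
```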

1.3 Conditional functions

1. If function: if

Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Return type: T
Description: returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.

hive> select if(1=2,100,200) from tableName;
200
hive> select if(1=1,100,200) from tableName;
100

2. First non-NULL value: COALESCE

Syntax: COALESCE(T v1, T v2, ...)
Return type: T
Description: returns the first non-NULL argument; returns NULL if all arguments are NULL.

hive> select COALESCE(null,'100','50') from tableName;
100

3. Conditional expression: CASE

Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Return type: T
Description: if a equals b, returns c; if a equals d, returns e; otherwise returns f.

hive> Select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from tableName;
mary
hive> Select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end from tableName;
tim

4. Conditional expression: CASE

Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Return type: T
Description: if a is TRUE, returns b; if c is TRUE, returns d; otherwise returns e.

hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from tableName;
mary
hive> select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end from tableName;
tom

1.4 String functions

1. String length: length

Syntax: length(string A)
Return type: int
Description: returns the length of string A.

hive> select length('abcedfg') from tableName;
7

2. String reversal: reverse

Syntax: reverse(string A)
Return type: string
Description: returns string A reversed.

hive> select reverse('abcedfg') from tableName;
gfdecba

3. String concatenation: concat

Syntax: concat(string A, string B...)
Return type: string
Description: returns the concatenation of the input strings; any number of input strings is supported.

hive> select concat('abc','def','gh') from tableName;
abcdefgh

4. String concatenation with a separator: concat_ws

Syntax: concat_ws(string SEP, string A, string B...)
Return type: string
Description: returns the concatenation of the input strings, with SEP placed between them.

hive> select concat_ws(',','abc','def','gh')from tableName;
abc,def,gh

5. Substring: substr, substring

Syntax: substr(string A, int start), substring(string A, int start)
Return type: string
Description: returns the part of string A from position start to the end.

hive> select substr('abcde',3) from tableName;
cde
hive> select substring('abcde',3) from tableName;
cde
hive> select substr('abcde',-1) from tableName;  (same as Oracle)
e

6. Substring with length: substr, substring

Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Description: returns the part of string A that starts at position start and is len characters long.

hive> select substr('abcde',3,2) from tableName;
cd
hive> select substring('abcde',3,2) from tableName;
cd
hive> select substring('abcde',-2,2) from tableName;
de
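
The indexing convention in these examples (1-based positions, with a negative start counting back from the end) can be sketched in Java as follows. The handling of out-of-range arguments is an assumption made for this sketch and is not taken from Hive; SubstrSketch is a hypothetical name.

```java
class SubstrSketch {
    // 1-based start; a negative start counts back from the end of the string.
    static String substr(String s, int start, int len) {
        int begin = start > 0 ? start - 1 : (start < 0 ? s.length() + start : 0);
        // Assumption: out-of-range positions and non-positive lengths give "".
        if (begin < 0 || begin >= s.length() || len <= 0) {
            return "";
        }
        int end = Math.min(s.length(), begin + len);
        return s.substring(begin, end);
    }

    // Two-argument form: everything from start to the end of the string.
    static String substr(String s, int start) {
        return substr(s, start, s.length());
    }

    public static void main(String[] args) {
        System.out.println(substr("abcde", 3));
        System.out.println(substr("abcde", -1));
        System.out.println(substr("abcde", 3, 2));
        System.out.println(substr("abcde", -2, 2));
    }
}
```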

7. Upper case: upper, ucase

Syntax: upper(string A), ucase(string A)
Return type: string
Description: returns string A in upper case.

hive> select upper('abSEd') from tableName;
ABSED
hive> select ucase('abSEd') from tableName;
ABSED

8. Lower case: lower, lcase

Syntax: lower(string A), lcase(string A)
Return type: string
Description: returns string A in lower case.

hive> select lower('abSEd') from tableName;
absed
hive> select lcase('abSEd') from tableName;
absed

9. Trimming spaces: trim

Syntax: trim(string A)
Return type: string
Description: removes the spaces at both ends of the string.

hive> select trim(' abc ') from tableName;
abc

10. URL parsing: parse_url

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Return type: string
Description: returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

hive> select parse_url
('https://www.tableName.com/path2/p.php?k1=v1&k2=v2#Ref1', 'HOST') 
from tableName;
www.tableName.com 
hive> select parse_url
('https://www.tableName.com/path2/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1')
 from tableName;
v1

11. JSON parsing: get_json_object

Syntax: get_json_object(string json_string, string path)
Return type: string
Description: parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.

hive> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} },"email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.owner') from tableName;
amy

12. String repetition: repeat

Syntax: repeat(string str, int n)
Return type: string
Description: returns str repeated n times.

hive> select repeat('abc',5) from tableName;
abcabcabcabcabc

13. String splitting: split

Syntax: split(string str, string pat)
Return type: array
Description: splits str around pat and returns the resulting array of strings.

hive> select split('abtcdtef','t') from tableName;
["ab","cd","ef"]

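1.5 Aggregate functions

1. Count: count

Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return type: BIGINT

Description: count(*) returns the number of rows retrieved, including rows that contain NULL values; count(expr) returns the number of rows in which expr is not NULL; count(DISTINCT expr[, expr...]) returns the number of distinct non-NULL values.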

hive> select count(*) from tableName;
20
hive> select count(distinct t) from tableName;
10

2. Sum: sum

Syntax: sum(col), sum(DISTINCT col)
Return type: double
Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col.

hive> select sum(t) from tableName;
100
hive> select sum(distinct t) from tableName;
70

3. Average: avg

Syntax: avg(col), avg(DISTINCT col)
Return type: double
Description: avg(col) returns the average of col over the result set; avg(DISTINCT col) returns the average of the distinct values of col.

hive> select avg(t) from tableName;
50
hive> select avg (distinct t) from tableName;
30

4. Minimum: min

Syntax: min(col)
Return type: double
Description: returns the minimum value of col in the result set.

hive> select min(t) from tableName;
20

5. Maximum: max

Syntax: max(col)
Return type: double
Description: returns the maximum value of col in the result set.

hive> select max(t) from tableName;
120

1.6 Complex type constructors

1. Map construction: map

Syntax: map(key1, value1, key2, value2, ...)
Description: builds a map from the given key/value pairs.

create table score_map(name string, score map<string,int>)
row format delimited fields terminated by '\t' 
collection items terminated by ',' map keys terminated by ':';

Create a data file with the following content (fields separated by a tab):
cd /kkb/install/hivedatas/
vim score_map.txt

zhangsan	數學:80,語文:89,英語:95
lisi	語文:60,數學:80,英語:99

Load the data into the Hive table:
load data local inpath '/kkb/install/hivedatas/score_map.txt' overwrite into table score_map;

Accessing map data:
Get all values:
select name,map_values(score) from score_map;

Get all keys:
select name,map_keys(score) from score_map;

Get the value for a given key:
select name,score["數學"]  from score_map;

Get the number of entries in the map:
select name,size(score) from score_map;

2. Struct construction: struct

Syntax: struct(val1, val2, val3, ...)
Description: builds a struct from the given arguments, similar to a struct in C. Its fields are accessed with the X.X notation. Suppose the data looks like this: the movie ABC has been rated by 1254 people and has a score of 7.4.

Create the struct table:
hive> create table movie_score( name string,  info struct<number:int,score:float> )row format delimited fields terminated by "\t"  collection items terminated by ":"; 

Prepare the data (fields separated by a tab):
cd /kkb/install/hivedatas/
vim struct.txt

ABC	1254:7.4
DEF	256:4.9
XYZ	456:5.4

Load the data:
load data local inpath '/kkb/install/hivedatas/struct.txt' overwrite into table movie_score;

Query the data in Hive:
hive> select * from movie_score;  
hive> select info.number,info.score from movie_score;  
OK  
1254    7.4  
256     4.9  
456     5.4

3. Array construction: array

Syntax: array(val1, val2, ...)
Description: builds an array from the given arguments.

hive> create table  person(name string,work_locations array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

Prepare the data for the person table:
cd /kkb/install/hivedatas/
vim person.txt

The data looks like this (fields separated by a tab):
biansutao	beijing,shanghai,tianjin,hangzhou
linan	changchu,chengdu,wuhan

Load the data:
hive> load data local inpath '/kkb/install/hivedatas/person.txt' overwrite into table person;

Query all rows:
hive> select * from person;

Query by array index:
hive> select work_locations[0] from person;

Query the whole array column:
hive> select work_locations from person;

Query the number of elements:
hive> select size(work_locations) from person;

1.7 Size functions for complex types

1. Map size: size(Map<K,V>)

Syntax: size(Map<K,V>)
Return type: int
Description: returns the number of entries in the map.

hive> select size(t) from map_table2;
2

2. Array size: size(Array<T>)

Syntax: size(Array<T>)
Return type: int
Description: returns the number of elements in the array.

hive> select size(t) from arr_table2;
4

3. Type conversion function

Type conversion: cast
Syntax: cast(expr as <type>)
Return type: <type>
Description: returns expr converted to the given type.

hive> select cast('1' as bigint) from tableName;
1

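1.8 The explode function

1. Using explode to split Map and Array columns of a Hive table

lateral view is used together with UDTFs such as explode (often combined with split) and turns one row into several rows, which can then be aggregated. lateral view first calls the UDTF for every row of the source table; the UDTF expands that row into one or more rows; lateral view then combines the results into a virtual table that can be given an alias.
explode can also be used on its own to split an array or map column into multiple rows.

Requirement: the data has the following format
zhangsan	child1,child2,child3,child4	k1:v1,k2:v2
lisi	child5,child6,child7,child8	k3:v3,k4:v4

Fields are separated by \t. First, expand all the child values into a single column: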
+----------+--+
| mychild  |
+----------+--+
| child1   |
| child2   |
| child3   |
| child4   |
| child5   |
| child6   |
| child7   |
| child8   |
+----------+--+

Then expand the keys and values of the map as well, to get the following result:

+-----------+-------------+--+
| mymapkey  | mymapvalue  |
+-----------+-------------+--+
| k1        | v1          |
| k2        | v2          |
| k3        | v3          |
| k4        | v4          |
+-----------+-------------+--+

Step 1: create the Hive database

hive (default)> create database hive_explode;
hive (default)> use hive_explode;

Step 2: create the Hive table whose map and array columns will be split with explode

create  table hive_explode.t3(name string,
children array<string>,
address Map<string,string>)
row format delimited fields terminated by '\t'  
collection items terminated by ','
map keys terminated by ':' 
stored as textFile;

Step 3: load the data

On node03, run the following commands to create the data file:

cd  /kkb/install/hivedatas/

vim maparray
The data looks like this:

zhangsan    child1,child2,child3,child4 k1:v1,k2:v2
lisi    child5,child6,child7,child8 k3:v3,k4:v4

Load the data into the Hive table:

hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/maparray' into table hive_explode.t3;

Step 4: use explode to split the data

Split the array:

hive (hive_explode)> SELECT explode(children) AS myChild FROM hive_explode.t3;

Split the map:

hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM hive_explode.t3;

2. Using explode to split a JSON string

Requirement: the data has the following format:

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

The fields are separated by |

We want to extract every monthSales value into the following column (row-to-column conversion):

4900
2090
6987

Step 1: create the Hive table

hive (hive_explode)> 
create table hive_explode.explode_lateral_view (
area string, 
goods_id string,
sale_info string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
STORED AS textfile;

Step 2: prepare and load the data

Prepare the following data:

cd /kkb/install/hivedatas
vim explode_json

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

Load the data into the Hive table:

hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/explode_json' overwrite into table hive_explode.explode_lateral_view;

Step 3: use explode to split the array

hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from hive_explode.explode_lateral_view;

Step 4: use explode to split the map

hive (hive_explode)> select explode(split(area,',')) as area from hive_explode.explode_lateral_view;

Step 5: split the JSON field

hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as  sale_info from hive_explode.explode_lateral_view;

Next, we would like to use get_json_object to get the values of the monthSales key:

hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')),'$.monthSales') as  sale_info from hive_explode.explode_lateral_view;
This fails with: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
The UDTF explode cannot be nested inside another function.
Selecting a second column next to it, as in select explode(split(area,',')) as area,good_id from explode_lateral_view;
also fails, with: FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'good_id'
A UDTF in the SELECT clause must be the only expression, and this is where LATERAL VIEW comes in.

3. Using explode with LATERAL VIEW

Querying several columns with lateral view:

hive (hive_explode)> select goods_id2,sale_info from explode_lateral_view LATERAL VIEW explode(split(goods_id,','))goods as goods_id2;

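Here LATERAL VIEW explode(split(goods_id,','))goods acts as a virtual table that is joined to the source table explode_lateral_view as a Cartesian product, computed per source row.

Lateral views can also be chained: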

hive (hive_explode)> select goods_id2,sale_info,area2 from explode_lateral_view  LATERAL VIEW explode(split(goods_id,','))goods as goods_id2 LATERAL VIEW explode(split(area,','))area as area2;

This is again a Cartesian product, this time of three tables.

Finally, the following statement turns the one-row JSON data into a fully two-dimensional table:

hive (hive_explode)> select get_json_object(concat('{',sale_info_1,'}'),'$.source') as source, get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales, get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as userCount,  get_json_object(concat('{',sale_info_1,'}'),'$.score') as score from explode_lateral_view   LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{'))sale_info as sale_info_1;

Summary:

Lateral View is usually used together with a UDTF, to work around the restriction that a UDTF cannot appear alongside other columns in the SELECT clause.
Multiple Lateral Views produce a result similar to a Cartesian product.
The OUTER keyword makes a UDTF that produces no rows output a row of NULLs instead, so that the source row is not lost.
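
The row expansion summarized above can be modelled in plain Java. This is a sketch of the semantics only, under the assumption that each source row is a name plus a list column; the class LateralViewSketch, its sample data and the boolean outer flag are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LateralViewSketch {
    // Pair every element of the list column with its source row (name, element).
    // With outer = true, a row with an empty list still yields (name, null).
    static List<String[]> lateralViewExplode(Map<String, List<String>> rows, boolean outer) {
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            List<String> items = row.getValue();
            if (items.isEmpty() && outer) {
                out.add(new String[] {row.getKey(), null});
            }
            for (String item : items) {
                out.add(new String[] {row.getKey(), item});
            }
        }
        return out;
    }

    // Hypothetical sample: one row with two children, one row with none.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("zhangsan", Arrays.asList("child1", "child2"));
        rows.put("lisi", Collections.<String>emptyList());
        return rows;
    }

    public static void main(String[] args) {
        for (String[] r : lateralViewExplode(sample(), true)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Without the outer flag the row for lisi disappears from the output, which is the data loss that the OUTER keyword is meant to prevent.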

1.9 Column/row conversion functions

1.9.1 Column to row

1. Related functions

CONCAT(string A/col, string B/col...): returns the concatenation of the input strings; any number of input strings is supported.

CONCAT_WS(separator, str1, str2,...): a special form of CONCAT(). The first argument is the separator placed between the remaining arguments. The separator can be any string. If the separator is NULL, the result is NULL. The function skips any NULL values after the separator argument.

COLLECT_SET(col): accepts only primitive types; it deduplicates and aggregates the values of a column into an array.

2. Data preparation

Table 6-6 Data preparation

name	constellation	blood_type
孫悟空	白羊座	A
老王	射手座	A
宋宋	白羊座	B
豬八戒	白羊座	A
冰冰	射手座	A

3. Requirement

Group together the people who share the same constellation and blood type. The expected result:

射手座,A            老王|冰冰
白羊座,A            孫悟空|豬八戒
白羊座,B            宋宋

4. Create the local file constellation.txt and import the data

On the node03 server, run the following commands to create the file. Note that the fields are separated by \t.

cd /kkb/install/hivedatas
vim constellation.txt

孫悟空	白羊座	A
老王	射手座	A
宋宋	白羊座	B
豬八戒	白羊座	A
冰冰	射手座	A

5. Create the Hive table and import the data

Create the Hive table:

hive (hive_explode)> create table person_info(  name string,  constellation string,  blood_type string)  row format delimited fields terminated by "\t";

Load the data:

hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/constellation.txt' into table person_info;

6. Query the data as required

hive (hive_explode)> select t1.base, concat_ws('|', collect_set(t1.name)) name from    (select name, concat(constellation, "," , blood_type) base from person_info) t1 group by  t1.base;
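
The grouping performed by this query (concat for the key, collect_set for deduplication, concat_ws for joining) can be mirrored in Java. The sketch below uses romanized stand-ins for the names and constellations in the table above, and it keeps insertion order, which is a choice of this sketch; the order of elements produced by collect_set should not be relied on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CollectSetSketch {
    // rows are {name, constellation, bloodType}; the key is "constellation,bloodType".
    static Map<String, String> groupNames(List<String[]> rows) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String[] r : rows) {
            String base = r[1] + "," + r[2];                 // concat(constellation, ",", blood_type)
            groups.computeIfAbsent(base, k -> new LinkedHashSet<>()).add(r[0]); // collect_set(name)
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            result.put(e.getKey(), String.join("|", e.getValue())); // concat_ws('|', ...)
        }
        return result;
    }

    // Romanized stand-ins for the five sample rows.
    static List<String[]> sample() {
        return Arrays.asList(
                new String[] {"sunwukong", "aries", "A"},
                new String[] {"laowang", "sagittarius", "A"},
                new String[] {"songsong", "aries", "B"},
                new String[] {"zhubajie", "aries", "A"},
                new String[] {"bingbing", "sagittarius", "A"});
    }

    public static void main(String[] args) {
        groupNames(sample()).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```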

1.9.2 Row to column

1. Function description

EXPLODE(col): splits an array or map column of a Hive table into multiple rows.

LATERAL VIEW

Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias

Explanation: used together with UDTFs such as explode (often combined with split). It expands the values in one column into multiple rows, and the expanded data can then be aggregated.

2. Data preparation

The data is as follows; fields are separated by \t.

cd /kkb/install/hivedatas

vim movie.txt
《疑犯追蹤》	懸疑,動作,科幻,劇情
《Lie to me》	懸疑,警匪,動作,心理,劇情
《戰狼2》	戰爭,動作,災難

3. Requirement

Expand the array of categories of each movie. The expected result:

《疑犯追蹤》	懸疑
《疑犯追蹤》	動作
《疑犯追蹤》	科幻
《疑犯追蹤》	劇情
《Lie to me》	懸疑
《Lie to me》	警匪
《Lie to me》	動作
《Lie to me》	心理
《Lie to me》	劇情
《戰狼2》	戰爭
《戰狼2》	動作
《戰狼2》	災難

4. Create the Hive table and import the data

Create the Hive table:

hive (hive_explode)> create table movie_info(
movie string, 
category array<string>
) 
row format delimited fields terminated by "\t" 
collection items terminated by ",";

Load the data:

load data local inpath "/kkb/install/hivedatas/movie.txt" into table movie_info;

5. Query the data as required

hive (hive_explode)>  
select movie, category_name 
from 
movie_info lateral view explode(category) table_tmp as category_name;

1.10 The reflect function

The reflect function makes it possible to call Java's built-in methods from SQL.

Using max from java.lang.Math to get the larger of two columns

Create the Hive table:

hive (hive_explode)>  
create table test_udf(col1 int,col2 int)
row format delimited fields terminated by ',';

Prepare and load the data:

cd /kkb/install/hivedatas

vim test_udf

1,2
4,3
6,4
7,5
5,6

Load the data:

hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/test_udf' overwrite into table test_udf;

Use max from java.lang.Math to get the larger of the two columns:

hive (hive_explode)> select reflect("java.lang.Math","max",col1,col2) from test_udf;

Calling a different Java method for each record

Create the Hive table:

hive (hive_explode)> create table test_udf2(class_name string,method_name string,col1 int , col2 int) row format delimited fields terminated by ',';

Prepare the data:

cd /kkb/install/hivedatas

vim test_udf2

java.lang.Math,min,1,2
java.lang.Math,max,2,3

Load the data:

hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/test_udf2' overwrite into table test_udf2;

Run the query:

hive (hive_explode)> select reflect(class_name,method_name,col1,col2) from test_udf2;

Checking whether a string is a number

This uses a method from Apache Commons. The Commons jars are already on Hadoop's classpath, so the method can be called directly.

Usage:

hive (hive_explode)> select reflect("org.apache.commons.lang.math.NumberUtils","isNumber","123");
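
What reflect does with a class name and a method name can be illustrated with Java's own reflection API. This is a minimal sketch restricted to public static methods that take two int arguments; it is not Hive's implementation, and the class name ReflectSketch is hypothetical.

```java
import java.lang.reflect.Method;

class ReflectSketch {
    // Look up a public static method (int, int) by name and invoke it.
    static Object reflect(String className, String methodName, int a, int b) {
        try {
            Method m = Class.forName(className).getMethod(methodName, int.class, int.class);
            return m.invoke(null, a, b); // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot call " + className + "." + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Same class and method names as the rows of test_udf2.
        System.out.println(reflect("java.lang.Math", "min", 1, 2));
        System.out.println(reflect("java.lang.Math", "max", 2, 3));
    }
}
```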

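1.11 Analytic functions

1. What analytic functions are for

Some of the more complex calculations call for analytic functions. They are mainly used for top-N per group, for percentages, for slicing data into buckets, and similar tasks.

2. Commonly used analytic functions

1. ROW_NUMBER():

Generates a sequence number for the rows of each partition, starting at 1 and following the given order. For example, ordering by pv descending gives the daily pv ranking within each partition. ROW_NUMBER() has many uses, such as getting the first-ranked record of each partition, or the first referrer of a session.

2. RANK():

Generates the rank of each row within its partition; ties share a rank and leave gaps in the ranking.

3. DENSE_RANK():

Generates the rank of each row within its partition; ties share a rank and leave no gaps in the ranking.

4. CUME_DIST:

(Number of rows with a value less than or equal to the current value) / (total rows in the partition). For example, the proportion of people whose salary is less than or equal to the current salary.

5. PERCENT_RANK:

(RANK of the current row within the partition - 1) / (total rows in the partition - 1)

6. NTILE(n):

Splits the ordered rows of a partition into n buckets and returns the bucket number of the current row. If the rows do not divide evenly, the earlier buckets receive the extra rows. NTILE does not support ROWS BETWEEN; for example, NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) is not allowed.

3. Requirement

The data has the following format, with three fields: cookieid, createtime and pv. Find the three records with the highest pv for each cookie. This is a top-N per group problem: take the first three values of each group.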

cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,2
cookie2,2015-04-11,3
cookie2,2015-04-12,5
cookie2,2015-04-13,6
cookie2,2015-04-14,3
cookie2,2015-04-15,9
cookie2,2015-04-16,7

Step 1: create the table

Create the table in Hive:

CREATE EXTERNAL TABLE cookie_pv (
cookieid string,
createtime string, 
pv INT
) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ;

Step 2: prepare and load the data

On node03, run the following commands to create the data file and load it into the Hive table:

cd /kkb/install/hivedatas
vim cookiepv.txt

cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,2
cookie2,2015-04-11,3
cookie2,2015-04-12,5
cookie2,2015-04-13,6
cookie2,2015-04-14,3
cookie2,2015-04-15,9
cookie2,2015-04-16,7

Load the data into the Hive table:

load  data  local inpath '/kkb/install/hivedatas/cookiepv.txt'  overwrite into table  cookie_pv;

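Step 3: use analytic functions to get the top three records by PV for each cookie

A window-function alias cannot be referenced in the WHERE clause of the same query level, so the ranking is computed in a subquery and filtered outside it.

SELECT cookieid, createtime, pv, rn1, rn2, rn3
FROM (
SELECT 
cookieid,
createtime,
pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3 
FROM cookie_pv 
) t
WHERE rn1 <=  3 ;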
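
The difference between RANK, DENSE_RANK and ROW_NUMBER on tied values can be worked through in Java for one partition. The sketch below takes the pv values of cookie1, already sorted in descending order; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class RankSketch {
    // RANK: ties share a rank, and the next distinct value skips ahead (gaps).
    static int[] rank(int[] sortedDesc) {
        int[] r = new int[sortedDesc.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = (i > 0 && sortedDesc[i] == sortedDesc[i - 1]) ? r[i - 1] : i + 1;
        }
        return r;
    }

    // DENSE_RANK: ties share a rank, and the next distinct value is rank + 1 (no gaps).
    static int[] denseRank(int[] sortedDesc) {
        int[] r = new int[sortedDesc.length];
        for (int i = 0; i < r.length; i++) {
            if (i == 0) {
                r[i] = 1;
            } else {
                r[i] = sortedDesc[i] == sortedDesc[i - 1] ? r[i - 1] : r[i - 1] + 1;
            }
        }
        return r;
    }

    // ROW_NUMBER: a plain 1..n sequence, regardless of ties.
    static int[] rowNumber(int n) {
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            r[i] = i + 1;
        }
        return r;
    }

    public static void main(String[] args) {
        int[] pv = {7, 5, 4, 4, 3, 2, 1}; // cookie1, ORDER BY pv DESC
        System.out.println(Arrays.toString(rank(pv)));
        System.out.println(Arrays.toString(denseRank(pv)));
        System.out.println(Arrays.toString(rowNumber(pv.length)));
    }
}
```

Because the two rows with pv = 4 tie for third place, filtering on the RANK column with <= 3 keeps four rows for cookie1, while filtering on the ROW_NUMBER column keeps exactly three.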

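2. Hive user-defined functions

2.1 Introduction to user-defined functions

1) Hive ships with a number of built-in functions, such as max and min, but the set is limited. You can extend it conveniently with your own UDFs.

2) When the built-in functions cannot meet your business needs, consider writing a user-defined function (UDF).

3) User-defined functions fall into three categories:

  (1) UDF (User-Defined Function)

  One row in, one row out

  (2) UDAF (User-Defined Aggregation Function)

  Aggregate function: many rows in, one row out

  For example: count/max/min

  (3) UDTF (User-Defined Table-Generating Function)

  One row in, many rows out

  For example: lateral view explode()

4) Official documentation

https://cwiki.apache.org/confluence/display/Hive/HivePlugins

5) Programming steps:

  (1) Extend org.apache.hadoop.hive.ql.exec.UDF

  (2) Implement the evaluate method; evaluate can be overloaded.

6) Notes

  (1) A UDF must have a return type. It may return null, but the return type cannot be void.

  (2) UDFs usually work with Hadoop types such as Text and LongWritable; plain Java types are not recommended.

2.2 Developing a user-defined function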

Step 1: create a Maven Java project and add the dependencies

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0-cdh6.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.1.0-cdh6.14.2</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.2</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

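Step 2: write a Java class that extends UDF and provides an evaluate method

package com.kkb.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyUDF extends UDF {
    public Text evaluate(final Text s) {
        if (null == s) {
            return null;
        }
        // Return the input converted to upper case
        return new Text(s.toString().toUpperCase());
    }
}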

Step 3: package the project and upload it to Hive's lib directory

Build the jar with Maven's package goal and upload it to /kkb/install/hive-1.1.0-cdh6.14.2/lib on the node03 server.

Step 4: add the jar

Rename the jar:

cd /kkb/install/hive-1.1.0-cdh6.14.2/lib
mv original-day_hive_udf-1.0-SNAPSHOT.jar udf.jar

Add the jar in the Hive client:

0: jdbc:hive2://node03:10000> add jar /kkb/install/hive-1.1.0-cdh6.14.2/lib/udf.jar;

Step 5: associate a function name with the custom class

0: jdbc:hive2://node03:10000> create temporary function touppercase as 'com.kkb.udf.MyUDF';

Step 6: use the custom function

0: jdbc:hive2://node03:10000> select touppercase('abc');

How to create a permanent function in Hive

A temporary function has to be registered again every time you start the Hive client, and it is gone once the client exits. To avoid this, you can create a permanent function.

Creating a permanent function

1. Select the database in which the function will be created
0: jdbc:hive2://node03:10000> use myhive;

2. Use add jar to add the jar to Hive
0: jdbc:hive2://node03:10000> add jar /kkb/install/hive-1.1.0-cdh6.14.2/lib/udf.jar;

3. List all the jars that have been added
0: jdbc:hive2://node03:10000> list  jars;

4. Create the permanent function and associate it with the class
0: jdbc:hive2://node03:10000> create  function myuppercase as 'com.kkb.udf.MyUDF';

5. Show the permanent function
0: jdbc:hive2://node03:10000> show functions like 'my*';

6. Use the permanent function
0: jdbc:hive2://node03:10000> select myhive.myuppercase('helloworld');

7. Drop the permanent function
0: jdbc:hive2://node03:10000> drop function myhive.myuppercase;

8. Show the functions again
0: jdbc:hive2://node03:10000> show functions like 'my*';
