How to Use Hive Built-in Functions

Published: 2021-12-10 11:36:02 | Source: 億速云 | Author: 小新 | Category: Big Data

This article walks through how to use Hive's built-in functions. It is meant as a reference for readers who are not yet familiar with them.

CLI commands

show functions;

desc function concat;

desc function extended concat;

The extended form also prints usage examples for the function.

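nvl function
nvl(value, default_value) returns default_value when value is NULL. The more general coalesce(v1, v2, ...) returns the first non-NULL argument, or NULL if every argument is NULL.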
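
The NULL-handling rule above can be sketched in Python, with None standing in for SQL NULL. The helper names `coalesce` and `nvl` below are illustrative Python functions written for this sketch, not Hive APIs.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def nvl(value, default):
    """Two-argument special case: fall back to default when value is None."""
    return coalesce(value, default)


print(coalesce(None, None, "c", "d"))
print(coalesce(None, None))
print(nvl(None, 0))
```

Note that only None (NULL) is skipped; a value such as 0 or an empty string is still returned as the first non-NULL argument.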

set hive.cli.print.header=true;

Sample table: winfunc

Employee salary records with a type flag

id  money type

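Logical operator precedence, from highest to lowest: NOT, AND, OR
Precedence of AND and OR

Because AND binds tighter than OR, the following query is read as id='1001' or (id='1002' and money='100'):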

select id ,money from winfunc where id='1001' or id='1002'
     and money ='100';

Result

    1001  100
    1001  150
    1001  200
    1001  150
    1002  100

The intended query needs parentheses:

select id ,money from winfunc where (id='1001' or id='1002') and money ='100';

Result

     1001  100
     1002  100
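
The precedence rule can be checked outside Hive. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Hive; the table contents are hypothetical, and money is stored as an integer here rather than compared as a string.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winfunc (id TEXT, money INTEGER)")
conn.executemany(
    "INSERT INTO winfunc VALUES (?, ?)",
    [("1001", 100), ("1001", 150), ("1001", 200), ("1001", 150),
     ("1002", 100), ("1002", 300)],  # hypothetical sample rows
)

# AND binds tighter than OR: id='1001' OR (id='1002' AND money=100)
unparenthesized = conn.execute(
    "SELECT id, money FROM winfunc "
    "WHERE id = '1001' OR id = '1002' AND money = 100"
).fetchall()

# Parentheses force the OR to be evaluated first.
parenthesized = conn.execute(
    "SELECT id, money FROM winfunc "
    "WHERE (id = '1001' OR id = '1002') AND money = 100"
).fetchall()

print(len(unparenthesized), sorted(parenthesized))
```

Without parentheses every 1001 row is returned regardless of money; with parentheses only the rows whose money is 100 remain.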

if(cond, v1, v2): returns v1 when cond is true, otherwise v2

    select if(2>1,'v1','v2') from dual;
    v1

case when

select case when id='1001' then 'v1' when id='1002' then 'v2' else 'v3' end from winfunc;

get_json_object

select get_json_object('{"name":"jack","age":"20"}','$.name') from dual;
jack

parse_url

select parse_url('http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1', 'HOST') from
lxw_dual;
facebook.com

select parse_url('http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1')
    from lxw_dual;
    v1
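
To see which piece of the URL each parse_url part name refers to, here is a rough Python equivalent built on the standard urllib.parse module. It is only an illustration of the URL components, not how Hive implements parse_url.

```python
from urllib.parse import urlparse, parse_qs

url = "http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1"
parts = urlparse(url)

host = parts.hostname          # corresponds to parse_url(url, 'HOST')
path = parts.path              # corresponds to parse_url(url, 'PATH')
query = parts.query            # corresponds to parse_url(url, 'QUERY')
ref = parts.fragment           # corresponds to parse_url(url, 'REF')
k1 = parse_qs(query)["k1"][0]  # corresponds to parse_url(url, 'QUERY', 'k1')

print(host, path, query, ref, k1)
```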

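concat_ws works like concat but takes a separator that is inserted between the joined strings.

concat_ws(string SEP, array<string>) joins the elements of an array.

collect_set(id) returns an array with duplicates removed.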


    select collect_set(id) from winfunc;
    ["1001","1002","1003","1004"]

collect_list(id) returns an array and keeps duplicates.

    select collect_list(id) from winfunc;
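
The difference between the two collectors, and how concat_ws is typically applied to their output, can be sketched in plain Python. The id values are hypothetical, and the element order shown is only what this sketch produces; Hive does not guarantee any particular order inside the array.

```python
ids = ["1001", "1001", "1002", "1003", "1003", "1004"]  # hypothetical id column

collected_list = list(ids)                 # like collect_list(id): keeps duplicates
collected_set = list(dict.fromkeys(ids))   # like collect_set(id): drops duplicates
joined = ",".join(collected_set)           # like concat_ws(',', collect_set(id))

print(collected_list)
print(collected_set)
print(joined)
```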

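The PARTITION BY keyword is part of analytic (window) functions, as in Oracle. Unlike an aggregate function, which generally returns a single summary row per group, an analytic function can return multiple rows for each group.

sum() over (PARTITION BY ...) is an analytic function. It behaves differently from an ordinary sum ... group by ...: every input row is kept, and each row carries the sum computed over its partition. When the window also has an ORDER BY, the default frame turns this into a running (cumulative) sum rather than a simple total.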
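
The contrast between GROUP BY and a windowed sum is easier to see side by side. This sketch uses Python's built-in sqlite3 as a stand-in for Hive, assuming the bundled SQLite is 3.25 or newer (required for window functions); the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winfunc (id TEXT, money INTEGER)")
conn.executemany(
    "INSERT INTO winfunc VALUES (?, ?)",
    [("1001", 100), ("1001", 150), ("1001", 200), ("1002", 100)],
)

# Aggregate: one row per group.
grouped = conn.execute(
    "SELECT id, SUM(money) FROM winfunc GROUP BY id ORDER BY id"
).fetchall()

# Window: every row is kept; the sum is attached to each row.
windowed = conn.execute(
    "SELECT id, money, "
    "       SUM(money) OVER (PARTITION BY id) AS partition_total, "
    "       SUM(money) OVER (PARTITION BY id ORDER BY money) AS running_total "
    "FROM winfunc ORDER BY id, money"
).fetchall()

print(grouped)
for row in windowed:
    print(row)
```

The partition_total column repeats the group total on every row, while running_total grows row by row because of the ORDER BY inside the window.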

The WHERE, GROUP BY, HAVING, and ORDER BY clauses are evaluated in this order: WHERE, GROUP BY, HAVING, ORDER BY.

Of these four clauses, only ORDER BY can refer to a column alias defined in the SELECT list, for example:

SELECT FruitName, ProductPlace, Price, ID AS IDE, Discount
FROM T_TEST_FRUITINFO
WHERE (ProductPlace = N'china')
ORDER BY IDE
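Here the alias IDE can only be used in the ORDER BY clause. Other clauses such as WHERE must refer to the column as ID, not IDE.

When a query uses GROUP BY, the columns in the ORDER BY clause must appear in an aggregate function or in the GROUP BY clause.

When GROUP BY and ORDER BY are used together, ORDER BY must come after GROUP BY.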

I. Window functions

first_value (returns the first value in the window frame)

    select id,money,
    first_value(money) over (partition by id order by money
    rows between 1 preceding and 1 following)
    from winfunc

With the following frame, the window for every row spans from the first row to the last row of the partition:
    rows between unbounded preceding and unbounded following
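
How the frame clause changes what first_value sees can be sketched with Python's built-in sqlite3 as a stand-in for Hive (assuming SQLite 3.25 or newer for window functions). The rows are hypothetical, and the column aliases sliding and whole are names chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winfunc (id TEXT, money INTEGER)")
conn.executemany(
    "INSERT INTO winfunc VALUES (?, ?)",
    [("1001", 100), ("1001", 150), ("1001", 150), ("1001", 200)],
)

result = sorted(conn.execute(
    "SELECT money, "
    "  FIRST_VALUE(money) OVER (PARTITION BY id ORDER BY money "
    "    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sliding, "
    "  FIRST_VALUE(money) OVER (PARTITION BY id ORDER BY money "
    "    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole "
    "FROM winfunc"
).fetchall())

for row in result:
    print(row)
```

With the sliding frame the first value is the previous row's money (or the row's own value on the first row); with the unbounded frame it is always the smallest money in the partition.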

lead(money, 2) returns the value from the row two positions after the current row, or NULL if there is no such row.

    select id,money,lead(money,2) over(order by money) from winfunc

lag(money, 2) is the opposite of lead: it looks two rows back instead of two rows ahead.
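
A small sketch of both offsets, again using Python's built-in sqlite3 as a stand-in for Hive (assuming SQLite 3.25 or newer) and hypothetical salary values. None in the output corresponds to SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winfunc (id TEXT, money INTEGER)")
conn.executemany(
    "INSERT INTO winfunc VALUES (?, ?)",
    [("1001", 100), ("1001", 150), ("1001", 200), ("1001", 250)],
)

offsets = conn.execute(
    "SELECT money, "
    "       LEAD(money, 2) OVER (ORDER BY money) AS two_ahead, "
    "       LAG(money, 2)  OVER (ORDER BY money) AS two_back "
    "FROM winfunc ORDER BY money"
).fetchall()

for row in offsets:
    print(row)
```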

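rank() ranking function, compared with row_number(): rank() gives tied rows the same rank and leaves a gap after them, whereas row_number() numbers the rows consecutively.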

select id,money, rank() over (partition by id order by money) from winfunc
Result

    1001 100 1
    1001 150 2
    1001 150 2
    1001 200 4

dense_rank()

select id,money, dense_rank() over (partition by id order by money) from winfunc

Result

    1001 100 1
    1001 150 2
    1001 150 2
    1001 200 3
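
The three numbering functions can be compared in one query. This is a sketch using Python's built-in sqlite3 as a stand-in for Hive (assuming SQLite 3.25 or newer), with the same four hypothetical rows for id 1001 as in the results above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winfunc (id TEXT, money INTEGER)")
conn.executemany(
    "INSERT INTO winfunc VALUES (?, ?)",
    [("1001", 100), ("1001", 150), ("1001", 150), ("1001", 200)],
)

ranking = sorted(conn.execute(
    "SELECT money, "
    "       ROW_NUMBER() OVER (PARTITION BY id ORDER BY money) AS row_num, "
    "       RANK()       OVER (PARTITION BY id ORDER BY money) AS rnk, "
    "       DENSE_RANK() OVER (PARTITION BY id ORDER BY money) AS dense_rnk "
    "FROM winfunc"
).fetchall())

for row in ranking:
    print(row)
```

The two tied rows share rank 2 under both rank() and dense_rank(); the next row is 4 under rank() (a gap) and 3 under dense_rank() (no gap), while row_number() simply counts 1 through 4.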

cume_dist()

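Formula: CUME_DIST = (number of rows with a value less than or equal to the current value) / (total number of rows in the partition). For example, it gives the fraction of all employees whose salary is less than or equal to the current salary.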

    select id,money, cume_dist() over (partition by id order by money) from winfunc

Result

    1001 100 0.25
    1001 150 0.75
    1001 150 0.75
    1001 200 1

percent_rank(): the first row always gets 0
PERCENT_RANK() = (RANK() - 1) / (Total Rows - 1)

Formula: (lowest row number among rows with the same value - 1) / (total number of rows - 1)

Result

    1001 100 0
    1001 150 0.33
    1001 150 0.33
    1001 200 1
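
Both formulas can be checked against the figures above with a short sketch. It uses Python's built-in sqlite3 as a stand-in for Hive (assuming SQLite 3.25 or newer) with the same four hypothetical rows, and rounds to two decimals to match the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winfunc (id TEXT, money INTEGER)")
conn.executemany(
    "INSERT INTO winfunc VALUES (?, ?)",
    [("1001", 100), ("1001", 150), ("1001", 150), ("1001", 200)],
)

raw = conn.execute(
    "SELECT money, "
    "       CUME_DIST()    OVER (PARTITION BY id ORDER BY money), "
    "       PERCENT_RANK() OVER (PARTITION BY id ORDER BY money) "
    "FROM winfunc"
).fetchall()

# Round for display; 1/3 becomes 0.33.
distribution = sorted((m, round(c, 2), round(p, 2)) for m, c, p in raw)

for row in distribution:
    print(row)
```
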

ntile(2): splits the ordered rows into 2 buckets

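With asc, nulls last is the default.
With desc, nulls first is the default.
These defaults depend on the Hive version: the Hive 2.1 to 3.x documentation lists nulls first for asc and nulls last for desc, so state the null ordering explicitly when it matters.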

select id,money, ntile(2) over (order by money desc nulls last) from winfunc;

Miscellaneous functions (calling Java methods)

java_method and reflect do the same thing.

select java_method("java.lang.Math","sqrt",cast(id as double)) from winfunc;

The table-generating function (UDTF) explode(), used together with the lateral view keyword

select id ,adid from winfunc lateral view explode(split(type,'B')) tt as adid

Input row

    1001 ABC

Column to rows: splitting type on 'B' turns the single value into one row per piece

    1001 A
    1001 C
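
What lateral view explode(split(...)) does to each row can be sketched in plain Python: split the column value, then emit one output row per piece alongside the original id. The rows list is hypothetical, and re.split is used because Hive's split also takes a regular expression.

```python
import re

rows = [("1001", "ABC")]  # hypothetical (id, type) rows

# One output row per piece produced by splitting the type column on 'B'.
exploded = [
    (row_id, piece)
    for row_id, type_value in rows
    for piece in re.split("B", type_value)
]

for row in exploded:
    print(row)
```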

Regular expression functions

In like patterns, "_" matches any single character and "%" matches any number of characters.

rlike takes a regular expression.

select 1 from dual where 'footbar' rlike  '^f.*r$';

Regular expression replace function

regexp_replace(string A, string B, string C)
Replaces the parts of string A that match the Java regular expression B with C.

select regexp_replace('foobar','oo|ar','') from dual;

Returns fb

regexp_extract(string subject,string pattern,int index)

select regexp_extract('foothebar','foo(.*?)(bar)',1) from dual;

Returns the. Parentheses in the regular expression define capture groups, and 1 is the index of the first group.

1. Greedy matching (.*): keeps matching all the way up to the last |

    select regexp_extract('979|7.10.80|8684','.*\\|(.*)',1) from dual;

Returns 8684

2. Non-greedy matching (.*?): the question mark tells the regex engine to repeat the preceding token as few times as possible

    select regexp_extract('979|7.10.80|8684','(.*?)\\|(.*)',1) from dual;

Returns 979
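
The same patterns behave identically in Python's re module, which makes it a convenient place to try them before running a Hive job. This is only a sketch of the regex behavior: Hive uses Java regular expressions, and the doubled backslash in the Hive string literals becomes a single backslash in a Python raw string.

```python
import re

text = "979|7.10.80|8684"

# Greedy: the leading .* runs to the last '|', so group 1 is the final field.
greedy = re.search(r".*\|(.*)", text).group(1)

# Non-greedy: (.*?) stops at the first '|', so group 1 is the first field.
lazy = re.search(r"(.*?)\|(.*)", text).group(1)

# Counterparts of the regexp_replace, regexp_extract and rlike examples.
replaced = re.sub(r"oo|ar", "", "foobar")
extracted = re.search(r"foo(.*?)(bar)", "foothebar").group(1)
matched = re.search(r"^f.*r$", "footbar") is not None

print(greedy, lazy, replaced, extracted, matched)
```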

That covers how to use Hive's built-in functions. Thanks for reading.
