
[TOC]


Hive Notes (Part 3)

Hive Functions

Hive function categories

As in Java and MySQL, Hive functions fall into three categories.

UDF (User Defined Function)
One row in, one row out
e.g. sin(30°) = 1/2
UDAF (User Defined Aggregation Function)
Many rows in, one row out
e.g. max, min, count, sum, avg
UDTF (User Defined Table-generating Function)
One row in, many rows out
e.g. explode
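
For a quick feel of each category, here is a minimal sketch, assuming the test table used in the wordcount example below (a single string column named line):

    -- UDF: one row in, one row out
    select upper(line) from test;
    -- UDAF: many rows in, one row out
    select count(*) from test;
    -- UDTF: one row in, many rows out
    select explode(split(line, "\t")) from test;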

Common functions

show functions;             list the functions available in Hive
desc function func_name;    show the help text for a function

case when   ----> like switch / if-else
if          ----> like the ternary operator
explode     ----> expands the elements of an array into multiple rows
a = [1, 2, 3, 4] explode(a) ===>
        1
        2
        3
        4
split       ----> the usual string split function (returns an array)

array       ----> builds an array from the given values
collect_set ----> aggregates a column into an array, duplicates removed
collect_list----> aggregates a column into an array, duplicates kept
concat_ws   ----> joins elements with the given separator string
                  (see the sketch after this list)
--------------
row_number  ----> grouped ranking / secondary sort
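
A short sketch of the array-related functions above, using the t_employee table (deptid and name columns) from the join exercise later in these notes; treat it as illustrative only:

    select
        deptid,
        collect_set(name)                  as names_dedup,   -- array, duplicates removed
        collect_list(name)                 as names_all,     -- array, duplicates kept
        concat_ws(',', collect_set(name))  as names_csv      -- join the array elements with ','
    from t_employee
    group by deptid;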

Function examples

wordcount
Analysis:
    hello   you
    hello   me
    hello   he
Using MapReduce, the process would be:
    step1----->split("\t")--->
        ["hello", "you"]
        ["hello", "me"]
        ["hello", "he"]
    step2-----> iterate over each array and emit every element as a key with value 1: <key, 1>
        <"hello", 1>
        <"you", 1>
        <"hello", 1>
        <"me", 1>
        <"hello", 1>
        <"he", 1>

    step3,shuffle--->
        <"hello", [1, 1, 1]>
        <"you", 1>
        <"me", 1>
        <"he", 1>
    step 4, reduce ====>reduceByKey
Using HQL:
    step 1 (mydb1)> select split(line, "\t") from test;
            ["hello","you"]
            ["hello","he"]
            ["hello","me"]
    step 2: expand each array into multiple rows
            (mydb1)> select explode(split(line, "\t"))  from test;
                hello
                you
                hello
                he
                hello
                me
    step 3: on top of step 2, simply group by the word
        select
            w.word, count(w.word) as count 
        from (select explode(split(line, "\t")) word  from test) w
        group by w.word order by count desc;
case when

Use case when to display the department name that corresponds to each of the following ids:

1--->學(xué)工組,2--->行政組,3---->銷售組,4---->研發(fā)組,5---->其它
hive (mydb1)> select * from t1;
1
2
3
4
5
select
  id,
case id
  when 1 then "學(xué)工組"
  when 2 then "行政組"
  when 3 then "銷售組"
  when 4 then "研發(fā)組"
  else "行政組"
end
from t1;    
Result:
1   學(xué)工組
2   行政組
3   銷售組
4   研發(fā)組
5   其它
row_number: grouped ranking / secondary sort
Three kinds of joins
    Cross join
        cross join; produces a Cartesian product, so it is normally avoided
    Inner join (equi-join)
        inner join
        outputs only the rows that can be matched between the left and right tables
    Outer join
        outer join
        left outer join (left outer join)

        right outer join (right outer join)
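
A small sketch of the difference between inner and left outer join, using the t_dept and t_employee tables from the exercise below:

    -- inner join: only departments that have at least one matching employee
    select d.name, e.name
    from t_dept d
    inner join t_employee e on d.id = e.deptid;

    -- left outer join: every department, with NULLs where no employee matches
    select d.name, e.name
    from t_dept d
    left outer join t_employee e on d.id = e.deptid;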

Given the three tables employee, department and salary:
    1. For each department, list its employees (showing the department name, employee name, gender [男|女] and salary), sorted within each department by salary in descending order
        select
           e.name, if(sex == 0, '女', '男') as gender, d.name, s.salary,
           row_number() over(partition by e.deptid order by s.salary desc) rank
        from t_dept d
        left join t_employee e on d.id = e.deptid
        left join t_salary s on e.id = s.empid
        where s.salary is not null;
    2. Show the top-2 employees by salary within each department
        select 
           tmp.* 
        from 
        (select
           e.name, if(sex == 0, '女', '男') as gender, d.name, s.salary, 
           row_number() over(partition by e.deptid order by s.salary desc) rank 
        from t_dept d
        left join t_employee e on d.id = e.deptid
        left join t_salary s on e.id = s.empid
        where s.salary is not null) tmp
        where tmp.rank < 3; 
        For a single-table query you could try skipping the subquery and filtering with HAVING (having rank < 3), but note that many Hive versions do not accept a window-function result in HAVING, so the subquery form above is the safer pattern.

The following example makes the use of row_number clear:

hive (mydb2)> create table t9(
            >   id int,
            >   province string,
            >   salary float
            > );
hive (mydb2)> insert into t9 values(1,'gd',18000),(2,'gd',16000),(3,'bj',13000),(4,'gd',15000),(5,'bj',17000),(6,'bj',19000);
hive (mydb2)> select * from t9;
OK
1       gd      18000.0
2       gd      16000.0
3       bj      13000.0
4       gd      15000.0
5       bj      17000.0
6       bj      19000.0
Time taken: 0.097 seconds, Fetched: 6 row(s)
hive (mydb2)> select
            >   id,
            >   province,
            >   salary,
            >   row_number() over(partition by province order by salary desc) as rank
            > from t9;
OK
6       bj      19000.0 1
5       bj      17000.0 2
3       bj      13000.0 3
1       gd      18000.0 1
2       gd      16000.0 2
4       gd      15000.0 3
Time taken: 1.578 seconds, Fetched: 6 row(s)
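
A natural follow-up on the t9 example: keep only the top 2 salaries per province by filtering on the row_number() result in a subquery, the same pattern used for the department top-2 query above:

    select tmp.*
    from (
        select
            id, province, salary,
            row_number() over(partition by province order by salary desc) as rank
        from t9
    ) tmp
    where tmp.rank <= 2;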

Hive user-defined functions

Steps to create a UDF

Writing a custom function involves six steps:

1°. Write a Java class that extends the UDF class
2°. Override the evaluate() method in it; the framework will call it for you
3°. Package the code into a jar and upload it to the server
4°. Add the jar from step 3° to Hive's classpath
In the hive shell run: add jar jar_path;
5°. Give the function a temporary name, i.e. create a temporary function
create temporary function function_name as 'fully qualified name of the class containing evaluate()';
6°. When you are done you can drop the temporary function manually, or simply leave it: it is destroyed automatically when the session ends

UDF example: given a user's birthday, work out the corresponding Chinese zodiac sign and constellation

The code is as follows:

package com.uplooking.bigdata.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

@Description(name = "z_c",
        value = "_FUNC_(param1, param2) - 返回給定日期對(duì)應(yīng)的×××或者星座",
        extended = "param1,param2參數(shù)可以是一下:\n"
                + "1. param1 is A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'.\n"
                + "2. param1 date value\n"
                + "3. param1 timestamp value\n"
                + "3. param2 0 or 1, 0 means constellation, 1 means zodica\n"
                + "Example:\n "
                + "  > SELECT _FUNC_('2009-07-30', 0) FROM src LIMIT 1;\n" + "  獅子座")
public class ZodicaAndConstellationUDF extends UDF {

    public Text evaluate(java.sql.Date date, int type) {
        if(type == 0) { // constellation
            return new Text(getConstellation(new Date(date.getTime())));
        } else if(type == 1) { // Chinese zodiac sign
            return new Text(getZodica(new Date(date.getTime())));
        }
        return null;
    }

    public String[] zodiacArr = { "猴", "雞", "狗", "豬", "鼠", "牛", "虎", "兔", "龍", "蛇", "馬", "羊" };
    public String[] constellationArr = { "水瓶座", "雙魚(yú)座", "白羊座", "金牛座", "雙子座", "巨蟹座", "獅子座", "處女座", "天秤座", "天蝎座", "射手座", "魔羯座" };
    public int[] constellationEdgeDay = { 20, 19, 21, 21, 21, 22, 23, 23, 23, 23, 22, 22 };
    /**
     * Get the Chinese zodiac sign for the given date
     * @return
     */
    public String getZodica(Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        return zodiacArr[cal.get(Calendar.YEAR) % 12];
    }
    /**
     * Get the constellation for the given date
     * @return
     */
    public String getConstellation(Date date) {
        if (date == null) {
            return "";
        }
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        int month = cal.get(Calendar.MONTH);
        int day = cal.get(Calendar.DAY_OF_MONTH);
        if (day < constellationEdgeDay[month]) {
            month = month - 1;
        }
        if (month >= 0) {
            return constellationArr[month];
        }
    // default to return 魔羯
        return constellationArr[11];
    }
}

注意依賴在筆記最后面。

After uploading the jar to the server, add it to Hive's classpath in the hive shell:

add jar /home/uplooking/jars/hive/udf-zc.jar

Register the custom function:

create temporary function zc as 'com.uplooking.bigdata.hive.udf.ZodicaAndConstellationUDF';

Create a temporary table for testing:

hive (mydb1)>
            > create temporary table tmp(
            > birthday date);

Insert some test data:

hive (mydb1)> insert into tmp values('1994-06-21');

Use the function in a query:

hive (mydb1)> select zc(birthday,0) from tmp;
OK
c0
雙子座
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive (mydb1)> select zc(birthday,1) from tmp;
OK
c0
狗
Time taken: 0.044 seconds, Fetched: 1 row(s)

Below is a simpler UDF that can be used as a reference for testing:

package cn.xpleaf.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * @author Leaf
 * @date 2018/9/18 11:11 PM
 */
@Description(name = "addUDF", value = "_FUNC_(num1, num2) - 返回給定兩個(gè)數(shù)的和")
public class AddUDF extends UDF {

    public Text evaluate(int num1, int num2) {
        return new Text(String.valueOf(num1 + num2));
    }

}
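
A sketch of how this simpler UDF would be registered and called; the jar path and function name here are just placeholders:

    add jar /path/to/udf-add.jar;
    create temporary function add_udf as 'cn.xpleaf.hive.udf.AddUDF';
    select add_udf(1, 2);      -- returns 3
    -- optional: temporary functions disappear with the session anyway
    drop temporary function add_udf;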

Hive via JDBC

Besides the CLI covered earlier, Hive also provides a JDBC interface. To use it, first start the hiveserver2 service. Once it is running you can keep working with Hive in a CLI style through beeline, which ships with Hive (note that beeline talks to Hive over JDBC), or you can operate Hive from hand-written Java code.

Start the hiveserver2 service

[uplooking@uplooking01 ~]$ hiveserver2

Connect to hiveserver2 with beeline

[uplooking@uplooking01 hive]$ beeline
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/jdk/bin:/home/uplooking/bin:/home/uplooking/app/zookeeper/bin:/home/uplooking/app/hadoop/bin:/home/uplooking/app/hadoop/sbin:/home/uplooking/app/hive/bin)
ls: 無(wú)法訪問(wèn)/home/uplooking/app/hive/lib/hive-jdbc-*-standalone.jar: 沒(méi)有那個(gè)文件或目錄
Beeline version 2.1.0 by Apache Hive
beeline> !connect jdbc:hive2://uplooking01:10000/mydb1
Connecting to jdbc:hive2://uplooking01:10000/mydb1
Enter username for jdbc:hive2://uplooking01:10000/mydb1: uplooking
Enter password for jdbc:hive2://uplooking01:10000/mydb1: *********
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/uplooking/app/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/uplooking/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: uplooking is not allowed to impersonate uplooking (state=,code=0)

As you can see, an error occurs. The solution is as follows:

The JDBC client cannot open a session against Hive's remote ThriftServer.
The error says: user uplooking is not allowed to impersonate uplooking.
This is caused by the proxy-user security policy introduced in newer Hadoop versions: the uplooking user in Hadoop has to be explicitly allowed to impersonate the uplooking user used by Hive. Add the following properties to
$HADOOP_HOME/etc/hadoop/core-site.xml:
    <property>
        <name>hadoop.proxyuser.uplooking.hosts</name>
        <value>*</value>
        <description>Hosts from which the uplooking user is allowed to proxy</description>
    </property>
    <property>
        <name>hadoop.proxyuser.uplooking.groups</name>
        <value>root</value>
        <description>Groups whose users the uplooking proxy user may impersonate</description>
    </property>
After the change, sync the file to every node in the cluster.
For the cluster to pick up the new configuration, at least HDFS has to be restarted (newer Hadoop versions can usually reload proxy-user settings with hdfs dfsadmin -refreshSuperUserGroupsConfiguration instead of a full restart).

After that, beeline can operate Hive through the JDBC interface as expected:

beeline> !connect jdbc:hive2://uplooking01:10000/mydb1
Connecting to jdbc:hive2://uplooking01:10000/mydb1
Enter username for jdbc:hive2://uplooking01:10000/mydb1: uplooking
Enter password for jdbc:hive2://uplooking01:10000/mydb1: *********
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/uplooking/app/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/uplooking/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 2.1.0)
18/03/23 08:00:15 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://uplooking01:10000/mydb1> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| mydb1          |
+----------------+--+
2 rows selected (2.164 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| t1        |
| t2        |
+-----------+--+
2 rows selected (0.118 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> select * from t1;
+------------+--+
|  t1.line   |
+------------+--+
| hello you  |
| hello he   |
| hello me   |
+------------+--+
3 rows selected (2.143 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> 

Operate Hive from Java code over JDBC

The code is as follows:

package com.uplooking.bigdata.hive.jdbc;

import java.sql.*;

public class HiveJDBC {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://uplooking01:10000/mydb1", "uplooking", "uplooking");
        String sql = "select t.word,count(t.word) as count from (select explode(split(line, ' ')) as word from t1) t group by t.word";
        PreparedStatement ps = conn.prepareStatement(sql);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            String word = rs.getString("word");
            int count = rs.getInt("count");
            System.out.println(word + "\t" + count);
        }
        rs.close();
        ps.close();
        conn.close();
    }
}

The program output is as follows:

18/03/23 00:48:16 INFO jdbc.Utils: Supplied authorities: uplooking01:10000
18/03/23 00:48:16 INFO jdbc.Utils: Resolved authority: uplooking01:10000
he  1
hello   3
me  1
you 1

While the program runs, note the output in the hiveserver2 terminal:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = uplooking_20180323084825_63044683-393d-4625-a3c3-b440109c3d70
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1521765850571_0002, Tracking URL = http://uplooking02:8088/proxy/application_1521765850571_0002/
Kill Command = /home/uplooking/app/hadoop/bin/hadoop job  -kill job_1521765850571_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-03-23 08:48:33,427 Stage-1 map = 0%,  reduce = 0%
2018-03-23 08:48:40,864 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.54 sec
2018-03-23 08:48:48,294 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.84 sec
MapReduce Total cumulative CPU time: 6 seconds 840 msec
Ended Job = job_1521765850571_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.84 sec   HDFS Read: 8870 HDFS Write: 159 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 840 msec
OK

Fixing garbled Chinese comments in Hive

If garbled Chinese comments appear, try the following fix:

    To fix garbled Chinese comments in Hive,
    run the following statements against Hive's metastore database:
        ALTER TABLE COLUMNS_V2 MODIFY COLUMN COMMENT VARCHAR(256) CHARACTER SET utf8;
        ALTER TABLE TABLE_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
        ALTER TABLE PARTITION_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
        ALTER TABLE PARTITION_KEYS MODIFY COLUMN PKEY_COMMENT VARCHAR(4000) CHARACTER SET utf8;
        ALTER TABLE INDEX_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
    Also append the UTF-8 options to the metastore connection URL:
        &useUnicode=true&characterEncoding=UTF-8
          <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://uplooking01:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
          </property>

Hive Maven dependencies

<properties>
   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
   <hive-api.version>2.1.0</hive-api.version>
   <hadoop-api.version>2.6.4</hadoop-api.version>
   <hadoop-core.version>1.2.1</hadoop-core.version>
</properties>

<dependencies>
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${hadoop-core.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-serde</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-cli</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive-api.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libfb303</artifactId>
  <version>0.9.0</version>
</dependency>
</dependencies>