溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶(hù)服務(wù)條款》

hive 與 hbase 結(jié)合

發(fā)布時(shí)間:2020-07-15 20:38:46 來(lái)源:網(wǎng)絡(luò) 閱讀:5280 作者:Meteor_hy 欄目:大數(shù)據(jù)

一、hive與hbase的結(jié)合
Hive會(huì)經(jīng)常和Hbase結(jié)合使用,把Hbase作為Hive的存儲(chǔ)路徑,所以Hive整合Hbase尤其重要。使用Hive讀取Hbase中的數(shù)據(jù),可以使用HQL語(yǔ)句在HBase表上進(jìn)行查詢(xún)、插入操作;甚至是進(jìn)行Join和Union等復(fù)雜查詢(xún)。此功能是從Hive 0.6.0開(kāi)始引入的。Hive與HBase整合的實(shí)現(xiàn)是利用兩者本身對(duì)外的API接口互相進(jìn)行通信,相互通信主要是依靠hive-hbase-handler-*.jar工具里面的類(lèi)實(shí)現(xiàn)的。使用Hive操作HBase中的表,只是提供了便捷性,hiveQL引擎使用的是MapReduce,對(duì)于性能上,表現(xiàn)不盡人意。

步驟:
1、將hbase相關(guān)jar包復(fù)制到hive/lib下,操作如下:

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-protocol-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-server-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-client-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5-tests.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ 

2、在hive-site.xml文件中引用hbase,添加如下內(nèi)容:

[hadoop@bus-stable hive]$ vim /opt/hive/conf/hive-site.xml 
  <property>
    <name>hive.aux.jars.path</name>
    <value>
        file:///opt/hive/lib/hive-hbase-handler-2.3.3.jar,
        file:///opt/hive/lib/hbase-protocol-1.4.5.jar,
        file:///opt/hive/lib/hbase-server-1.4.5.jar,
        file:///opt/hive/lib/hbase-client-1.4.5.jar,
        file:///opt/hive/lib/hbase-common-1.4.5.jar,
        file:///opt/hive/lib/hbase-common-1.4.5-tests.jar,
        file:///opt/hive/lib/zookeeper-3.4.6.jar,
        file:///opt/hive/lib/guava-14.0.1.jar
    </value>
    <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
  </property>
  <property>
        <name>hbase.zookeeper.quorum</name>
        <value>open-stable,permission-stable,sp-stable</value>
  </property>
  <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
  </property>  

3、啟動(dòng)hive:

[hadoop@bus-stable hive]$ hive -hiveconf hbase.master=oversea-stable:60000          
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.3-bin/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create table htest(key int,value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties('hbase.columns.mapping'=':key,f:value') tblproperties('hbase.table.name'='htest');
OK
Time taken: 9.376 seconds
hive> show databases;
OK
default
inspiry
Time taken: 0.121 seconds, Fetched: 2 row(s)
hive> show tables;
OK
htest
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select * from htest; 
OK
Time taken: 1.967 seconds
hive> 

4、在hbase中驗(yàn)證數(shù)據(jù):

[hadoop@oversea-stable opt]$ hbase shell 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-1.4.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.5, rca99a9466415dc4cfc095df33efb45cb82fe5480, Wed Jun 13 15:13:00 EDT 2018

hbase(main):001:0> list
TABLE                                                                                                                                                     
htest                                                                                                                                                     
1 row(s) in 0.2970 seconds

=> ["htest"]
hbase(main):002:0> scan "htest"
ROW                                     COLUMN+CELL                                                                                                       
0 row(s) in 0.1410 seconds

hbase(main):003:0> 

二、導(dǎo)入外部數(shù)據(jù)
(1) 數(shù)據(jù)文件如下:
[hadoop@bus-stable ~]$ cat score.csv
hive,85
hbase,90
hadoop,92
flume,89
kafka,95
spark,80
storm,70
[hadoop@bus-stable ~]$ hadoop fs -put score.csv /data/score.csv
[hadoop@bus-stable ~]$ hadoop fs -ls /data/
Found 2 items
-rw-r--r-- 3 hadoop supergroup 88822 2018-06-15 10:32 /data/notepad.txt
-rw-r--r-- 3 hadoop supergroup 70 2018-06-26 15:59 /data/score.csv
[hadoop@bus-stable ~]$

(2) 創(chuàng)建外部表
利用hdfs上的現(xiàn)有數(shù)據(jù),創(chuàng)建hive外部表
hive> create external table if not exists course.testcourse(cname string,score int) row format delimited fields terminated by ',' stored as textfile location '/data';
OK
Time taken: 0.282 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.013 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testcourse
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select * from testcourse ;
OK
hive 85
hbase 90
hadoop 92
flume 89
kafka 95
spark 80
storm 70
Time taken: 2.272 seconds, Fetched: 7 row(s)
hive>

三、利用HQL語(yǔ)句創(chuàng)建hbase 表
使用HQL語(yǔ)句創(chuàng)建一個(gè)指向HBase的Hive表,語(yǔ)法如下:

CREATE TABLE tbl_name(key int, value string)  //Hive中的表名tbl_name
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'   //指定存儲(chǔ)處理器
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")  //聲明列族,列名
TBLPROPERTIES ("hbase.table.name" = "tbl_name", "hbase.mapred.output.outputtable" = "iteblog");  //hbase.table.name 聲明HBase表名, 為可選屬性默認(rèn)與Hive的表名相同, hbase.mapred.output.outputtable 指定插入數(shù)據(jù)時(shí)寫(xiě)入的表, 如果以后需要往該表插入數(shù)據(jù)就需要指定該值

(1) 創(chuàng)建語(yǔ)句如下

hive> create table course.hbase_testcourse(cname string,score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score")TBLPROPERTIES("hbase.table.name" = "hbase_testcourse","hbase.mapred.output.outputtable" = "hbase_testcourse"); 
OK
Time taken: 3.745 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.019 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hbase_testcourse
testcourse
Time taken: 0.025 seconds, Fetched: 2 row(s)
hive> select * from hbase_testcourse;
OK
Time taken: 1.883 seconds
hive> 

(2) 創(chuàng)建完內(nèi)部表,可以通過(guò)Hive支持的insert overwrite 方式將一個(gè)表的數(shù)據(jù)導(dǎo)入 HBase

hive> insert overwrite table course.hbase_testcourse select cname,score from course.testcourse;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180626170540_c7eecb8d-2925-4ad2-be7f-237d9815d1cb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1529932626564_0002, Tracking URL = http://oversea-stable:8088/proxy/application_1529932626564_0002/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1529932626564_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-06-26 17:06:02,793 Stage-3 map = 0%,  reduce = 0%
2018-06-26 17:06:14,126 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.12 sec
MapReduce Total cumulative CPU time: 6 seconds 120 msec
Ended Job = job_1529932626564_0002
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 6.12 sec   HDFS Read: 4224 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 120 msec
OK
Time taken: 41.489 seconds
hive> 
hive> select * from hbase_testcourse;
OK
flume   89
hadoop  92
hbase   90
hive    85
kafka   95
spark   80
storm   70
Time taken: 0.201 seconds, Fetched: 7 row(s)
hive> 

(3) 驗(yàn)證hbase

hbase(main):011:0> list
TABLE                                                                                                                                                     
hbase_testcourse                                                                                                                                          
htest                                                                                                                                                     
2 row(s) in 0.0110 seconds
=> ["hbase_testcourse", "htest"]
hbase(main):012:0> scan "hbase_testcourse"
ROW                                     COLUMN+CELL                                                                                                       
 flume                                  column=cf:score, timestamp=1530003973026, value=89                                                                
 hadoop                                 column=cf:score, timestamp=1530003973026, value=92                                                                
 hbase                                  column=cf:score, timestamp=1530003973026, value=90                                                                
 hive                                   column=cf:score, timestamp=1530003973026, value=85                                                                
 kafka                                  column=cf:score, timestamp=1530003973026, value=95                                                                
 spark                                  column=cf:score, timestamp=1530003973026, value=80                                                                
 storm                                  column=cf:score, timestamp=1530003973026, value=70                                                                
7 row(s) in 0.0760 seconds
hbase(main):013:0> 

四、使用Hive映射HBase中已經(jīng)存在的表
(1) 在hbase中創(chuàng)建HBase表,進(jìn)入HBase Shell客戶(hù)端執(zhí)行建表命令

hbase(main):036:0> create 'hbase_test',{ NAME => 'cf'}
0 row(s) in 2.2830 seconds
=> Hbase::Table - hbase_test

(2) 插入數(shù)據(jù)

hbase(main):037:0> put 'hbase_test','hadoop','cf:score', '95'
0 row(s) in 0.1110 seconds

hbase(main):038:0> put 'hbase_test','storm','cf:score', '96'
0 row(s) in 0.0120 seconds

hbase(main):039:0> put 'hbase_test','spark','cf:score', '97'
0 row(s) in 0.0110 seconds

(3) 查看數(shù)據(jù)

hbase(main):041:0> scan "hbase_test"
ROW                                     COLUMN+CELL                                                                                                       
 hadoop                                 column=cf:score, timestamp=1530004351399, value=95                                                                
 spark                                  column=cf:score, timestamp=1530004365368, value=97                                                                
 storm                                  column=cf:score, timestamp=1530004359169, value=96                                                                
3 row(s) in 0.0220 seconds
hbase(main):042:0> 

(4) 進(jìn)入Hive Shell 客戶(hù)端,創(chuàng)建外部表course.hbase_test,建表命令如下所示

hive>  create external table course.hbase_test(cname string,score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES("hbase.table.name" = "hbase_test", "hbase.mapred.output.outputtable" = "hbase_test");
OK
Time taken: 0.221 seconds
hive> show tables;
OK
hbase_test
hbase_testcourse
testcourse
Time taken: 0.024 seconds, Fetched: 3 row(s)
備注:創(chuàng)建外部表和創(chuàng)建內(nèi)部表的命令基本一致,唯一的區(qū)別就是:創(chuàng)建內(nèi)部表使用create table,創(chuàng)建外部表使用create external table。

Hive 查看數(shù)據(jù)
hive> select * from hbase_test;
OK
hadoop  95
spark   97
storm   96
Time taken: 0.22 seconds, Fetched: 3 row(s)
hive> 

該Hive表一個(gè)外部表,所以刪除該表并不會(huì)刪除HBase表中的數(shù)據(jù),有幾點(diǎn)需要注意的是:
a)、建表或映射表的時(shí)候如果沒(méi)有指定:key則第一個(gè)列默認(rèn)就是行鍵
b)、HBase對(duì)應(yīng)的Hive表中沒(méi)有時(shí)間戳概念,默認(rèn)返回的就是最新版本的值
c)、由于HBase中沒(méi)有數(shù)據(jù)類(lèi)型信息,所以在存儲(chǔ)數(shù)據(jù)的時(shí)候都轉(zhuǎn)化為String類(lèi)型

五、使用java連接hive操作hbase
pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.itcast.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.4</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.4.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.4.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-metastore -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-metastore</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>
</project>

Hive_Hbase.java

package cn.itcast.bigdata.hbase;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Hive_Hbase {
    public static void main(String[] args) {

        try {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/shizhan02","hadoop","");
            Statement statement = connection.createStatement();
            String sql = "SELECT * FROM hive_hbase_table_kv";
            ResultSet res = statement.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(2));
            }
        } catch (ClassNotFoundException | SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

}
向AI問(wèn)一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI