溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點(diǎn)擊 登錄注冊 即表示同意《億速云用戶服務(wù)條款》

Phoenix(sql on hbase

發(fā)布時(shí)間:2021-11-23 15:13:42 來源:億速云 閱讀:137 作者:柒染 欄目:數(shù)據(jù)庫

Phoenix(sql on hbase)簡介Phoenix(sql on hbase


介紹:

Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular
JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance
on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Phoenix(sql on hbase


部署:

1:wget http://phoenix-bin.github.com/client/phoenix-2.2.1-install.tar,將jar包拷貝至HBASE_HOME/lib即可 2:執(zhí)行psql.sh localhost ../examples/web_stat.sql ../examples/web_stat.csv ../examples/web_stat_queries.sql,加載示例數(shù)據(jù) 3:sqlline.sh localhost(zookeeper地址)進(jìn)入命令行客戶端Phoenix(sql on hbase


相關(guān)文檔:

wiki主頁(文檔很詳細(xì)):
https://github.com/forcedotcom/phoenix/wiki
Quick Start
https://github.com/forcedotcom/phoenix/wiki/Phoenix-in-15-minutes-or-less
Recently Implemented Features
https://github.com/forcedotcom/phoenix/wiki/Recently-Implemented-Features
Phoenix Performance vs Hive,Impala
https://github.com/forcedotcom/phoenix/wiki/Performance#salting
官方實(shí)時(shí)性能測試結(jié)果:
http://phoenix-bin.github.io/client/performance/latest.htm
語法:
http://forcedotcom.github.io/phoenix/index.html

Phoenix(sql on hbase


二級索引相關(guān)(索引的使用需要調(diào)用Phoenix API):

二級索引(多列時(shí))使用需要在hbase-site.xml中加入如下配置

<property>   <name>hbase.regionserver.wal.codec</name>   <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value> </property>


創(chuàng)建索引例子:

create table usertable (id varchar primary key, firstname varchar, lastname varchar); create index idx_name on usertable (firstname) include (lastname);   可通過如下方法查看當(dāng)前SQL語句索引是否生效: explain select id, firstname, lastname from usertable where firstname = 'foo';


explain的相關(guān)說明:

RANGE SCAN means that only a subset of the rows in your table will be scanned over. This occurs if you use one or more leading columns from your primary key constraint. Query that is not filtering on leading PK columns ex. select * from test where pk2='x'
and pk3='y'; will result in full scan whereas the following query will result in range scan select * from test where pk1='x' and pk2='y';. Note that you can add a secondary index on your "pk2" and "pk3" columns and that would cause a range scan to be done
for the first query (over the index table).   DEGENERATE SCAN means that a query can't possibly return any rows. If we can determine that at compile time, then we don't bother to even run the scan.   FULL SCAN means that all rows of the table will be scanned over (potentially with a filter applied if you have a WHERE clause)   SKIP SCAN means that either a subset or all rows in your table will be scanned over, however it will skip large groups of rows depending on the conditions in your filter. See this blog for more detail. We don't do a SKIP SCAN if you have no filter on the
leading primary key columns, but you can force a SKIP SCAN by using the /*+ SKIP_SCAN */ hint. Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.


索引使用介紹:

  • 主鍵索引:主鍵索引要按創(chuàng)建時(shí)的順序引用。如primary key(id,name,add),那么會(huì)隱式的創(chuàng)建(id),(id,name),(id,name,add)三個(gè)索引,如果在where中用這三個(gè)條件會(huì)用到索引,其他組合則無法使用索引(FULL SCAN)。

  • 二級索引:除了要按創(chuàng)建時(shí)的順序引用外,如果查詢的列不全在索引或者覆蓋索引中則無法使用索引。 舉例: DDL:create table usertable (id varchar primary key, firstname varchar, lastname varchar);      create index idx_name on usertable (firstname); DML:select id, firstname, lastname from usertable where firstname = 'foo'; 此查詢不會(huì)使用到索引,因?yàn)閘astname不再索引中。   執(zhí)行DDL:create idx_name on usertable (firstname) include (lastname)后該查詢語句才能使用索引。   遺留問題:include和on在Phoenix中具體有什么區(qū)別?

  • 查詢條件中主鍵索引+二級索引同時(shí)存在的話,Phoenix會(huì)自己選擇最優(yōu)索引。

Phoenix(sql on hbase


Phoenix的SQL表結(jié)構(gòu)與Hbase結(jié)構(gòu)的映射實(shí)驗(yàn)

>>create table user3table (id varchar, firstname varchar, lastname varchar CONSTRAINT PK PRIMARY KEY (id,firstname));   >>!describe user3table   +------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+ | TABLE_CAT  | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME  | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULL | +------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+ | null       | null        | USER3TABLE | ID          | 12        | VARCHAR    | null        | null          | null           | null           | 1    | | null       | null        | USER3TABLE | FIRSTNAME   | 12        | VARCHAR    | null        | null          | null           | null           | 1    | | _0         | null        | USER3TABLE | LASTNAME    | 12        | VARCHAR    | null        | null          | null           | null           | 1    | +------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+   >>!index user3table; +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT | +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+ +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+   >>select * from user3table; +------------+------------+------------+ |     ID     | FIRSTNAME  |  LASTNAME  | +------------+------------+------------+ | hup        | zhan       | feng       | +------------+------------+------------+   >>hbase>>scan 'USER3TABLE'   ROW                                    COLUMN+CELL                                                                                                      hup\x00zhan                           column=_0:LASTNAME, timestamp=1387875092585, value=feng                                                          hup\x00zhan                           column=_0:_0, timestamp=1387875092585, value= 1 row(s) in 0.0110 seconds   >>create index idx_test on user3table (firstname) include (lastname);   >>!index user3table; +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT | +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+ | null      | null        | USER3TABLE | true       | null            | IDX_TEST   | 3    | 1                | :FIRSTNAME  | A           | null       | | null      | null        | USER3TABLE | true       | null            | IDX_TEST   | 3    | 2                | :ID         | A           | null       | | null      | null        | USER3TABLE | true       | null            | IDX_TEST   | 3    | 3                | _0:LASTNAME | null        | null       | +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+   >>select * from user3table; +------------+------------+------------+ |     ID     | FIRSTNAME  |  LASTNAME  | +------------+------------+------------+ | hup        | zhan       | feng       | +------------+------------+------------+   >>hbase>>scan 'USER3TABLE'   ROW                                    COLUMN+CELL                                                                                                      hup\x00zhan                           column=_0:LASTNAME, timestamp=1387875092585, value=feng                                                          hup\x00zhan                           column=_0:_0, timestamp=1387875092585, value= 1 row(s) in 0.0110 seconds   此外:當(dāng)表中非主鍵的列有多個(gè)時(shí)會(huì)統(tǒng)一加后綴:    1:NASalesforce.com\x00Login\x00\x00\x00 column=STATS:ACTIVE_VISITOR, timestamp=1387867968156, value=\x80\x00\x1A"                                        \x01C%\x17\xFE0                                                                                                                                        2:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:CORE, timestamp=1387867968156, value=\x80\x00\x00\x00\x00\x00\x00\xC9                               \x01C%\x17\xFE0                                                                                                                                        3:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:DB, timestamp=1387867968156, value=\x80\x00\x00\x00\x00\x00\x02\x84                                 \x01C%\x17\xFE0                                                                                                                                        4:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:_0, timestamp=1387867968156, value=                                                                \x01C%\x17\xFE0

結(jié)論:
1:Phoenix會(huì)把“CONSTRAINT PK PRIMARY KEY (id,firstname)”這樣定義的列拼起來加入到Hbase主鍵中(用\x00進(jìn)行分割),同時(shí)將聯(lián)合主鍵涉及到的列合并默認(rèn)名為"_0"的一列。其值為空。其他列放入Hbase的同名列中存儲(chǔ)
2:Phoenix在Hbase中維護(hù)了一張系統(tǒng)表(SYSTEM TABLE)來存儲(chǔ)相關(guān)Phoenix表的scheme元數(shù)據(jù)。
3:創(chuàng)建二級索引(create index)操作不會(huì)影響表結(jié)構(gòu)
4:如果建表時(shí)不指定列族,則列族以_0、_1的方式命名
5:如果有多列時(shí)value值通過HBase接口獲取的并不是直接可用的值(只能通過Phoenix接口獲取正常值)

Phoenix(sql on hbase


動(dòng)態(tài)scheme相關(guān)

1:支持修改列

Example:   ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10 ALTER TABLE my_table ADD dept_name char(50) ALTER TABLE my_table ADD parent_id char(15) null primary key ALTER TABLE my_table DROP COLUMN d.dept_id ALTER TABLE my_table DROP COLUMN dept_name ALTER TABLE my_table DROP COLUMN parent_id ALTER TABLE my_table SET IMMUTABLE_ROWS=true


2:支持修改二級索引

Example:   CREATE INDEX my_idx ON sales.opportunity(last_updated_date DESC) CREATE INDEX my_idx ON log.event(created_date DESC) INCLUDE (name, payload) SALT_BUCKETS=10 CREATE INDEX IF NOT EXISTS my_comp_idx ON server_metrics ( gc_time DESC, created_date DESC )     DATA_BLOCK_ENCODING='NONE',VERSIONS=?,MAX_FILESIZE=2000000 split on (?, ?, ?)   ALTER INDEX my_idx ON sales.opportunity DISABLE ALTER INDEX IF EXISTS my_idx ON server_metrics REBUILD   DROP INDEX my_idx ON sales.opportunity DROP INDEX IF EXISTS my_idx ON server_metrics


3:應(yīng)該是不支持修改主鍵索引(沒找到相關(guān)信息。理論上也不好支持,因?yàn)橹麈I索引就是rowkey的值。)

Phoenix(sql on hbase


Java客戶端示例代碼(直接面向JDBC接口編程):

import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.PreparedStatement; import java.sql.Statement;   public class test {       public static void main(String[] args) throws SQLException {         Statement stmt = null;         ResultSet rset = null;           Connection con = DriverManager.getConnection("jdbc:phoenix:zookeeper");         stmt = con.createStatement();           stmt.executeUpdate("create table test (mykey integer not null primary key, mycolumn varchar)");         stmt.executeUpdate("upsert into test values (1,'Hello')");         stmt.executeUpdate("upsert into test values (2,'World!')");         con.commit();           PreparedStatement statement = con.prepareStatement("select * from test");         rset = statement.executeQuery();         while (rset.next()) {             System.out.println(rset.getString("mycolumn"));         }         statement.close();         con.close();     } }Phoenix(sql on hbase


單節(jié)點(diǎn)測試:

建表: CREATE TABLE IF NOT EXISTS $table (HOST CHAR(2) NOT NULL,DOMAIN VARCHAR NOT NULL, FEATURE VARCHAR NOT NULL,DATE DATE NOT NULL,USAGE.CORE BIGINT,USAGE.DB BIGINT,STATS.ACTIVE_VISITOR INTEGER CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE)) SPLIT ON ('CSGoogle','CSSalesforce','EUApple','EUGoogle','EUSalesforce','NAApple','NAGoogle','NASalesforce');   performance_10000000數(shù)據(jù)表中是1000W條如下數(shù)據(jù):   +------+------------+------------+---------------------+----------+----------+----------------+ | HOST |   DOMAIN   |  FEATURE   |        DATE         |   CORE   |    DB    | ACTIVE_VISITOR | +------+------------+------------+---------------------+----------+----------+----------------+ | CS   | Apple.com  | Dashboard  | 2013-12-23          | 363      | 795      | 8390           | +------+------------+------------+---------------------+----------+----------+----------------+   Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;   COUNT(1) -------- 10000000 Time: 66.044 sec(s)   Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;   HOST ---- CS   EU   NA   Time: 51.43 sec(s)   Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;   DOMAIN     ---------- Apple.com  Google.com Salesforce.com Time: 46.908 sec(s)   Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM performance_10000000 GROUP BY TRUNC(DATE,'DAY');   DAY                 ------------------- 2013-12-23 00:00:00 2013-12-24 00:00:00 2013-12-25 00:00:00 ...... Time: 48.132 sec(s)   Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE<10;   COUNT(1) --------   198669 Time: 31.301 sec(s)  Phoenix(sql on hbase


集群(5*RegionServer)測試:

performance_10000000數(shù)據(jù)表中是1000W條如下數(shù)據(jù):   +------+------------+------------+---------------------+----------+----------+----------------+ | HOST |   DOMAIN   |  FEATURE   |        DATE         |   CORE   |    DB    | ACTIVE_VISITOR | +------+------------+------------+---------------------+----------+----------+----------------+ | CS   | Apple.com  | Dashboard  | 2013-12-23          | 363      | 795      | 8390           | +------+------------+------------+---------------------+----------+----------+----------------+   Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;   COUNT(1) -------- 19630614 Time: 13.879 sec(s)   Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;   HOST ---- CS   EU   NA   Time: 13.545 sec(s)   Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;   DOMAIN     ---------- Apple.com  Google.com Salesforce.com Time: 12.907 sec(s)   Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM performance_10000000 GROUP BY TRUNC(DATE,'DAY');   DAY                 ------------------- 2013-12-23 00:00:00 2013-12-24 00:00:00 2013-12-25 00:00:00 ...... Time: 13.845 sec(s)   Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE<10;   COUNT(1) --------   393154 Time: 8.522 sec(s)  Phoenix(sql on hbase


優(yōu)點(diǎn):

1:命令行和java客戶端使用都很簡單。尤其是java客戶端直接面向JDBC接口編程,封裝且優(yōu)化了Hbase很多細(xì)節(jié)。
2:在單表操作上性能比Hive Handler好很多(但是handler也有可能會(huì)升級加入斜處理器相關(guān)聚合等特性)
3:支持多列的二級索引,列數(shù)不限。其中可變索引時(shí)列數(shù)越多寫入速度越慢,不可變索引不影響寫入速度(參考:https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing#mutable-indexing)。
4:對Top-N查詢速度遠(yuǎn)超Hive(參考:https://github.com/forcedotcom/phoenix/wiki/Performance#top-n)
5:提供對rowkey分桶的特性,可以實(shí)現(xiàn)數(shù)據(jù)在各個(gè)region的均勻分布(參考:https://github.com/forcedotcom/phoenix/wiki/Performance#salting)
6:低侵入性,基本對原Hbase的使用沒什么影響
7:提供的函數(shù)基本都能cover住絕大多數(shù)需求了
8:與Hive不同的是,Phoenix的sql語句更接近標(biāo)準(zhǔn)sql規(guī)范。

Phoenix(sql on hbase


缺點(diǎn):

1:Phoenix創(chuàng)建的表Hbase可以識別并使用,但是使用Hbase創(chuàng)建的表,Phoenix不能識別,因?yàn)镻hoenix對每張表都有其相應(yīng)的元數(shù)據(jù)信息。
2:硬傷:多表join操作當(dāng)前不支持(官方文檔對當(dāng)前2.2.3版本的說法不一致,但3.0應(yīng)該會(huì)支持,有可能會(huì)引入Apache Drill把大表join分割成小任務(wù)的特性)。
3:目前只支持hbase0.94系列(這個(gè)應(yīng)該問題不大)

Phoenix(sql on hbase


其他

1:Phoenix對所有數(shù)據(jù)都是基于內(nèi)存進(jìn)行統(tǒng)計(jì)。因此從sql語句復(fù)雜程度以及java各數(shù)據(jù)結(jié)構(gòu)的性能,基本能對其消耗的時(shí)間有一個(gè)大概的估計(jì)。

Phoenix(sql on hbase


功能擴(kuò)展(impala不太熟,主要針對Hive說了)

如果將Hive和Phoenix集成的話,還是很不錯(cuò)的,兩者剛好互補(bǔ)。Hive并沒使用coprocesser,只是通過把數(shù)據(jù)取出來做MR,而Phoenix剛好是在單表取數(shù)據(jù)方面做了很多優(yōu)化。集成后可以享受到Phoenix的單表操作優(yōu)勢,同時(shí)可以解決多表join的問題(在Phoenix估計(jì)短時(shí)間難做出來大表join的方案,說是要模仿Drill,但是現(xiàn)在Drill本身救處于Alpha階段)。

如果集成的話主要工作是需要在Hive-hbase-handler中適配Hive相關(guān)單表操作到Phoenix的java客戶端接口。

不太成熟的問題:

1:是把Phoenix的單表操作移植到Hive中還是把Hive的join移植到Phoenix中?
2:是只對外提供Hive的接口還是同時(shí)對外提供Hive和Phoenix兩種接口呢?
3:適配的過程還有很多細(xì)節(jié)

嘗試調(diào)研了下Phoenix二級索引是否可以達(dá)到像華為一樣創(chuàng)建完可以無需修改HBase任何代碼就享受到二級索引的效果

Phoenix(sql on hbase


擴(kuò)展閱讀:

sql for hbase(Phoenix、Impala、Drill): http://www.orzota.com/sql-for-hbase/
SQL on Hadoop的最新進(jìn)展及7項(xiàng)相關(guān)技術(shù)分享:http://www.csdn.net/article/2013-10-18/2817214-big-data-hadoop

Phoenix(sql on hbase


對比華為HBase二級索引:

缺點(diǎn):華為二級索引需要在建表時(shí)指定列(及不支持動(dòng)態(tài)修改),同時(shí)華為代碼對Hbase本身侵入性太大(比如balancer要用華為的),難以升級維護(hù)。

優(yōu)點(diǎn):但是索引建好后,在對Hbase的scan、Puts、Deletes操作時(shí)使用Hbase原生代碼(無需任何改動(dòng))即可享受到索引的效果。也不需要指定使用哪個(gè)索引,它會(huì)自己使用最優(yōu)索引。

也就是說如果加上華為索引,Hive-hbase-handler無需改動(dòng)即可使用二級索引。但是phoenix目前只支持通過phoenix sql方式使用二級索引。

性能對比:暫未測試,估計(jì)差不太多

綜合看移植phoenix比移植華為更靠譜,phoenix侵入性小,功能更強(qiáng)大,且升級維護(hù)方面也比華為要靠譜。但是移植phoenix難度也相對比較大。

但是如果只是想短期起效果,可以嘗試下華為索引。

Phoenix(sql on hbase


淘寶開源項(xiàng)目Lealone:

是一個(gè)可用于HBase的分布式SQL引擎,主要功能就是能用SQL方式(JDBC)查詢Hbase,避免了HBase使用的繁瑣操作。相對與Phoenix的功能弱多了。

  • 支持高性能的分布式事務(wù),

  • 使用一個(gè)非常新穎的基于局部時(shí)間戳的多版本沖突與有效性檢測的分布式事務(wù)模型

  • 是對H2關(guān)系數(shù)據(jù)庫SQL引擎的改進(jìn)和擴(kuò)展

  • HBase建的表Lealone只能讀;Lealone建的表Lealone可以讀寫。

Phoenix(sql on hbase


基于Solr的HBase多條件查詢:

介紹:ApacheSolr 是一個(gè)開源的搜索服務(wù)器,Solr 使用 Java 語言開發(fā),主要基于 HTTP 和Apache Lucene 實(shí)現(xiàn)。

原理:基于Solr的HBase多條件查詢原理很簡單,將HBase表中涉及條件過濾的字段和rowkey在Solr中建立索引,通過Solr的多條件查詢快速獲得符合過濾條件的rowkey值,拿到這些rowkey之后在HBASE中通過指定rowkey進(jìn)行查詢。

缺點(diǎn):
1:ApacheSolr本身并不是專為HBase設(shè)計(jì)的。需要專門針對ApacheSolr寫Hbase的相關(guān)應(yīng)用,比如HBase寫數(shù)據(jù)時(shí)同步更新索引的過程需要我們自己寫協(xié)處理器。
2:ApacheSolr本身是一個(gè)WebService服務(wù),需要額外維護(hù)一個(gè)或多個(gè)ApacheSolr服務(wù)器。

向AI問一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI