<xmp id="hck2n"></xmp>

<meter id="hck2n"></meter>

<table id="hck2n"></table>

溫馨提示×

溫馨提示×

您好，登錄后才能下訂單哦！

密碼登錄×

忘記密碼？

登錄注冊×

獲取短信驗證碼

其他方式登錄

點擊登錄注冊即表示同意《億速云用戶服務條款》

用戶登錄×

賬戶密碼登錄

請使用微信掃描上方二維碼

使用幫助

請求超時！

請點擊重新獲取二維碼

elasticsearch-hadoop hive導入數(shù)據(jù)怎么實現(xiàn)不自動分詞

發(fā)布時間：2021-12-10 09:22:20 來源：億速云閱讀：204 作者：iii 欄目：云計算

這篇文章主要講解了“elasticsearch-hadoop hive導入數(shù)據(jù)怎么實現(xiàn)不自動分詞”，文中的講解內(nèi)容簡單清晰，易于學習與理解，下面請大家跟著小編的思路慢慢深入，一起來研究和學習“elasticsearch-hadoop hive導入數(shù)據(jù)怎么實現(xiàn)不自動分詞”吧！

背景

基于本公司使用es場景,不需要分詞功能.而es string 類型的時候,會自動分詞,導致省份、地區(qū)等字段都分詞了。

具體使用方式

創(chuàng)建 elasticsearc 模板(_template)，使用命令：

curl -XPUT localhost:9200/_template/dmp_down_result -d '

{

"template" : "dmp_down_*", #定義模板名字,以dmp_down_ 開始的索引將使用此模板

"settings": {

"number_of_shards": 14, #設置分片數(shù)量

"number_of_replicas": 1, #設置副本數(shù)

"index.refresh_interval": "30s" #刷新間隔(可不設置)

},

"aliases" : {

"dmp_down_result" : {} #別名

},

"mappings" : {

"dmp_es_result1":{ #索引中的type名字,需要與被建索引一致

"properties": #具體字段映射設置

{"user_id": { #hive數(shù)據(jù)中的字段,必須與此對應

"type" : "multi_field", #類型多媒體

"fields" : {

"user_id" : {"type" : "string", "index" : "not_analyzed"} , #類型 string,not_analyzed:為不使用分詞,使用分詞為:analyzed

}

},

"phone": {

"type" : "multi_field",

"fields" : {

"imei" : {"type" : "string", "index" : "not_analyzed"}

}

},

"address": {

"type" : "multi_field",

"fields" : {

"idfa" : {"type" : "string", "index" : "not_analyzed"}

}

}

}

}}

}

}

} '

創(chuàng)建完模板后,可通過http://localhost:9200/_template 來查看是否生效

由此完成創(chuàng)建操作.下附圖

elasticsearch-hadoop hive導入數(shù)據(jù)怎么實現(xiàn)不自動分詞

導入數(shù)據(jù)如報:maybe it contains illegal characters? 此錯誤時,在導入時是無法排查的,可通過手動創(chuàng)建索引/type 方式來查看具體錯誤信息.如:{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[dmp_es-16][10.8.1.16:9300][indices:admin/create]"}],"type":"illegal_state_exception","reason":"index and alias names need to be unique, but alias [dmp_keyword_result] and index [dmp_keyword_result] have the same name"},"status":500} 這樣可以更直觀的看出問題.索引名重復,重新啟一個索引名字就可以了.

附:es-hadoop hive數(shù)據(jù)同步方法:

下載eslaticsearch-hadoop jar包,需要與當前elasticsearch 版本相對應
將eslaticsearch-hadoop jar上傳到集群
在hive命令行中執(zhí)行:add jar /home/hdroot/ elasticsearch-hadoop-2.2.0.jar
建表命令:

CREATE EXTERNAL TABLE dmp_es_result2(

user_id string,

imei string,

idfa string,

email string,

type_id array<string>,

province string,

region string,

dt string,

terminal_brand string,

system string)

STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'

TBLPROPERTIES('es.resource' = '索引名/類型',

'es.index.auto.create' = 'true',

'es.nodes'='localhost',

'es.port' = '9200',

'es.field.read.empty.as.null' ='true');

es.resource:指定同步到es 中的索引名/類型名

es.index.auto.create:是否使用主動主鍵,如不使用可指定es.mapping.id=主鍵

es.nodes:es集群節(jié)點地址,任意一個節(jié)點就可以,多個可以用逗號(,)分開如:192.168.1.1:9200,192.168.1.2:9200

es.port:es集群端口號,如es.nodes指定了多個,此可以不使用

es.field.read.empty.as.null:對空、null字段的處理方式。加上此參數(shù)可以使數(shù)據(jù)導入更準備（此處有點不太確定）

導入數(shù)據(jù)hive語句:

INSERT OVERWRITE TABLE dmp_es_result2 select user_id,imei,idfa,email,type_id,province,region,dt,terminal_brand,system from temp_zy_game_result01;

如果執(zhí)行成功,就可以看到es中的數(shù)據(jù)同步過去了.

感謝各位的閱讀，以上就是“elasticsearch-hadoop hive導入數(shù)據(jù)怎么實現(xiàn)不自動分詞”的內(nèi)容了，經(jīng)過本文的學習后，相信大家對elasticsearch-hadoop hive導入數(shù)據(jù)怎么實現(xiàn)不自動分詞這一問題有了更深刻的體會，具體使用情況還需要大家實踐驗證。這里是億速云，小編將為大家推送更多相關知識點的文章，歡迎關注！

向AI問一下細節(jié)

推薦閱讀：

免責聲明：本站發(fā)布的內(nèi)容（圖片、視頻和文字）以原創(chuàng)、轉(zhuǎn)載和分享為主，文章觀點不代表本網(wǎng)站立場，如果涉及侵權請聯(lián)系站長郵箱：is@yisu.com進行舉報，并提供相關證據(jù)，一經(jīng)查實，將立刻刪除涉嫌侵權內(nèi)容。

上一篇新聞：
樹莓派4B上怎樣安裝VSCode
下一篇新聞：
jdk安裝以及版本問題怎么解決

猜你喜歡

AI
助
手

產(chǎn)品服務

地區(qū)劃分

專題活動

幫助支持

關于我們

售后咨詢

7*24小時在線電話：400-100-2938

7*24小時在線 QQ：800811969

關注億速云

億速云公眾號

手機網(wǎng)站二維碼

<samp id="egimp"><abbr id="egimp"></abbr></samp>

<strong id="egimp"><sup id="egimp"><dl id="egimp"></dl></sup></strong>