How to Run the Apache Flume Regex Filter

Published: 2021-12-16 10:41:57  Source: 億速云 (Yisu Cloud)  Views: 156  Author: iii  Category: Big Data

This article explains how to run the regex filter in Apache Flume. Many people have questions about it in day-to-day work, so the following collects a simple, practical procedure that should help answer them.

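In today's big data world, applications generate huge volumes of electronic data, and these repositories hold valuable information. It is hard for a human analyst or domain expert to make interesting discoveries in them by hand, or to find patterns that can support decision making. We need automated processes to make effective use of such large, information-rich data for planning and investment decisions. Before the data can be processed, it has to be collected, aggregated, and transformed, and finally moved to the repositories where the various analytics and data mining tools operate.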

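One popular tool for carrying out all of these steps is Apache Flume. The data is usually stored in the form of events or logs. Apache Flume has three main components:

Source: The data source can be an enterprise server, a file system, the cloud, a data repository, and so on.
Sink: The sink is the target repository where data is stored. It can be a centralized location such as HDFS, a processing engine such as Apache Spark, or a data store/search engine such as ElasticSearch.
Channel: The channel holds events until the sink consumes them. A channel is a passive store. Channels support failure recovery and high reliability; examples are the file channel, which is backed by the local file system, and the memory channel.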
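As a minimal sketch of the durable option, a file channel might be declared as follows. The agent name a1 and channel name c1 match the names used later in this article; the two directory paths are hypothetical placeholders and would need to exist and be writable on the agent host.

```properties
# Sketch only: both directory paths below are hypothetical placeholders.
a1.channels = c1
a1.channels.c1.type = file
# Where the channel keeps its checkpoint
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
# Where the channel stores the event data itself
a1.channels.c1.dataDirs = /var/lib/flume/data
```

The example in this article uses the memory channel instead, which is simpler to set up but loses any buffered events if the agent process stops.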

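Flume is highly configurable and supports many sources, channels, serializers, and sinks. It also supports streaming data flows. A powerful feature of Flume is the interceptor, which can modify or drop events in flight. One of the supported interceptors is regex_filter.

regex_filter interprets the event body as text and matches it against a configured regular expression. Depending on whether the pattern matches, it either includes or excludes the event. Let us look at regex_filter in detail.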

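Requirements

The data source delivers records made up of a number, name, city, and role. The source could be real-time streaming data or anything else. In this example I use the Netcat source, which listens on a given port and turns each line of text into an event. The data must be saved to HDFS in text format, and it must be filtered by role before it is saved: only manager records are stored in HDFS, and records for other roles must be discarded. For example, the following data is allowed: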

1,alok,mumbai,manager
2,jatin,chennai,manager

The following data is not allowed:

3,yogesh,kolkata,developer
5,jyotsana,pune,developer

How to meet this requirement

This can be done with the regex_filter interceptor. The interceptor filters events against a rule, so that only the events of interest are passed on to the sink and all other events are dropped.

## Describe regex_filter interceptor and configure exclude events attribute
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = developer
a1.sources.r1.interceptors.i1.excludeEvents = true
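The configuration above drops every event whose body contains the text "developer" anywhere, so a record with a new role such as "tester" would still pass through. A possible alternative, sketched here under the assumption that the role is always the last comma-separated field, is to switch to include mode and keep only manager records. The anchored pattern is an illustrative choice and not part of the original example, so it is worth testing against real input first.

```properties
# Sketch only: include-mode variant; the pattern ",manager$" is an assumption
# about the record layout (role as the final field of each line).
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
# Keep an event only if its body ends with ",manager"
a1.sources.r1.interceptors.i1.regex = ,manager$
# false means the regex selects events to include rather than to exclude
a1.sources.r1.interceptors.i1.excludeEvents = false
```

Include mode is the stricter of the two: anything that does not look like a manager record is discarded, including malformed lines.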

The HDFS sink stores data in HDFS in text or sequence-file format. It can also store the data in compressed form.

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
## assumption is that Hadoop is CDH
a1.sinks.k1.hdfs.path = hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
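For the compressed option mentioned above, a sketch of the same sink writing gzip output might look like this. The choice of gzip is an assumption made for illustration; the example in this article itself uses uncompressed DataStream output.

```properties
# Sketch only: compressed variant of the HDFS sink; gzip is an illustrative choice.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers
# CompressedStream requires a codec to be named in hdfs.codeC
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
```

Compressed files take less space in HDFS but can no longer be read directly as plain text.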

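How to run the example

First, you need Hadoop in order to run the example with HDFS as the sink. If you do not have a Hadoop cluster, change the sink to a logger sink and then simply start Flume. Save the regex_filter_flume_conf.conf file in a directory and run the agent with the following command: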

flume-ng agent --conf conf --conf-file regex_filter_flume_conf.conf --name a1 -Dflume.root.logger=INFO,console

Note that the agent name is a1. I used the Netcat source:

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

Once the Flume agent has started, run the following command to send events to Flume:

telnet localhost 44444

Now enter the following text:

1,alok,mumbai,manager
2,jatin,chennai,manager
3,yogesh,kolkata,developer
4,ragini,delhi,manager
5,jyotsana,pune,developer
6,valmiki,banglore,manager

Look in HDFS and you will see that a file has been created under hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers, and that it contains only the manager records.

The complete Flume configuration, regex_filter_flume_conf.conf, is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source - netcat
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the HDFS sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

## Describe regex_filter interceptor and configure exclude events attribute
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = developer
a1.sources.r1.interceptors.i1.excludeEvents = true

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
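For a trial run without a Hadoop cluster, the HDFS sink properties in the configuration above could be swapped for a logger sink, which writes each event that survives the filter to the Flume log. With the -Dflume.root.logger=INFO,console option used in the run command, that output appears on the console. A minimal sketch, keeping the same a1, k1, and c1 names:

```properties
# Sketch only: use in place of the a1.sinks.k1.hdfs.* settings for a local test.
a1.sinks = k1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

The source, interceptor, and channel sections stay unchanged, so the filtering behaviour can be checked before any data is written to HDFS.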

That concludes this look at how to run the Apache Flume regex filter. Theory combined with practice is the best way to learn, so go and try it out.
