溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點(diǎn)擊 登錄注冊 即表示同意《億速云用戶服務(wù)條款》

怎么聯(lián)合使用Spark Streaming、Broadcast、Accumulaor

發(fā)布時(shí)間:2021-12-16 15:22:40 來源:億速云 閱讀:165 作者:iii 欄目:云計(jì)算

本篇內(nèi)容介紹了“怎么聯(lián)合使用Spark Streaming、Broadcast、Accumulaor”的有關(guān)知識,在實(shí)際案例的操作過程中,不少人都會遇到這樣的困境,接下來就讓小編帶領(lǐng)大家學(xué)習(xí)一下如何處理這些情況吧!希望大家仔細(xì)閱讀,能夠?qū)W有所成!

廣播可以自定義,通過Broadcast、Accumulator聯(lián)合可以完成復(fù)雜的業(yè)務(wù)邏輯。

以下代碼實(shí)現(xiàn)在本機(jī)9999端口監(jiān)聽,并向連接上的客戶端發(fā)送單詞,其中包含黑名單的單詞Hadoop,Mahout和Hive。

package org.scala.opt

import java.io.{PrintWriter,  IOException}
import java.net.{Socket, SocketException, ServerSocket}

 

case class ServerThread(socket : Socket) extends Thread("ServerThread") {
  override def run(): Unit = {
    val ptWriter = new PrintWriter(socket.getOutputStream)
    try {
      var count = 0
      var totalCount = 0
      var isThreadRunning : Boolean = true
      val batchCount = 1
      val words = List("Java Scala C C++ C# Python JavaScript",
      "Hadoop Spark Ngix MFC Net Mahout Hive")
      while (isThreadRunning) {
        words.foreach(ptWriter.println)
        count += 1
        if (count >= batchCount) {
          totalCount += count
          count = 0
          println("batch " + batchCount + " totalCount => " + totalCount)
          Thread.sleep(1000)
        }
        //out.println此類中的方法不會拋出 I/O 異常,盡管其某些構(gòu)造方法可能拋出異常。客戶端可能會查詢調(diào)用 checkError() 是否出現(xiàn)錯(cuò)誤。
        if(ptWriter.checkError()) {
          isThreadRunning = false
          println("ptWriter error then close socket")
        }
      }
    }
    catch {
      case e : SocketException =>
        println("SocketException : ", e)
      case e : IOException =>
        e.printStackTrace();
    } finally {
      if (ptWriter != null) ptWriter.close()
      println("Client " + socket.getInetAddress + " disconnected")
      if (socket != null) socket.close()
    }
    println(Thread.currentThread().getName + " Exit")
  }
}
object SocketServer {
  def main(args : Array[String]) : Unit = {
    try {
      val listener = new ServerSocket(9999)
      println("Server is started, waiting for client connect...")
      while (true) {
        val socket = listener.accept()
        println("Client : " + socket.getLocalAddress + " connected")
        new ServerThread(socket).start()
      }
      listener.close()
    }
    catch {
      case e: IOException =>
        System.err.println("Could not listen on port: 9999.")
        System.exit(-1)
    }
  }
}

以下代碼實(shí)現(xiàn)接收本機(jī)9999端口發(fā)送的單詞,統(tǒng)計(jì)黑名單出現(xiàn)的次數(shù)的功能。

package com.dt.spark.streaming_scala

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, Accumulator}
import org.apache.spark.broadcast.Broadcast

/**
 * 第103課:  動手實(shí)戰(zhàn)聯(lián)合使用Spark Streaming、Broadcast、Accumulator實(shí)現(xiàn)在線黑名單過濾和計(jì)數(shù)
 * 本期內(nèi)容:
1,Spark Streaming與Broadcast、Accumulator聯(lián)合
2,在線黑名單過濾和計(jì)算實(shí)戰(zhàn)
 */
object _103SparkStreamingBroadcastAccumulator {

  @volatile private var broadcastList : Broadcast[List[String]] = null
  @volatile private var accumulator : Accumulator[Int] = null

  def main(args : Array[String]) : Unit = {
    val conf = new SparkConf().setMaster("local[5]").setAppName("_103SparkStreamingBroadcastAccumulator")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.sparkContext.setLogLevel("WARN")

    /**
     * 使用Broadcast廣播黑名單到每個(gè)Executor中
     */
    broadcastList = ssc.sparkContext.broadcast(Array("Hadoop", "Mahout", "Hive").toList)

    /**
     * 全局計(jì)數(shù)器,用于通知在線過濾了多少各黑名單
     */
    accumulator = ssc.sparkContext.accumulator(0, "OnlineBlackListCounter")

    ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).foreachRDD {rdd =>{
      if (!rdd.isEmpty()) {
        rdd.filter(wordPair => {
          if (broadcastList.value.contains(wordPair._1)) {

            println("BlackList word %s appeared".formatted(wordPair._1))
            accumulator.add(wordPair._2)
            false
          } else {
            true
          }
        }).collect()
        println("BlackList appeared : %d times".format(accumulator.value))
      }
    }}
    ssc.start()
    ssc.awaitTermination()
    ssc.stop()
  }
}

Server發(fā)送端日志如下,不斷打印輸出的次數(shù)。

 怎么聯(lián)合使用Spark Streaming、Broadcast、Accumulaor

Spark Streaming端打印黑名單的單詞及出現(xiàn)的次數(shù)。

 怎么聯(lián)合使用Spark Streaming、Broadcast、Accumulaor

“怎么聯(lián)合使用Spark Streaming、Broadcast、Accumulaor”的內(nèi)容就介紹到這里了,感謝大家的閱讀。如果想了解更多行業(yè)相關(guān)的知識可以關(guān)注億速云網(wǎng)站,小編將為大家輸出更多高質(zhì)量的實(shí)用文章!

向AI問一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI