What Are the Common Operators in Flink Stream Processing?

Published: 2021-12-31 14:24:02 | Source: Yisu Cloud | Views: 168 | Author: iii | Category: Big Data

This article introduces the operators most commonly used in Flink. Many people are unsure which operators Flink offers and how they are used, so the sections below collect a short, practical example for each one.

Like Spark, Flink is a one-stop processing framework: it supports both batch processing (DataSet) and real-time stream processing (DataStream).

The operators below are therefore divided into two groups: DataSet and DataStream.

DataSet

I. Source Operators

1. fromCollection

fromCollection: reads data from a local collection.

Example:

val env = ExecutionEnvironment.getExecutionEnvironment
val textDataSet: DataSet[String] = env.fromCollection(
  List("1,張三", "2,李四", "3,王五", "4,趙六")
)

2. readTextFile

readTextFile: reads data from a file:

val textDataSet: DataSet[String]  = env.readTextFile("/data/a.txt")

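3. readTextFile: traversing a directory

readTextFile can also read every file under a directory, including all files in its subdirectories:

// Configuration is org.apache.flink.configuration.Configuration
val parameters = new Configuration
// recursive.file.enumeration enables recursive traversal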
parameters.setBoolean("recursive.file.enumeration", true)
val file = env.readTextFile("/data").withParameters(parameters)

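4. readTextFile: reading compressed files

For the compression formats below, no extra input format needs to be specified; Flink recognizes the format and decompresses the file automatically. Compressed files may not be readable in parallel, however, and may be read sequentially instead, which can hurt the scalability of a job.

Compression method	File extensions	Parallel read
DEFLATE	.deflate	no
GZip	.gz .gzip	no
Bzip2	.bz2	no
XZ	.xz	no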

val file = env.readTextFile("/data/file.gz")

II. Transform Operators

Transform operators work on the output of a Source operator, so first build the Flink execution environment and a Source; the Transform examples that follow are based on it:

val env = ExecutionEnvironment.getExecutionEnvironment
val textDataSet: DataSet[String] = env.fromCollection(
  List("張三,1", "李四,2", "王五,3", "張三,4")
)

1. map

Converts each element of the DataSet into another element:

// Use map to convert each string in the List into a Scala case class

case class User(name: String, id: String)

val userDataSet: DataSet[User] = textDataSet.map {
  text =>
    val fieldArr = text.split(",")
    User(fieldArr(0), fieldArr(1))
}
userDataSet.print()

2. flatMap

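Converts each element of the DataSet into 0 to n elements:

// Use flatMap to parse each line into a (name, value) tuple, then:
// group by the first field
// sum the second field

val result = textDataSet
      .flatMap { line =>
        val fields = line.split(",")
        // emit one tuple for a well-formed line, nothing otherwise
        if (fields.length == 2) List((fields(0), fields(1).toInt)) else Nil
      }
      .groupBy(0) // group by the first field
      .sum(1) // sum the second field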
      
result.print()

3. mapPartition

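Processes all elements of a partition in one call and converts them into other elements:

// Use mapPartition to convert each string in the List into a Scala case class

case class User(name: String, id: String)

val result: DataSet[User] = textDataSet.mapPartition(lines => {
      lines.map { line =>
        val fieldArr = line.split(",")
        User(fieldArr(0), fieldArr(1))
      }
    })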
    
result.print()

4. filter

Keeps only the elements for which the predicate returns true:

val source: DataSet[String] = env.fromElements("java", "scala", "java")
val filter:DataSet[String] = source.filter(line => line.contains("java")) // keep only the elements that contain "java"
filter.print()

5. reduce

Aggregates a whole DataSet, or each group of a grouped DataSet, into a single element:

// Build the data source with fromElements
val source = env.fromElements(("java", 1), ("scala", 1), ("java", 1))
// Use map to obtain a DataSet of tuples
val mapData: DataSet[(String, Int)] = source.map(line => line)
// Group by the first field
val groupData = mapData.groupBy(_._1)
// Aggregate with reduce
val reduceData = groupData.reduce((x, y) => (x._1, x._2 + y._2))
// Print the result
reduceData.print()
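To make the group-then-reduce semantics concrete without a Flink runtime, the following is a minimal sketch that mimics the same steps on plain Scala collections. The object name GroupReduceSketch and the helper reduceByKey are hypothetical names chosen for illustration; they are not part of the Flink API.

```scala
object GroupReduceSketch {
  // Group the tuples by their first field, then combine each group pairwise,
  // mirroring groupBy(_._1).reduce(...) from the example above.
  def reduceByKey(data: Seq[(String, Int)]): Map[String, Int] =
    data
      .groupBy(_._1)
      .map { case (key, group) =>
        key -> group.map(_._2).reduce(_ + _)
      }

  def main(args: Array[String]): Unit = {
    val counts = reduceByKey(Seq(("java", 1), ("scala", 1), ("java", 1)))
    // Sort by key so the printed order is stable
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Each key ends up with exactly one value, which is the property the reduce operator guarantees per group.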

6. reduceGroup

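Aggregates a whole DataSet, or each group of a grouped DataSet, into one or more elements.
Unlike reduce, which combines elements pairwise, reduceGroup hands all elements of a group to the function as an iterator, and the function may emit any number of results.
When the data can also be pre-aggregated within each partition before the overall reduce, less data crosses the network, which lowers network IO:

// Build the data source with fromElements
val source: DataSet[(String, Int)] = env.fromElements(("java", 1), ("scala", 1), ("java", 1))
// Group by the first field
val groupData = source.groupBy(_._1)
// Aggregate with reduceGroup (Collector is org.apache.flink.util.Collector)
val result: DataSet[(String, Int)] = groupData.reduceGroup {
      (in: Iterator[(String, Int)], out: Collector[(String, Int)]) =>
        val tuple = in.reduce((x, y) => (x._1, x._2 + y._2))
        out.collect(tuple)
    }
// Print the result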
result.print()

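7. minBy and maxBy

Selects the element with the minimum or maximum value:

// Use minBy to find the minimum value for each person
// List("張三,1", "李四,2", "王五,3", "張三,4")

case class User(name: String, id: String)
// Convert each string in the List into a Scala case class
val text: DataSet[User] = textDataSet.mapPartition(lines => {
      lines.map { line =>
        val fieldArr = line.split(",")
        User(fieldArr(0), fieldArr(1))
      }
    })
    
val result = text
          .groupBy(0) // group by name
          .minBy(1)   // minimum per person (id is a String, so it is compared as a string)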

8. Aggregate

Computes an aggregate such as the maximum or minimum over a data set:

val data = new mutable.MutableList[(Int, String, Double)]
    data.+=((1, "yuwen", 89.0))
    data.+=((2, "shuxue", 92.2))
    data.+=((3, "yuwen", 89.99))
// Build the data source with fromCollection
val input: DataSet[(Int, String, Double)] = env.fromCollection(data)
// Group with groupBy
val value = input.groupBy(1)
            // Use aggregate to get the maximum of field 2
            .aggregate(Aggregations.MAX, 2) 
// Print the result
value.print()

Aggregate only works on tuples.

Note:
With aggregate, grouping must use a field index or field name, for example groupBy(0); otherwise the following error is raised:
Exception in thread "main" java.lang.UnsupportedOperationException: Aggregate does not support grouping with KeySelector functions, yet.

9. distinct

Removes duplicate data:

// Uses the data source from the previous example
// Use distinct to remove tuples that share the same subject

val value: DataSet[(Int, String, Double)] = input.distinct(1)
value.print()

10. first

Takes the first N elements:

input.first(2) // take the first two elements

11. join

Joins two DataSets on a given condition to form a new DataSet:

// s1 and s2 have the following format:
// DataSet[(Int, String,String, Double)]

 val joinData = s1.join(s2)  // join data set s1 with data set s2
             .where(0).equalTo(0) {     // join condition
      (s1, s2) => (s1._1, s1._2, s2._2, s1._3)
    }

12. leftOuterJoin

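Left outer join: every element of the left DataSet is joined with the matching elements of the right DataSet, and left elements without a match are kept.

There are also:

rightOuterJoin: right outer join; every element of the right DataSet is joined with the matching elements of the left DataSet

fullOuterJoin: full outer join; all elements from both sides are kept and joined

The example below uses leftOuterJoin: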

 val data1 = ListBuffer[Tuple2[Int,String]]()
    data1.append((1,"zhangsan"))
    data1.append((2,"lisi"))
    data1.append((3,"wangwu"))
    data1.append((4,"zhaoliu"))

val data2 = ListBuffer[Tuple2[Int,String]]()
    data2.append((1,"beijing"))
    data2.append((2,"shanghai"))
    data2.append((4,"guangzhou"))

val text1 = env.fromCollection(data1)
val text2 = env.fromCollection(data2)

text1.leftOuterJoin(text2).where(0).equalTo(0).apply((first,second)=>{
      if(second==null){
        (first._1,first._2,"null")
      }else{
        (first._1,first._2,second._2)
      }
    }).print()
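The behaviour of the example above can be checked without a cluster. The following is a rough sketch on plain Scala collections, assuming the same two input lists; LeftOuterJoinSketch and its leftOuterJoin helper are hypothetical names for illustration and only imitate the operator's result, not its distributed execution.

```scala
object LeftOuterJoinSketch {
  // Every left element is kept. A left element without a right match is
  // emitted with the string "null", as in the Flink example above.
  def leftOuterJoin(
      left: Seq[(Int, String)],
      right: Seq[(Int, String)]): Seq[(Int, String, String)] = {
    val rightByKey = right.groupBy(_._1)
    left.flatMap { case (id, name) =>
      rightByKey.get(id) match {
        case Some(matches) => matches.map { case (_, city) => (id, name, city) }
        case None          => Seq((id, name, "null"))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val users  = Seq((1, "zhangsan"), (2, "lisi"), (3, "wangwu"), (4, "zhaoliu"))
    val cities = Seq((1, "beijing"), (2, "shanghai"), (4, "guangzhou"))
    leftOuterJoin(users, cities).foreach(println)
  }
}
```

With these inputs, the user with id 3 has no city on the right side, so it is the only row that carries the "null" placeholder.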

13. cross

Cross operation: creates a new data set by forming the Cartesian product of this data set and another data set.

It resembles join, but because it produces a Cartesian product it consumes a great deal of memory when the data is large:

val cross = input1.cross(input2){
      (input1 , input2) => (input1._1,input1._2,input1._3,input2._2)
    }

cross.print()

14. union

Union operation: creates a new data set containing the elements of this data set and of the other data sets, without removing duplicates:

val unionData: DataSet[String] = elements1.union(elements2).union(elements3)
// Remove duplicate data
val value = unionData.distinct(line => line)

15. rebalance

Flink can suffer from data skew too. Suppose about 1 billion records need to be processed; during processing, the records may end up distributed unevenly across the machines.

A job that should take about 10 minutes overall then suffers from the skew: the task on machine 1 needs 4 hours, and the other 3 machines, although finished, must wait for machine 1 before the job as a whole is complete. In practice a good remedy for this situation is rebalance, which internally uses round-robin to spread the data evenly and is a good choice when data is skewed.

// Use rebalance to avoid data skew
val rebalance = filterData.rebalance()
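The round-robin idea behind rebalance can be sketched on plain Scala collections. This is only an illustration under simplifying assumptions: the object RoundRobinSketch and its roundRobin helper are hypothetical names, and the sketch always starts at partition 0, which need not match what Flink does internally.

```scala
object RoundRobinSketch {
  // Assign the i-th element to partition (i % parallelism), so partition
  // sizes differ by at most one regardless of the element values.
  def roundRobin[A](elements: Seq[A], parallelism: Int): Map[Int, Seq[A]] =
    elements.zipWithIndex
      .groupBy { case (_, index) => index % parallelism }
      .map { case (partition, pairs) => partition -> pairs.map(_._1) }

  def main(args: Array[String]): Unit = {
    // Nine records, eight of which share the same "hot" value
    val skewed = Seq.fill(8)("hot") ++ Seq("cold")
    roundRobin(skewed, 4).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Because the assignment ignores the record content, a hot key no longer piles up on a single partition; the trade-off is that records with the same key are no longer co-located.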

16. partitionByHash

Hash-partitions the data set on the specified key:

val data = new mutable.MutableList[(Int, Long, String)]
data.+=((1, 1L, "Hi"))
data.+=((2, 2L, "Hello"))
data.+=((3, 2L, "Hello world"))

val collection = env.fromCollection(data)
val unique = collection.partitionByHash(1).mapPartition{
  line =>
    line.map(x => (x._1 , x._2 , x._3))
}

unique.writeAsText("hashPartition", WriteMode.NO_OVERWRITE)
env.execute()
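As a rough mental model of hash partitioning, the sketch below assigns each record to a partition derived from the hash code of its key. It is an assumption-laden simplification: HashPartitionSketch, partitionFor, and partitionByKey are hypothetical names, and Flink's internal hashing is more involved than a plain hashCode modulo.

```scala
object HashPartitionSketch {
  // Simplified partition choice: non-negative remainder of the key's hash code.
  def partitionFor(key: Any, parallelism: Int): Int =
    Math.floorMod(key.hashCode, parallelism)

  // Records with equal keys always land in the same partition.
  def partitionByKey[A, K](elements: Seq[A], parallelism: Int)(key: A => K): Map[Int, Seq[A]] =
    elements.groupBy(e => partitionFor(key(e), parallelism))

  def main(args: Array[String]): Unit = {
    val data = Seq((1, 1L, "Hi"), (2, 2L, "Hello"), (3, 2L, "Hello world"))
    // Partition on the second field, as partitionByHash(1) does above
    partitionByKey(data, 2)(_._2).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The two records whose second field is 2L end up together, which is the guarantee that matters for downstream per-key processing.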

17. partitionByRange

Range-partitions the data set on the specified key:

val data = new mutable.MutableList[(Int, Long, String)]
data.+=((1, 1L, "Hi"))
data.+=((2, 2L, "Hello"))
data.+=((3, 2L, "Hello world"))
data.+=((4, 3L, "Hello world, how are you?"))

val collection = env.fromCollection(data)
val unique = collection.partitionByRange(x => x._1).mapPartition(line => line.map{
  x=>
    (x._1 , x._2 , x._3)
})
unique.writeAsText("rangePartition", WriteMode.OVERWRITE)
env.execute()

18. sortPartition

Sorts each partition locally on the specified field:

val data = new mutable.MutableList[(Int, Long, String)]
    data.+=((1, 1L, "Hi"))
    data.+=((2, 2L, "Hello"))
    data.+=((3, 2L, "Hello world"))
    data.+=((4, 3L, "Hello world, how are you?"))

val ds = env.fromCollection(data)
    val result = ds
      .map { x => x }.setParallelism(2)
      .sortPartition(1, Order.DESCENDING) // the first argument is the field to sort on
      .mapPartition(line => line)
      .collect()

println(result)
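Since sortPartition sorts within each partition only, the combined output is not necessarily globally ordered. The following sketch illustrates this on plain Scala collections; SortPartitionSketch and sortEachPartition are hypothetical names, and the split of the four records into two partitions is an assumed layout chosen for the demonstration.

```scala
object SortPartitionSketch {
  type Record = (Int, Long, String)

  // Sort every partition on its own, descending by the Long field (index 1).
  def sortEachPartition(partitions: Seq[Seq[Record]]): Seq[Seq[Record]] =
    partitions.map(_.sortBy(_._2)(Ordering[Long].reverse))

  def main(args: Array[String]): Unit = {
    // Assumed layout: two partitions holding two records each
    val partitions = Seq(
      Seq((1, 1L, "Hi"), (3, 2L, "Hello world")),
      Seq((2, 2L, "Hello"), (4, 3L, "Hello world, how are you?"))
    )
    val sorted = sortEachPartition(partitions)
    sorted.foreach(println)
    // Concatenating the partitions yields the key order 2, 1, 3, 2:
    // sorted inside each partition, but not across partitions.
    println(sorted.flatten.map(_._2))
  }
}
```

A global order would require a single partition or a range partitioning step before the local sort.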

III. Sink Operators

1. collect

Writes the data to a local collection:

result.collect()

2. writeAsText

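Writes the data to a file.

Flink supports files on many kinds of storage, including local files and HDFS files.

Flink supports several file formats, including text files and CSV files.

// Write the data to a local file
result.writeAsText("/data/a", WriteMode.OVERWRITE)

// Write the data to HDFS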
result.writeAsText("hdfs://node01:9000/data/a", WriteMode.OVERWRITE)

DataStream

Like DataSet, DataStream comes with a set of Transformation operations.

I. Source Operators

A Flink program adds a data source with StreamExecutionEnvironment.addSource(source).
Flink ships with a number of ready-made source functions. You can also write your own: implement SourceFunction for a non-parallel source, or implement the ParallelSourceFunction interface or extend RichParallelSourceFunction for a parallel source.

Sources for stream processing are largely the same as those for batch processing. They fall into four broad categories:

Collection-based source: based on a local collection
File-based source: reads text files, i.e. files that conform to the TextInputFormat specification, and returns them as strings
Socket-based source: reads from a socket; elements can be split by a delimiter
Custom source: a user-defined source

To connect to an external data source, use addSource. The example below reads Kafka data into Flink; first add the dependency:

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka-0.11 -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>1.10.0</version>
</dependency>

Read the Kafka data into Flink:

val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("group.id", "consumer-group")
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
properties.setProperty("auto.offset.reset", "latest")

val source = env.addSource(new FlinkKafkaConsumer011[String]("sensor", new SimpleStringSchema(), properties))

Socket-based source:

val source = env.socketTextStream("IP", PORT)

II. Transform Operators

1. map

Converts each element of the DataStream into another element:

dataStream.map { x => x * 2 }

2. FlatMap

Takes one element and produces zero, one, or more elements. A flatMap function that splits sentences into words:

dataStream.flatMap { str => str.split(" ") }

3. Filter

Evaluates a boolean function for each element and keeps those for which the function returns true. A filter that drops zero values:

dataStream.filter { _ != 0 }

4. KeyBy

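Logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Internally, keyBy() is implemented with hash partitioning. There are several ways to specify keys.

This transformation returns a KeyedStream, which is required, among other things, for using keyed state: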

dataStream.keyBy(0)

5. Reduce

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value:

keyedStream.reduce { _ + _ }

6. Fold

A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value:

val result: DataStream[String] =  keyedStream.fold("start")((str, i) => { str + "-" + i }) 

// Explanation: applied to the sequence (1,2,3,4,5), the code above emits "start-1", "start-1-2", "start-1-2-3", ...
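The "rolling" behaviour of Reduce and Fold, one output per input element, can be imitated on a plain Scala sequence with scanLeft. This is a minimal sketch for a single key; RollingSketch, rollingReduce, and rollingFold are hypothetical names and do not involve Flink state or checkpointing.

```scala
object RollingSketch {
  // Rolling reduce: the first element is emitted as is, every later output
  // combines the previous result with the current element.
  def rollingReduce(input: Seq[Int]): Seq[Int] =
    if (input.isEmpty) Seq.empty
    else input.tail.scanLeft(input.head)(_ + _)

  // Rolling fold: like reduce, but starting from an initial value that is
  // itself never emitted (hence the .tail).
  def rollingFold(input: Seq[Int]): Seq[String] =
    input.scanLeft("start")((acc, i) => acc + "-" + i).tail

  def main(args: Array[String]): Unit = {
    println(rollingReduce(Seq(1, 2, 3, 4, 5)))
    println(rollingFold(Seq(1, 2, 3, 4, 5)))
  }
}
```

For the input (1,2,3,4,5) the reduce emits the running sums 1, 3, 6, 10, 15, and the fold emits the growing strings from "start-1" up to "start-1-2-3-4-5".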

7. Aggregations

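Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in that field (the same holds for max and maxBy):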

keyedStream.sum(0);

keyedStream.min(0);

keyedStream.max(0);

keyedStream.minBy(0);

keyedStream.maxBy(0);
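The distinction between min and minBy is easier to see on records that carry more than one field. The sketch below models both on a plain Scala sequence for a single key. It assumes that min only tracks the minimum of the chosen field and leaves the other fields as they were in the first element; that detail is an assumption made for this illustration, and MinVsMinBySketch with its two helpers are hypothetical names.

```scala
object MinVsMinBySketch {
  type Reading = (String, Int, String) // (key, value, tag)

  // min: only the value field is rolled forward (assumed behaviour).
  def rollingMin(input: Seq[Reading]): Seq[Reading] =
    if (input.isEmpty) Seq.empty
    else input.tail.scanLeft(input.head) { (acc, cur) =>
      acc.copy(_2 = math.min(acc._2, cur._2))
    }

  // minBy: the whole element holding the minimum value is rolled forward.
  def rollingMinBy(input: Seq[Reading]): Seq[Reading] =
    if (input.isEmpty) Seq.empty
    else input.tail.scanLeft(input.head) { (acc, cur) =>
      if (cur._2 < acc._2) cur else acc
    }

  def main(args: Array[String]): Unit = {
    val readings = Seq(("a", 3, "x"), ("a", 1, "y"), ("a", 2, "z"))
    println(rollingMin(readings))
    println(rollingMinBy(readings))
  }
}
```

In this model the tag stays "x" under min, while minBy switches to the tag "y" of the record that actually holds the minimum.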

8. Window

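Windows can be defined on an already partitioned KeyedStream. Windows group the data of each key according to some characteristic (for example, the data that arrived within the last 5 seconds). Windows are not covered in detail here; for a full description, see the article "A Detailed Look at Time and Window, Two Crucial Concepts in Flink".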

dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)));
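How a tumbling window groups the events of each key can be sketched with plain Scala collections. The sketch assumes event timestamps in milliseconds and windows aligned to multiples of the window size (no offset); it ignores watermarks, late data, and triggers. TumblingWindowSketch, Event, windowStart, and sumPerKeyAndWindow are hypothetical names.

```scala
object TumblingWindowSketch {
  final case class Event(key: String, timestampMs: Long, value: Int)

  // Start of the window that contains the timestamp, assuming zero offset.
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - Math.floorMod(timestampMs, sizeMs)

  // Group by (key, window start) and sum the values of each group.
  def sumPerKeyAndWindow(events: Seq[Event], sizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy(e => (e.key, windowStart(e.timestampMs, sizeMs)))
      .map { case (keyAndWindow, group) => keyAndWindow -> group.map(_.value).sum }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("a", 1000L, 1),
      Event("a", 4999L, 2),
      Event("a", 5000L, 3),
      Event("b", 7000L, 4)
    )
    // 5-second windows, matching Time.seconds(5) in the snippet above
    sumPerKeyAndWindow(events, 5000L).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The event at 4999 ms still belongs to the window starting at 0, while the event at 5000 ms opens the next window, so windows are half-open intervals in this model.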

9. WindowAll

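Windows can also be defined on a regular DataStream. Windows group all stream events according to some characteristic (for example, the data that arrived within the last 5 seconds).

Note: in many cases this is a non-parallel transformation. All records are gathered in a single task of the windowAll operator.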

dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)))

10. Window Apply

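Applies a general function to the window as a whole.

Note: if you are using a windowAll transformation, you need to use an AllWindowFunction instead.

The function itself, for example one that manually sums the elements of a window, is passed to apply: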

windowedStream.apply { WindowFunction }

allWindowedStream.apply { AllWindowFunction }

11. Window Reduce

Applies a reduce function to the window and returns the reduced value:

windowedStream.reduce { _ + _ }

12. Window Fold

Applies a fold function to the window and returns the folded value:

val result: DataStream[String] = windowedStream.fold("start")((str, i) => { str + "-" + i }) 

// Applied to the sequence (1,2,3,4,5), the code above folds the sequence into the string "start-1-2-3-4-5"

13. Union

Union of two or more data streams, creating a new stream that contains all elements from all of the streams. Note: if you union a data stream with itself, every element appears twice in the resulting stream:

dataStream.union(otherStream1, otherStream2, ...)

14. Window Join

Joins two data streams on a given key and a common window:

dataStream.join(otherStream)
    .where(<key selector>).equalTo(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply (new JoinFunction () {...})

15. Interval Join

Joins two elements e1 and e2 of two keyed streams that share a common key within a given time interval, so that e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound:

keyedStream.intervalJoin(otherKeyedStream)
    .between(Time.milliseconds(-2), Time.milliseconds(2)) 
    .upperBoundExclusive(true) 
    .lowerBoundExclusive(true) 
    .process(new IntervalJoinFunction() {...})
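The time condition of the interval join can be written down as a small predicate. The sketch below is a plain Scala illustration of the inequality given above, including the effect of making either bound exclusive; IntervalJoinSketch and withinInterval are hypothetical names, and key matching is left out.

```scala
object IntervalJoinSketch {
  // True when rightTs lies in [leftTs + lower, leftTs + upper],
  // with either end optionally excluded.
  def withinInterval(
      leftTs: Long,
      rightTs: Long,
      lower: Long,
      upper: Long,
      lowerExclusive: Boolean = false,
      upperExclusive: Boolean = false): Boolean = {
    val lo = leftTs + lower
    val hi = leftTs + upper
    val aboveLower = if (lowerExclusive) rightTs > lo else rightTs >= lo
    val belowUpper = if (upperExclusive) rightTs < hi else rightTs <= hi
    aboveLower && belowUpper
  }

  def main(args: Array[String]): Unit = {
    // Bounds of -2 ms and +2 ms, as in the snippet above
    println(withinInterval(10L, 12L, -2L, 2L))
    println(withinInterval(10L, 12L, -2L, 2L, upperExclusive = true))
    println(withinInterval(10L, 13L, -2L, 2L))
  }
}
```

With inclusive bounds an element exactly 2 ms later still joins; once the upper bound is exclusive it no longer does.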

16. Window CoGroup

Cogroups two data streams on a given key and a common window:

dataStream.coGroup(otherStream)
    .where(0).equalTo(1)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply (new CoGroupFunction () {...})

17. Connect

"Connects" two data streams while retaining their types. Connecting allows state to be shared between the two streams:

DataStream<Integer> someStream = ...;
DataStream<String> otherStream = ...;
ConnectedStreams<Integer, String> connectedStreams = someStream.connect(otherStream);

// ... stands for omitted intermediate operations

18. CoMap，CoFlatMap

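Similar to map and flatMap, but on a connected data stream:

connectedStreams.map(
    (_ : Int) => true,
    (_ : String) => false)

// for flatMap, each function returns a collection of results
connectedStreams.flatMap(
    (_ : Int) => List(true),
    (_ : String) => List(false))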

19. Split

Splits the stream into two or more streams according to some criterion:

val split = someDataStream.split(
  (num: Int) =>
    (num % 2) match {
      case 0 => List("even")
      case 1 => List("odd")
    })

20. Select

Selects one or more streams from a split stream:

SplitStream<Integer> split;
DataStream<Integer> even = split.select("even");
DataStream<Integer> odd = split.select("odd");
DataStream<Integer> all = split.select("even","odd");

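III. Sink Operators

Data can be written to:

Local files (see batch processing)
Local collections (see batch processing)
HDFS (see batch processing)

In addition, the following are supported:

Sink to Kafka
Sink to MySQL
Sink to Redis

The example below sinks to Kafka: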

val sinkTopic = "test"

// Case class
case class Student(id: Int, name: String, addr: String, sex: String)
// Register the Scala module once so Jackson can serialize case classes
val mapper: ObjectMapper = new ObjectMapper().registerModule(DefaultScalaModule)

// Convert an object to a JSON string
def toJsonString(obj: Object): String = {
    mapper.writeValueAsString(obj)
}

def main(args: Array[String]): Unit = {
    // 1. Create the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // 2. Prepare the data
    val dataStream: DataStream[Student] = env.fromElements(
      Student(8, "xiaoming", "beijing biejing", "female")
    )
    // Convert each Student to a JSON string
    val studentStream: DataStream[String] = dataStream.map(student =>
      toJsonString(student)
    )
    //studentStream.print()
    val prop = new Properties()
    prop.setProperty("bootstrap.servers", "node01:9092")

    val myProducer = new FlinkKafkaProducer011[String](sinkTopic, new KeyedSerializationSchemaWrapper[String](new SimpleStringSchema()), prop)
    studentStream.addSink(myProducer)
    studentStream.print()
    env.execute("Flink add sink")
}

That concludes this look at the common operators in Flink stream processing. Combining theory with practice is the best way to learn, so try the examples out yourself.
