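This article covers how to use the combiner component in an MR (MapReduce) program. Many people run into questions about it in day-to-day work, so the editor went through various materials and put together a simple, practical walkthrough. Hopefully it clears up how the combiner in an MR program is used. Let's go through it together!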
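The combiner in one sentence: it shrinks the output of each map task before the shuffle, so less data is transferred to the reduce tasks and the network load drops. (It does not change the number of reduce tasks.)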
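How it works:
Before its output is handed to the Reduce tasks, a Map task is allowed to run one local aggregation step over that output. That step is the combiner. A combiner behaves like a Reducer: it receives a key with its list of values and emits key/value pairs.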
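Notes:
1. The combiner's output is the reducer's input.
2. The combiner is pluggable: the framework may run it zero, one, or several times on a map task's output, so it must never change the final result.
3. The combiner is an optimization, and it does not fit every job. Its input and output key/value types must both match the map output types, which means a reducer can double as the combiner only when its input and output key/value types are identical and applying it early does not affect the final result.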
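To make the "must not change the final result" rule concrete, here is a minimal plain-Java sketch (no Hadoop classes; the class and method names are hypothetical and exist only for this illustration). It compares two ways of computing a mean over two map outputs: averaging local means, which is not a safe combiner, and carrying partial (sum, count) pairs, which is.

```java
import java.util.List;

// Plain-Java illustration only; CombinerSafety is a hypothetical name, not a Hadoop class.
class CombinerSafety {

    static double mean(List<Double> xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    // Unsafe: each "combiner" emits a local mean and the "reducer" averages those means.
    static double meanOfMeans(List<List<Double>> mapOutputs) {
        double sum = 0;
        for (List<Double> part : mapOutputs) sum += mean(part);
        return sum / mapOutputs.size();
    }

    // Safe: each "combiner" contributes a partial (sum, count);
    // the "reducer" adds them up and divides once at the end.
    static double meanFromPartials(List<List<Double>> mapOutputs) {
        double sum = 0;
        long count = 0;
        for (List<Double> part : mapOutputs) {
            for (double x : part) sum += x;
            count += part.size();
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Two map tasks with unevenly sized outputs for the same key.
        List<List<Double>> mapOutputs = List.of(List.of(1.0, 2.0, 3.0), List.of(10.0));
        System.out.println(meanOfMeans(mapOutputs));      // 6.0, differs from the true mean
        System.out.println(meanFromPartials(mapOutputs)); // 4.0, the true mean
    }
}
```

Summing works as a combiner because addition is associative and commutative, so it gives the same answer however the partial results are grouped. In a real job the (sum, count) variant would presumably need the mapper to emit pairs as well, so that the combiner's input and output types stay identical.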
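Example: in the WordCount program, which counts how many times each word occurs, each Map task can first aggregate its own output locally (the Combiner) and then pass the aggregated result to Reduce, which performs the final aggregation over the records that share the same key across all Map tasks.
(Figure: WordCount with a Combiner)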
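The WordCount flow described above can be traced in plain Java without a cluster. This is only a sketch under simplifying assumptions: there are no Hadoop classes, each input string stands in for one map task's split, and the names CombinerTrace, sumByKey, and run are made up for the illustration. It counts how many records would cross the shuffle with and without the local aggregation step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java trace of WordCount with a combiner; no Hadoop classes are used.
class CombinerTrace {

    // "Map" step: emit one (word, 1) pair per word in the split.
    static List<Map.Entry<String, Long>> map(String split) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1L));
            }
        }
        return out;
    }

    // Combine and reduce share the same logic here: sum the values per key.
    static Map<String, Long> sumByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    // Runs the job over all splits and returns the final counts.
    // shuffled[0] receives the number of records sent to the reduce side.
    static Map<String, Long> run(List<String> splits, boolean useCombiner, int[] shuffled) {
        List<Map.Entry<String, Long>> reduceInput = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Long>> mapOut = map(split);
            if (useCombiner) {
                // Local aggregation on the map side, before the shuffle.
                mapOut = new ArrayList<>(sumByKey(mapOut).entrySet());
            }
            reduceInput.addAll(mapOut);
        }
        shuffled[0] = reduceInput.size();
        return sumByKey(reduceInput);
    }

    public static void main(String[] args) {
        List<String> splits = List.of("a b a a", "b b a");
        int[] without = new int[1];
        int[] with = new int[1];
        System.out.println(run(splits, false, without) + " shuffled=" + without[0]);
        System.out.println(run(splits, true, with) + " shuffled=" + with[0]);
        // prints:
        // {a=4, b=3} shuffled=7
        // {a=4, b=3} shuffled=4
    }
}
```

The final counts are identical in both runs; only the number of intermediate records changes, which is the whole point of the combiner.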
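Combiner code:
The Combiner class: the org.apache.hadoop.mapreduce API has no separate Combiner base class; job.setCombinerClass accepts a Reducer subclass. So we simply extend Reducer and, when submitting the job, register our class as the combiner.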
package com.itheima.hadoop.mapreduce.combiner;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
Mapper class:
package com.itheima.hadoop.mapreduce.mapper;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountCombinerMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        String line = value.toString();            // read one line of input
        String[] words = line.trim().split("\\s+"); // split the line into words
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // a blank line yields one empty token; skip it
            }
            // emit each word with a count of 1
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
Driver class:
package com.itheima.hadoop.drivers;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;

import com.itheima.hadoop.mapreduce.combiner.WordCountCombiner;
import com.itheima.hadoop.mapreduce.mapper.WordCountCombinerMapper;

public class WordCountCombinerDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        /**
         * The five steps of submitting a job:
         * 1. Create the job
         * 2. Specify the map/reduce classes
         * 3. Specify the MapReduce output data types
         * 4. Specify the input and output paths
         * 5. Submit the job
         */
        // Use the configuration that ToolRunner populated from the command line.
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCountCombinerDriver.class);

        job.setMapperClass(WordCountCombinerMapper.class);

        /*** A small detour here: the combiner component ***/
        job.setCombinerClass(WordCountCombiner.class);

        // The reduce logic is identical to the combiner logic, and the combiner
        // is a Reducer subclass, so the same class also serves as the reducer.
        job.setReducerClass(WordCountCombiner.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Main class:
package com.itheima.hadoop.runner;

import org.apache.hadoop.util.ToolRunner;

import com.itheima.hadoop.drivers.WordCountCombinerDriver;

public class WordCountCombinerRunner {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCountCombinerDriver(), args);
        System.exit(res);
    }
}
Run result:
That wraps up this look at how to use the combiner component in an MR program; hopefully it has answered your questions. Theory sticks best when paired with practice, so go try it out! For more on related topics, keep following the 億速云 site.