This article explains how to understand the MapReduce WordCount example. Many people are unsure how WordCount actually works, so I went through a range of material and put together a simple explanation that I hope clears up the confusion. Let's work through it.
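WordCount counts how many times each word appears. Whenever I read the code I could follow it, yet the real logic kept eluding me: what exactly does the map side do, and what does the reduce side do? Now it finally makes sense.
Here is how it works. The map side reads the input one line at a time, splits each line into words, and counts each word once, as shown below: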
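Map input {key, value}:
{0,hello word by word}
{19,hello hadoop by hadoop}
These are the keys and values the map side receives. With the default TextInputFormat, the key is the byte offset at which the line starts in the file, not a line number (19 for the second line, assuming single-byte newlines), and the value is the text of the line. After the map side processes them, it produces the following data: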
{hello,1} {word,1} {by,1} {word,1}
{hello,1} {hadoop,1} {by,1} {hadoop,1}
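To make the map step concrete, here is a minimal plain-Java sketch that replays it on the two sample lines without Hadoop. The class MapPhaseSketch and its methods are hypothetical helpers for illustration, not part of the Hadoop API. readRecords imitates how TextInputFormat assigns byte-offset keys, under the assumption of ASCII text with "\n" line endings, and map splits each line with StringTokenizer, as the Mapper in the full job does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of the WordCount map phase (no Hadoop required).
class MapPhaseSketch {

    // Imitates TextInputFormat: key = byte offset where each line starts.
    // Assumption: '\n' line endings and single-byte (ASCII) characters.
    static List<Map.Entry<Long, String>> readRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            records.add(new SimpleEntry<Long, String>(offset, line));
            offset += line.length() + 1; // +1 for the newline byte
        }
        return records;
    }

    // Mirrors the Mapper: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "hello word by word\nhello hadoop by hadoop";
        for (Map.Entry<Long, String> record : readRecords(input)) {
            System.out.println("input  " + record);
            System.out.println("output " + map(record.getValue()));
        }
    }
}
```

The (word, 1) pairs printed by main match the map output listed above; the key of the second record comes out as its byte offset, 19, rather than a line number.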
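Up to this point everyone can follow along, but the reduce side is where I kept getting stuck: I could not see how the separate counts for the same word were brought together. Working through how Hadoop operates, I found that before the data reaches the reducer, the framework runs a shuffle-and-sort phase over the map output: it sorts the pairs by key and groups all values that share a key. After this phase the data has the following structure: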
[{by},{1,1}] [{hadoop},{1,1}] [{hello},{1,1}] [{word},{1,1}]
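The grouping the framework performs between map and reduce can be imitated locally with a sorted map. This is only a sketch under simplifying assumptions: a single reducer, everything held in memory, and java.util.TreeMap standing in for Hadoop's sort. On a real cluster the same result is reached by partitioning, sorting and merging map output across machines. The class name ShuffleSketch is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the shuffle/sort phase for a single reducer.
class ShuffleSketch {

    // Groups (word, 1) pairs by word. TreeMap keeps the keys sorted; for
    // ASCII words, String order agrees with the byte order Hadoop's Text uses.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The eight (word, 1) pairs emitted for the two sample lines
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : "hello word by word hello hadoop by hadoop".split(" ")) {
            mapOutput.add(new SimpleEntry<String, Integer>(w, 1));
        }
        System.out.println(shuffle(mapOutput));
    }
}
```

Run on the sample pairs, main prints {by=[1, 1], hadoop=[1, 1], hello=[1, 1], word=[1, 1]}: one entry per word, keys in sorted order, each holding the list of 1s the reducer will receive.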
The grouped data is then passed to the reducer, which loops over the values of each key, adds them up, and writes the total to the job's output directory. The complete job code follows without much further commentary.
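The reduce step itself is just a loop with a running total. The following minimal sketch applies it to grouped data like the structure above; ReduceSketch, reduce and reduceAll are hypothetical names, and printing word, a tab, then the count imitates the default line format of TextOutputFormat, whose key/value separator is a tab.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the reduce step: sum the grouped values of each key.
class ReduceSketch {

    // Mirrors the body of Reduce.reduce(): iterate over one key's values and add them up.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Calls reduce once per key, in the map's key order, like a single reducer.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("by", Arrays.asList(1, 1));
        grouped.put("hadoop", Arrays.asList(1, 1));
        grouped.put("hello", Arrays.asList(1, 1));
        grouped.put("word", Arrays.asList(1, 1));
        // One "word<TAB>count" line per key, in the style of TextOutputFormat
        reduceAll(grouped).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

main prints four tab-separated lines: by 2, hadoop 2, hello 2, word 2.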
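import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    // Map: (byte offset, line) -> (word, 1) for every word in the line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // Split the line on whitespace
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce: (word, [1, 1, ...]) -> (word, total)
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // Read text lines in, write "key<TAB>value" lines out
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCount(), args);
        System.exit(ret);
    }
}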
That concludes this walkthrough of MapReduce WordCount. Theory sticks best when paired with practice, so try running the job yourself.