5. Implementing a MapReduce wordcount Program on Windows

Published: 2020-07-25 21:55:54  Source: Internet  Views: 2542  Author: victor19901114  Category: Big Data

Test text data used by the program:

Dear River
Dear River Bear Spark 
Car Dear Car Bear Car
Dear Car River Car 
Spark Spark Dear Spark

1 Writing the main classes

(1) Mapper class

First, the code for the custom Mapper class:

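import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // value holds one line of text, e.g. "Dear Bear River".
        // The split is on tabs, so the words in the input file must be tab-separated.
        String[] words = value.toString().split("\t");
        for (String word : words) {
            // Each word occurs once; emit (word, 1) as an intermediate result
            context.write(new Text(word), new IntWritable(1));
        }
    }
}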

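The Mapper class that WordCountMap extends is a generic type with four type parameters, which specify the input key, input value, output key, and output value types of the map() function. Here LongWritable is the input key type, Text the input value type, Text the output key type, and IntWritable the output value type.
The statement String[] words = value.toString().split("\t"); splits one line into words. For the line Dear River Bear River (with the words separated by tabs), words holds Dear, River, Bear, and River.
The input key is a long integer offset: the position in the file at which the line starts, which distinguishes one line from the next. The input value is one line of text, such as Dear River Bear River. An output key is a single word, such as Bear, and the output value is the integer 1.
Rather than using Java's built-in types directly, Hadoop provides its own set of basic types optimized for network serialization. These types are in the org.apache.hadoop.io package. This program uses LongWritable (the counterpart of Java's Long), Text (the counterpart of Java's String), and IntWritable (the counterpart of Java's Integer).
The map() method takes the input key and the input value as parameters. In this program the input key LongWritable key is an offset and the input value Text value is a line such as Dear Car Bear Car. The method first converts the Text value holding the line into a Java String, then calls split() to break it into words. map() also receives a Context instance, which is used to write the output.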
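
To make the map step concrete without a Hadoop installation, here is a minimal sketch in plain Java. The class MapPhaseSketch and its map() helper are hypothetical stand-ins, not Hadoop APIs. The sketch also assumes whitespace-separated words, so it splits on "\\s+" rather than on tabs as WordCountMap does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the map phase; it uses no Hadoop classes.
class MapPhaseSketch {
    // Mimics map(): split one line into words and emit a (word, 1) pair per word.
    // The offset plays the role of the LongWritable key and, as in WordCountMap, is not used.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Assumption: words are separated by any whitespace, not only tabs.
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Dear=1, River=1, Bear=1, Spark=1]
        System.out.println(map(0L, "Dear River Bear Spark "));
    }
}
```

Each entry corresponds to one context.write(new Text(word), new IntWritable(1)) call in the real Mapper.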

(2) Reducer class

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    /*
        (River, 1)
        (River, 1)
        (River, 1)
        (Spark, 1)
        (Spark, 1)
        (Spark, 1)
        (Spark, 1)

        key: River
        value: List(1, 1, 1)
        key: Spark
        value: List(1, 1, 1, 1)

    */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                          Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum)); // emit the final result
    }
}

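The Reduce task first fetches the data for its partition from the Map side (shown here as it would arrive with no combiner applied):
(River, 1)
(River, 1)
(River, 1)
(Spark, 1)
(Spark, 1)
(Spark, 1)
(Spark, 1)
After the pairs are grouped by key, the result is:
key: River value: List(1, 1, 1)
key: Spark value: List(1, 1, 1, 1)
So the parameter Iterable<IntWritable> values of reduce() receives List(1, 1, 1) in one call and List(1, 1, 1, 1) in another.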
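
The grouping and summing described above can be sketched in plain Java. This is only an illustration under simplifying assumptions: ReducePhaseSketch, shuffle(), and reduce() are hypothetical names, and a TreeMap stands in for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle and reduce phases; it uses no Hadoop classes.
class ReducePhaseSketch {
    // Mimics the shuffle: every incoming key carries the value 1, and the
    // values are collected per key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(String[] keys) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : keys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Mimics reduce(): sum all values that arrived for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] keys = {"River", "River", "River", "Spark", "Spark", "Spark", "Spark"};
        // Prints "River 3" and "Spark 4", tab-separated, one pair per line
        for (Map.Entry<String, List<Integer>> e : shuffle(keys).entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
    }
}
```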

(3) Main function

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class WordCountMain {
    // To run the MR program locally in IDEA, set mapreduce.framework.name to local in mapred-site.xml
    public static void main(String[] args) throws IOException,
            ClassNotFoundException, InterruptedException {
        // Check for null first so that args.length is never read on a null array
        if (args == null || args.length != 2) {
            System.err.println("please input Path!");
            System.exit(1);
        }
        //System.setProperty("HADOOP_USER_NAME","hadoop2.7");
        Configuration configuration = new Configuration();
        //configuration.set("mapreduce.job.jar","/home/bruce/project/kkbhdp01/target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar");
        // Call getInstance to create the job instance
        Job job = Job.getInstance(configuration, WordCountMain.class.getSimpleName());
        // Tell Hadoop which jar to ship by naming a class it contains
        job.setJarByClass(WordCountMain.class);

        // Set the input/output formats through the job.
        // TextInputFormat and TextOutputFormat are the MR defaults, so these two lines can stay commented out
        // job.setInputFormatClass(TextInputFormat.class);
        // job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input/output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Set the classes that handle the Map/Reduce phases
        job.setMapperClass(WordCountMap.class);
        // Combine on the map side to reduce network traffic
        job.setCombinerClass(WordCountReduce.class);
        job.setReducerClass(WordCountReduce.class);

        // If map and reduce emit the same key/value types, setting the reduce output types is enough;
        // if they differ, set the map and reduce output types separately
        // job.setMapOutputKeyClass(Text.class);
        // job.setMapOutputValueClass(IntWritable.class);

        // Set the key/value types of the final reduce task output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job, wait for it to finish, and report failure through the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }
}
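
The driver registers WordCountReduce as both combiner and reducer. That works because summing is associative and commutative, so pre-summing each map task's output does not change the totals. The plain-Java sketch below illustrates this on the test data. CombinerSketch and its methods are hypothetical, each input line is treated as one map task's output purely for illustration, and words are assumed to be whitespace-separated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of why a summing reducer can double as a combiner.
class CombinerSketch {
    static final List<String> LINES = Arrays.asList(
            "Dear River", "Dear River Bear Spark", "Car Dear Car Bear Car",
            "Dear Car River Car", "Spark Spark Dear Spark");

    // Combiner step: pre-sum the (word, 1) pairs produced from one line.
    static Map<String, Integer> combine(String line) {
        Map<String, Integer> local = new TreeMap<>();
        for (String word : line.trim().split("\\s+")) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Reduce over partial sums produced by the combiner.
    static Map<String, Integer> countWithCombiner() {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : LINES) {
            combine(line).forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    // Reduce over the raw (word, 1) pairs, with no combiner.
    static Map<String, Integer> countWithoutCombiner() {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : LINES) {
            for (String word : line.trim().split("\\s+")) {
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints {Bear=2, Car=5, Dear=5, River=3, Spark=4}
        System.out.println(countWithCombiner());
        // Prints true: both paths give the same totals
        System.out.println(countWithCombiner().equals(countWithoutCombiner()));
    }
}
```

A reducer whose operation is not associative and commutative, such as one computing an average, could not be reused as a combiner in this way.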

2 Running locally

First change the mapred-site.xml configuration:
set the value of mapreduce.framework.name to local.
Then run locally:

Check the result:

3 Running on a cluster

Method 1:

First, build the package.
Then change the configuration file to yarn mode.

Add the local jar location:

 Configuration configuration = new Configuration();
 configuration.set("mapreduce.job.jar","C:\\Users\\tanglei1\\IdeaProjects\\Hadooptang\\target");

Enable cross-platform remote submission:

configuration.set("mapreduce.app-submission.cross-platform","true");

Modify the input arguments:

Run result:

Method 2:

Package the maven project and run the MR program on the server with the command:

hadoop jar com.kaikeba.hadoop-1.0-SNAPSHOT.jar \
    com.kaikeba.hadoop.wordcount.WordCountMain /tttt.txt /wordcount11
