
Which methods do you override in Hadoop?

發(fā)布時間:2021-12-23 13:51:34 來源:億速云 閱讀:132 作者:iii 欄目:云計算

This article mainly introduces "which methods do you override in Hadoop". Many people are unsure about this in everyday work, so this post pulls the material together into a simple, practical walkthrough that will hopefully clear up the question. Follow along and learn with us!

1.   Download (omitted)

2.   Build (omitted)

3.   Configuration (pseudo-distributed and fully distributed setups omitted)

4.   HDFS

1.   Web interface: http://namenode-name:50070/ (shows the datanode list and cluster statistics)

2.   Shell commands & dfsadmin commands

3.   checkpoint node & backup node

1.   How the fsimage and edits files are merged

2.   (Probably an early-version feature) manually recovering a downed cluster: import checkpoint

3.   Backup node: the Backup Node keeps an in-memory copy of the fsimage synchronized from the NameNode, and it also receives the stream of edits from the NameNode and persists it to disk. It merges those edits with the fsimage held in memory to produce a backup copy of the metadata. This is the secret of the Backup Node's efficiency: it never needs to download fsimage and edits from the NameNode; it only has to persist its in-memory metadata to disk and merge.

4.   Balancer: rebalances data that is unevenly distributed across racks and datanodes

5.   Rack awareness

6.   Safemode: when data files are incomplete, or when safemode is entered manually, HDFS is read-only; once the cluster's block check reaches the configured threshold, or safemode is left manually, the cluster becomes readable and writable again.
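
The state can also be toggled from the command line with hdfs dfsadmin -safemode get|enter|leave. Below is a minimal, hedged sketch (not from the original article) of querying it programmatically through the HDFS client API of Hadoop 2.x; it assumes fs.defaultFS in the loaded configuration points at the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeCheck {
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      if (fs instanceof DistributedFileSystem) {
         DistributedFileSystem dfs = (DistributedFileSystem) fs;
         // SAFEMODE_GET only reports the current state; it does not change it.
         boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
         System.out.println("NameNode in safemode: " + inSafeMode);
      }
   }
}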

7.   Fsck: command for checking the health of files and blocks

8.   Fetchdt: fetches a delegation token (security)

9.   Recovery mode

10. Upgrade and rollback

11. File Permissions and Security

12. Scalability

5.   MapReduce

1.   WordCount example: the methods you override are the Mapper's map() and the Reducer's reduce(), as shown below.

// Each public class goes in its own .java file; the imports below are shared.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

   private Text word = new Text();
   private IntWritable one = new IntWritable(1);

   // Override the map method
   @Override
   public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
      while (stringTokenizer.hasMoreTokens()) {
         word.set(stringTokenizer.nextToken());
         // Emit (word, 1)
         context.write(word, one);
      }
   }
}

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

   private IntWritable result = new IntWritable(0);

   // Override the reduce method
   @Override
   protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable i : values) {
         sum += i.get();
      }
      result.set(sum);
      // Emit the summed count for this word
      context.write(key, result);
   }
}

public class WordCountDemo {

   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCountDemo.class);
      // Set the mapper and reducer classes (the reducer doubles as the combiner)
      job.setMapperClass(MyMapper.class);
      job.setReducerClass(MyReducer.class);
      job.setCombinerClass(MyReducer.class);
      // Set the final output key/value types
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      // Set the input and output paths
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}

2. Job.setGroupingComparatorClass(Class).
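
Job.setGroupingComparatorClass controls which map output keys land in a single reduce() call; it is also the core of the secondary-sort pattern in item 10, where the sort comparator still orders by the full key. A minimal sketch follows, assuming an illustrative composite Text key of the form primary#secondary (not something defined in this article):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups keys by the substring before '#', so all "user#timestamp" keys of the
// same user reach one reduce() call. The key format is an assumption for illustration.
public class PrimaryKeyGroupingComparator extends WritableComparator {

   public PrimaryKeyGroupingComparator() {
      // true => instantiate keys so compare(WritableComparable, WritableComparable) works
      super(Text.class, true);
   }

   @Override
   public int compare(WritableComparable a, WritableComparable b) {
      String left = a.toString().split("#")[0];
      String right = b.toString().split("#")[0];
      return left.compareTo(right);
   }
}

// Registered on the job with: job.setGroupingComparatorClass(PrimaryKeyGroupingComparator.class);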

3.  Job.setCombinerClass(Class).

4. CompressionCodec
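
As a hedged sketch of how a codec is wired into a job (Gzip is an arbitrary choice; the property names are the standard mapreduce.* keys):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressionConfig {
   public static void configure(Job job) {
      Configuration conf = job.getConfiguration();
      // Compress intermediate map output to cut shuffle traffic.
      conf.setBoolean("mapreduce.map.output.compress", true);
      conf.setClass("mapreduce.map.output.compress.codec", GzipCodec.class, CompressionCodec.class);
      // Compress the final job output.
      FileOutputFormat.setCompressOutput(job, true);
      FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
   }
}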

5. Number of maps: Configuration.set(MRJobConfig.NUM_MAPS, int) => roughly dataSize/blockSize; this is only a hint to the framework, as sketched below.
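
The real number of map tasks is driven by the InputSplits the InputFormat produces. A small sketch with made-up sizes, showing both the hint and the split-size knob:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MapCountHint {
   public static void configure(Job job) {
      // A hint only; the InputFormat's splits decide the real number of map tasks.
      job.getConfiguration().setInt(MRJobConfig.NUM_MAPS, 40);
      // Steering the split size is the reliable knob: capping splits at 128 MB
      // makes a 5 GB input yield roughly 40 maps.
      FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
   }
}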

6. Number of reducers: Job.setNumReduceTasks(int).

The right number of reduces is typically 0.95 or 1.75 times (number of nodes × maximum containers per node). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, doing a much better job of load balancing.
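
A sketch applying that heuristic; the node and container counts are made-up values:

import org.apache.hadoop.mapreduce.Job;

public class ReducerCount {
   public static void configure(Job job) {
      int nodes = 10;              // assumed cluster size
      int containersPerNode = 8;   // assumed maximum containers per node
      // 0.95 * capacity: all reduces start as soon as the maps finish.
      // 1.75 * capacity: faster nodes run a second wave, improving load balancing.
      int reduces = (int) (0.95 * nodes * containersPerNode);
      job.setNumReduceTasks(reduces);
   }
}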

7. Reduce -> shuffle: input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers via HTTP.

8. Reduce -> sort: in this stage the framework groups Reducer inputs by key (since different mappers may have output the same key). The shuffle and sort phases occur simultaneously; while map outputs are being fetched, they are merged.

9. Reduce -> reduce: the reduce(WritableComparable, Iterable<Writable>, Context) method is called for each <key, (list of values)> pair in the grouped inputs; its output is written via Context.write and is not re-sorted.

10.  Secondary sort

11.  Partitioner
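
A hedged sketch of a custom Partitioner (the first-letter routing rule is purely illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each word to a reduce partition by its first character.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

   @Override
   public int getPartition(Text key, IntWritable value, int numPartitions) {
      if (key.getLength() == 0) {
         return 0;
      }
      char first = Character.toLowerCase(key.toString().charAt(0));
      // Mask off the sign bit, then fold into the available partitions.
      return (first & Integer.MAX_VALUE) % numPartitions;
   }
}

// Registered with: job.setPartitionerClass(FirstLetterPartitioner.class);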

12.  Counter: Mapper and Reducer implementations can use the Counter to report statistics.
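
A sketch of a mapper reporting custom counters; the LineStats enum is an illustrative addition, not part of the original WordCount code:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

   // Illustrative counter names; any enum works as a counter group.
   public enum LineStats { EMPTY_LINES, NON_EMPTY_LINES }

   private final Text word = new Text();
   private final IntWritable one = new IntWritable(1);

   @Override
   protected void map(LongWritable key, Text value, Context context)
         throws IOException, InterruptedException {
      String line = value.toString().trim();
      if (line.isEmpty()) {
         context.getCounter(LineStats.EMPTY_LINES).increment(1);
         return;
      }
      context.getCounter(LineStats.NON_EMPTY_LINES).increment(1);
      for (String token : line.split("\\s+")) {
         word.set(token);
         context.write(word, one);
      }
   }
}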

13.  Job configuration -> speculative execution (setMapSpeculativeExecution(boolean)/setReduceSpeculativeExecution(boolean)), maximum number of attempts per task (setMaxMapAttempts(int)/setMaxReduceAttempts(int)), etc., or arbitrary key/value settings via Configuration.set(String, String)/Configuration.get(String); see the sketch below.
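
A sketch of those knobs on a Job; the values and the wordcount.case.sensitive key are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobConfDemo {
   public static void configure(Job job) {
      // Speculative execution can be toggled per phase.
      job.setMapSpeculativeExecution(true);
      job.setReduceSpeculativeExecution(false);
      // Cap the retries before a task is declared failed.
      job.setMaxMapAttempts(4);
      job.setMaxReduceAttempts(4);
      // Arbitrary application-level settings travel in the Configuration.
      Configuration conf = job.getConfiguration();
      conf.set("wordcount.case.sensitive", "false");   // illustrative key
      System.out.println("case sensitive = " + conf.get("wordcount.case.sensitive"));
   }
}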

14.  Task executor & environment -> the user can specify additional options to the child JVM via the mapreduce.{map|reduce}.java.opts configuration parameters in the Job, such as non-standard paths for the run-time linker to search for shared libraries via -Djava.library.path=<>, etc. If the mapreduce.{map|reduce}.java.opts parameters contain the symbol @taskid@, it is interpolated with the value of the taskid of the MapReduce task.

15.  Memory management -> users/admins can also specify the maximum virtual memory of the launched child task, and any sub-processes it launches recursively, using mapreduce.{map|reduce}.memory.mb. Note that the value set here is a per-process limit. The value for mapreduce.{map|reduce}.memory.mb should be specified in megabytes (MB), and it must be greater than or equal to the -Xmx passed to the JVM, otherwise the child VM might not start. The sketch below covers items 14 and 15 together.
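
A combined sketch for items 14 and 15; the sizes are made up, but memory.mb stays at or above the -Xmx of the child JVM, and @taskid@ is used as in the tutorial's GC-log example:

import org.apache.hadoop.conf.Configuration;

public class TaskMemoryConfig {
   public static void configure(Configuration conf) {
      // Container limit for each map task, in MB (a per-process limit).
      conf.setInt("mapreduce.map.memory.mb", 2048);
      // The -Xmx passed to the child JVM must stay at or below that limit;
      // @taskid@ is expanded to the task id of the running MapReduce task.
      conf.set("mapreduce.map.java.opts",
               "-Xmx1536m -verbose:gc -Xloggc:/tmp/@taskid@.gc");
      conf.setInt("mapreduce.reduce.memory.mb", 3072);
      conf.set("mapreduce.reduce.java.opts", "-Xmx2560m");
   }
}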

16.  Map Parameters ...... (http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#MapReduce_Tutorial)

17.  Parameters ()

18.  Job submission and monitoring:

1.Job provides facilities to submit jobs, track their progress, access component-tasks' reports and logs, get the MapReduce cluster's status information and so on.

2. The job submission process involves:

1. Checking the input and output specifications of the job.

2. Computing the InputSplit values for the job.

3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.

4. Copying the job's jar and configuration to the MapReduce system directory on the FileSystem.

5. Submitting the job to the ResourceManager and optionally monitoring its status.

3. Job history

19.  Job controller

1. Job.submit() || Job.waitForCompletion(boolean)
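
A sketch of non-blocking submission plus simple progress polling (waitForCompletion(true) is the blocking variant that prints progress itself):

import org.apache.hadoop.mapreduce.Job;

public class SubmitAndPoll {
   public static void run(Job job) throws Exception {
      // Non-blocking submission; the job is assumed to be fully configured.
      job.submit();
      while (!job.isComplete()) {
         System.out.printf("map %.0f%% reduce %.0f%%%n",
               job.mapProgress() * 100, job.reduceProgress() * 100);
         Thread.sleep(5000);
      }
      System.out.println(job.isSuccessful() ? "done" : "failed");
   }
}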

2. Multiple MapReduce jobs

1. Iterative MapReduce (the output of one MR job becomes the input of the next; drawbacks: the overhead of creating a Job object per iteration, plus heavy local-disk and network I/O between stages). A two-stage driver is sketched below.
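
A minimal two-stage driver sketch; the mapper/reducer wiring is elided and would mirror WordCountDemo above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStageDriver {
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Path input = new Path(args[0]);
      Path intermediate = new Path(args[1]);   // output of stage 1, input of stage 2
      Path output = new Path(args[2]);

      Job first = Job.getInstance(conf, "stage-1");
      // ... setJarByClass / setMapperClass / setReducerClass as in WordCountDemo ...
      FileInputFormat.addInputPath(first, input);
      FileOutputFormat.setOutputPath(first, intermediate);
      if (!first.waitForCompletion(true)) {
         System.exit(1);
      }

      Job second = Job.getInstance(conf, "stage-2");
      // ... per-stage mapper/reducer configuration ...
      FileInputFormat.addInputPath(second, intermediate);
      FileOutputFormat.setOutputPath(second, output);
      System.exit(second.waitForCompletion(true) ? 0 : 1);
   }
}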

2. MapReduce JobControl: each Job is wrapped, together with its dependencies, in a ControlledJob, and a JobControl thread manages the state of all the jobs; see the sketch below.
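
A hedged sketch using ControlledJob and JobControl from org.apache.hadoop.mapreduce.lib.jobcontrol; the two Job arguments are assumed to be fully configured elsewhere:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class JobControlDriver {
   // step2 depends on step1; JobControl runs them respecting that order.
   public static void run(Job job1, Job job2) throws Exception {
      ControlledJob step1 = new ControlledJob(job1.getConfiguration());
      step1.setJob(job1);
      ControlledJob step2 = new ControlledJob(job2.getConfiguration());
      step2.setJob(job2);
      step2.addDependingJob(step1);

      JobControl control = new JobControl("two-step-flow");
      control.addJob(step1);
      control.addJob(step2);

      // JobControl is a Runnable that tracks job states in its own thread.
      new Thread(control).start();
      while (!control.allFinished()) {
         Thread.sleep(1000);
      }
      control.stop();
   }
}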

3. MapReduce ChainMapper/ChainReducer: ChainMapper.addMapper() chains several mapper stages inside a single job; it cannot be used for jobs that need multiple reducers. A sketch follows.
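
A hedged sketch using stock Hadoop mapper/reducer classes (TokenCounterMapper, IntSumReducer, InverseMapper) purely to show the chaining calls; the pipeline is [tokenize] -> [sum counts -> flip to (count, word)] within one job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class ChainWordCount {
   public static void configure(Job job) throws Exception {
      Configuration none = new Configuration(false);
      // Map chain: tokenize lines into (word, 1) pairs.
      ChainMapper.addMapper(job, TokenCounterMapper.class,
            Object.class, Text.class, Text.class, IntWritable.class, none);
      // A single reducer closes the chain: sum the counts per word.
      ChainReducer.setReducer(job, IntSumReducer.class,
            Text.class, IntWritable.class, Text.class, IntWritable.class, none);
      // Mappers may still follow the reducer, e.g. flip the output to (count, word).
      ChainReducer.addMapper(job, InverseMapper.class,
            Text.class, IntWritable.class, IntWritable.class, Text.class, none);
   }
}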

20.  Job input & output

1. InputFormat TextInputFormat FileInputFormat

2. InputSplit FileSplit

3. RecordReader

4. OutputFormat OutputCommitter

That concludes the walkthrough of "which methods do you override in Hadoop"; hopefully it has cleared up the question. Pairing theory with practice is the best way to learn, so go and try it out. To keep learning more on the topic, keep following the 億速云 site, where more practical articles are on the way!
