Wednesday, November 12, 2014

The Correct Way To Write The Main Method In A MapReduce Project In Order To Accept Runtime Arguments.

The command to run a MapReduce task from the command line is as follows:
//command
hadoop jar <jar_file> <main_class> [args...]

//example
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi -Dmapred.job.queue.name=root.example_queue 10 10000

As we can see, '-Dmapred.job.queue.name=root.example_queue', '10', and '10000' are all arguments from the Java class's point of view, so all of them will be passed to the 'args[]' parameter of the main class.
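To see this concretely, here is a minimal, Hadoop-free illustration (the class name 'ArgsDemo' is hypothetical): neither the JVM nor the 'hadoop jar' launcher strips the '-D' token before it reaches main.

```java
public class ArgsDemo {
    // Everything after the main class name on the command line arrives
    // verbatim in args[]; the -D token is not consumed automatically.
    public static String describe(String[] args) {
        return "args[0]=" + args[0] + ", args[1]=" + args[1] + ", args[2]=" + args[2];
    }

    public static void main(String[] args) {
        // Simulating the arguments of the pi example above:
        String[] simulated = {
            "-Dmapred.job.queue.name=root.example_queue", "10", "10000"
        };
        System.out.println(describe(simulated));
    }
}
```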

But if we intend all the '-D'-prefixed arguments to act as runtime Hadoop parameters, the following way of writing the MapReduce entry point does not work, because '-Dmapred.job.queue.name=root.example_queue' would be taken as args[0] in the example above:
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // Wrong: args[] still contains the -D token, so args[0] would be
    // '-Dmapred.job.queue.name=root.example_queue' rather than the input path.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
}

The correct way is to use 'GenericOptionsParser', which automatically loads the '-D'-prefixed arguments into the runtime MapReduce configuration and separates out the remaining user-defined arguments:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // GenericOptionsParser absorbs the -D options into 'conf'; otherArgs
    // holds only the user-defined arguments ('10' and '10000' in the pi example).
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 3) {
        System.err.println("Usage: wordcount <in> <out> <useless_interval>");
        System.exit(2);
    }
    Job job = new Job(conf, "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
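Conceptually, the parsing step above splits the argument array in two. The sketch below mimics that split without any Hadoop dependency (the class name 'MiniOptionsParser' is hypothetical, and the real GenericOptionsParser also handles -conf, -fs, -files, and other generic options):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of GenericOptionsParser's -D handling: each
// "-Dkey=value" token is loaded into the configuration map, and all
// other tokens are returned as the remaining user arguments.
public class MiniOptionsParser {
    private final Map<String, String> conf = new HashMap<>();
    private final List<String> remaining = new ArrayList<>();

    public MiniOptionsParser(String[] args) {
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                String kv = arg.substring(2);           // strip "-D"
                int eq = kv.indexOf('=');
                conf.put(kv.substring(0, eq), kv.substring(eq + 1));
            } else {
                remaining.add(arg);                     // user-defined argument
            }
        }
    }

    public Map<String, String> getConf() { return conf; }

    public String[] getRemainingArgs() { return remaining.toArray(new String[0]); }
}
```

With the pi example's arguments, 'mapred.job.queue.name' lands in the configuration while only '10' and '10000' come back as remaining arguments, which is exactly why the corrected main method above can safely use otherArgs[0] and otherArgs[1].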

© 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
If reposting, please credit the origin: Jason4Zhu
