How to submit multiple Hadoop Map/Reduce jobs in parallel without opening multiple command shells
Sometimes you need to run several Map/Reduce jobs on the same Hadoop cluster at once, but opening a separate Hadoop command shell (Hadoop terminal) for each job can be a hassle. Keep in mind that the number of Map/Reduce jobs that can actually run in parallel is limited by your Hadoop cluster's size and configuration. If you do need to submit jobs in parallel, here is one way to do it:
First, take a look at the ToolRunner class defined in the Hadoop util library (org.apache.hadoop.util):
https://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/ToolRunner.html
Here are the quick steps:
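- Call ToolRunner.run(..) in a loop, so that each iteration launches your Map/Reduce job through the Tool's run() method.
- Use Job.submit() instead of Job.waitForCompletion() inside run(), because:
  - Job.submit() submits the job and returns immediately, so the loop submits all the jobs without waiting and they can run in parallel.
  - Job.waitForCompletion() blocks until the job finishes, so the jobs are submitted and run one after another.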
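The difference between the two calls can be seen without a cluster. The sketch below is only an analogy in plain Java threads, not Hadoop code: FakeJob, SubmitVsWait and maxConcurrent are hypothetical names made up for this illustration, and the latches exist only to make the result repeatable. It launches four stand-in jobs in a loop and reports how many were in flight at the same time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class SubmitVsWait {
    // Stand-in for a Map/Reduce job. This is NOT the Hadoop Job class.
    static class FakeJob {
        private final Thread worker;

        FakeJob(Runnable work) { worker = new Thread(work); }

        // Like Job.submit(): start the work and return immediately.
        void submit() { worker.start(); }

        // Like Job.waitForCompletion(): start the work, then block until it ends.
        void waitForCompletion() throws InterruptedException {
            submit();
            worker.join();
        }

        void join() throws InterruptedException { worker.join(); }
    }

    // Launches n fake jobs in a loop and returns the largest number running at once.
    static int maxConcurrent(int n, boolean blocking) throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(n);
        // Non-blocking mode: jobs stay "running" until all of them have started.
        // Blocking mode: the gate is already open, so each job ends right away.
        CountDownLatch gate = new CountDownLatch(blocking ? 0 : 1);
        FakeJob[] jobs = new FakeJob[n];
        for (int i = 0; i < n; i++) {
            jobs[i] = new FakeJob(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                started.countDown();
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
            if (blocking) {
                jobs[i].waitForCompletion(); // loop pauses here until the job is done
            } else {
                jobs[i].submit();            // loop moves straight on to the next job
            }
        }
        started.await();   // every job has started
        gate.countDown();  // let the jobs finish
        for (FakeJob job : jobs) {
            job.join();
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waitForCompletion-style peak: " + maxConcurrent(4, true));
        System.out.println("submit-style peak: " + maxConcurrent(4, false));
    }
}
```

With the blocking call the peak is 1, because each job finishes before the loop launches the next one. With the non-blocking call the peak is 4, because the loop has launched every job before any of them finishes.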
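Here is the code snippet for the launcher:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LaunchParallel extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 50; i++) {
            // Each call returns as soon as run() has submitted its job
            ToolRunner.run(new LaunchParallel(), args);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // new Job(conf) is deprecated in Hadoop 2.x and later; there, use Job.getInstance(getConf())
        Job job = new Job(getConf());
        // ...
        // Your job details here
        // ...
        job.submit(); // Must be submit(), not waitForCompletion(), so the loop does not block
        return 0;
    }
}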
Note: If each job takes different arguments, put the argument sets in an array and use the loop counter as the index to pick the arguments for each Map/Reduce job.
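A minimal sketch of that idea, in plain Java so it runs without Hadoop on the classpath. The class name PerJobArgs, the helper argsFor and the paths are all hypothetical placeholders, and the {input path, output path} layout is just an assumption about what a job might take.

```java
import java.util.Arrays;

public class PerJobArgs {
    // Hypothetical argument sets, one row per job: {input path, output path}.
    static final String[][] JOB_ARGS = {
        {"/data/in/logs-a", "/data/out/logs-a"},
        {"/data/in/logs-b", "/data/out/logs-b"},
        {"/data/in/logs-c", "/data/out/logs-c"},
    };

    // Returns a copy of the arguments for job number i.
    static String[] argsFor(int i) {
        return JOB_ARGS[i].clone();
    }

    public static void main(String[] args) {
        // The loop counter doubles as the index into the argument table.
        for (int i = 0; i < JOB_ARGS.length; i++) {
            String[] jobArgs = argsFor(i);
            // In the launcher, this is where the call would go:
            // ToolRunner.run(new LaunchParallel(), jobArgs);
            System.out.println("job " + i + " -> " + Arrays.toString(jobArgs));
        }
    }
}
```

Driving the loop bound from the table length, rather than a fixed count, keeps the number of submitted jobs in step with the number of argument sets.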
Keywords: Hadoop, Map/Reduce, Parallel Jobs