How to submit Hadoop Map/Reduce jobs in multiple command shells to run in parallel

Sometimes you need to run multiple Map/Reduce jobs in the same Hadoop cluster, but opening several Hadoop command shells (terminals) can be troublesome. Note that, depending on your Hadoop cluster size and configuration, only a limited number of Map/Reduce jobs can run in parallel; if you need to run several at once, here is an approach you can use to accomplish your objective:

First, take a look at the ToolRunner.run() method defined in Hadoop's util library (org.apache.hadoop.util.ToolRunner); the code snippet below uses it to launch each job.

Here are the quick steps:

  • Call ToolRunner.run() in a loop, keeping your main Map/Reduce job launch inside the loop.
  • You must use Job.submit() instead of Job.waitForCompletion() because:
    • Job.submit() returns immediately after submitting, so all the jobs run in parallel.
    • Job.waitForCompletion() blocks until the job finishes, so the jobs would run sequentially.

Here is the code snippet:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LaunchParallel extends Configured implements Tool {

  public static void main(String[] args) throws Exception {
    for (int i = 0; i < 50; i++) {
      // Each iteration launches one job through the Tool interface
      ToolRunner.run(new LaunchParallel(), args);
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf());
    // ...
    // Your job details here
    // ...
    job.submit(); // Must use job.submit() so the jobs run in parallel
    return 0;
  }
}

Note: If each job takes different arguments, put all the arguments in an array and index it with the loop counter to pass each Map/Reduce job its own arguments.
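As a sketch of that counter-indexed approach (plain Java, no Hadoop dependency; the path layout below is hypothetical and should be adapted to your cluster), each job number maps to its own input and output arguments:

```java
// Build per-job argument arrays from the loop counter so each parallel
// job reads its own input path and writes its own output path.
public class JobArgsBuilder {

    // Returns the (input, output) argument pair for job number i.
    // The "/data/..." layout is a made-up example, not a Hadoop convention.
    static String[] buildJobArgs(int i) {
        return new String[] {
            "/data/input/part-" + i,   // hypothetical per-job input path
            "/data/output/job-" + i    // hypothetical per-job output path
        };
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            String[] jobArgs = buildJobArgs(i);
            // In the launcher above you would pass these to:
            // ToolRunner.run(new LaunchParallel(), jobArgs);
            System.out.println(jobArgs[0] + " -> " + jobArgs[1]);
        }
    }
}
```

Giving every job a distinct output path also avoids the FileAlreadyExistsException that Hadoop raises when two jobs try to write to the same output directory.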

Keyword: Hadoop, Map/Reduce, Parallel Jobs

