Programmatically setting number of reducers with MapReduce job in Hadoop Cluster

When submitting a Map/Reduce job in Hadoop cluster, you can provide number of map task for the jobs and the number of reducers are created depend on the Mappers input and the Hadoop cluster capacity. Or you can push the job and Map/Reduce framework will adjust it per cluster configuration. So setting the total number of reducer somehow is not required and not a situation to worry upon. However if you hard code to number of reducers in Map/Reduce framework then it does not matter how many nodes are in your cluster the reducers will be used as per your configuration.

If you would want to have fixed number of reducer at run time, you can do it while passing the Map/Reduce job at command line. Using “-D mapred.reduce.tasks” with desired number will spawn that many reducers at runtime.

Modifying programmatically then number of reducers is important when someone is using partitioner in Map/reduce implementation. And because of having partitioner it becomes very important to ensure that the number of reducers are at least equal to the total number of possible partitions because total number of partitions are mostly the total number of nodes in the Hadoop cluster.

When you need to get the cluster details at run time programmatically, you can use ClusterStatus API to get those details.

Comments

  • Anonymous
    May 04, 2014
    Please let me know how to decide , number of reducers , like if I have 100 billion records how to identify how many reducers will be there.