What was available in previous MapReduce:
- Each node in the cluster was statically assigned the capability of running a predefined number of Map slots and a predefined number of Reduce slots.
- The slots could not be shared between Maps and Reduces. This static allocation of slots was not optimal, since slot requirements vary during the MR job life cycle.
- In general, there is a demand for Map slots when the job starts, as opposed to the need for Reduce slots towards the end.
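The static slot counts described above were set per node in the TaskTracker configuration. A minimal sketch of such a configuration (property names from classic MapReduce; the example values are illustrative):

```xml
<!-- mapred-site.xml on a TaskTracker node (classic MapReduce) -->
<configuration>
  <!-- Fixed number of Map slots on this node, regardless of workload -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <!-- Fixed number of Reduce slots; cannot be borrowed by Map tasks -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

With this configuration, the node always offers 8 Map slots and 4 Reduce slots, even when a job's current phase could use the idle capacity of the other slot type.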
Key drawback in previous MapReduce:
- In a real cluster, where jobs are submitted at arbitrary times and each has its own Map/Reduce slot requirements, achieving optimal utilization of the cluster was hard, if not impossible.
What is new in MapReduce 2.0:
- The resource allocation model in Hadoop 0.23 addresses the above deficiency by providing a more flexible resource model.
- Resources are requested in the form of containers, where each container has a number of non-static attributes.
- At the time of writing this blog, the only supported attribute is memory (RAM). However, the model is generic, and there is an intention to add more attributes in future releases (e.g. CPU and network bandwidth).
- In this new Resource Management model, only a minimum and a maximum for each attribute are defined, and Application Masters (AMs) can request containers with attribute values as multiples of these minimums.
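The "multiples of the minimum" rule means that a request for an arbitrary amount of memory is rounded up to the nearest multiple of the configured minimum, capped at the configured maximum. A minimal sketch of that rounding logic (the `normalizeMemory` helper and its values are hypothetical, not part of the Hadoop API):

```java
public class ContainerResource {
    /**
     * Illustrative helper: normalize a memory request so it is a
     * multiple of the scheduler's minimum allocation, capped at the
     * maximum. Mirrors the rounding behavior described above.
     */
    static int normalizeMemory(int requestedMb, int minMb, int maxMb) {
        if (requestedMb <= 0) {
            // Smallest grantable container is one minimum unit.
            return minMb;
        }
        // Round up to the next multiple of the minimum.
        int rounded = ((requestedMb + minMb - 1) / minMb) * minMb;
        // Never exceed the configured maximum.
        return Math.min(rounded, maxMb);
    }
}
```

For example, with a 1024 MB minimum and an 8192 MB maximum, an AM asking for 1500 MB would be granted a 2048 MB container, while a request for 9000 MB would be capped at 8192 MB.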