2018_EJRNL_PP_NENAVATH_SRINIVAS_NAIK_1.pdf
Terbatas Lili Sawaludin Mulyadi
» ITB
MapReduce is an essential framework for the distributed storage and parallel processing of large-scale, data-intensive jobs. The default Hadoop scheduler assumes a homogeneous environment.
This assumption of homogeneity does not always hold in practice, and it limits the performance of
MapReduce. Data locality essentially means moving computation closer to the input data for faster access.
Fundamentally, MapReduce does not always account for heterogeneity from a data locality perspective.
Improving data locality in the MapReduce framework is therefore important for improving the performance of
large-scale Hadoop clusters.
This paper proposes a novel data-locality-based scheduler that allocates input data blocks to
nodes according to their processing capacity. It also schedules map and reduce tasks to nodes according to their
computing ability in a heterogeneous Hadoop cluster. We evaluate the proposed scheduler using different
workloads from the Hi-Bench benchmark suite. The experimental results show that the proposed scheduler
enhances MapReduce performance in heterogeneous environments: it reduces job execution time
and improves data locality for different parameters compared with the Hadoop default scheduler,
the Matchmaking scheduler, and the Delay scheduler.
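
The idea of allocating input blocks by processing capacity can be illustrated with a minimal sketch. The abstract does not state the allocation rule, so the rule below is an assumption: each node receives a share of blocks proportional to a single numeric capacity score, with leftover blocks handed out by largest remainder. The function name `allocate_blocks`, the node names, and the capacity values are hypothetical and are not taken from the paper or from Hadoop.

```python
def allocate_blocks(num_blocks, capacities):
    """Split num_blocks across nodes in proportion to their capacity.

    capacities maps a node name to a positive capacity score (assumed
    metric; the paper's actual measure is not given in the abstract).
    Fractional shares are rounded with the largest-remainder method so
    that the allocation always sums to num_blocks.
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")

    # Ideal (possibly fractional) share of blocks for each node.
    exact = {node: num_blocks * cap / total for node, cap in capacities.items()}

    # Start from the integer part of every share.
    alloc = {node: int(share) for node, share in exact.items()}

    # Hand the remaining blocks to the nodes with the largest fractional parts.
    leftover = num_blocks - sum(alloc.values())
    by_remainder = sorted(capacities, key=lambda n: exact[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1

    return alloc


if __name__ == "__main__":
    # Hypothetical three-node heterogeneous cluster.
    cluster = {"fast": 3.0, "medium": 2.0, "slow": 1.0}
    print(allocate_blocks(12, cluster))
```

Under this assumed rule, a node with three times the capacity of another receives roughly three times as many blocks, so more map tasks can read their input locally on the faster nodes.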