2018_EJRNL_PP_NENAVATH_SRINIVAS_NAIK_1.pdf

MapReduce is one of the essential frameworks proposed in recent times for distributed storage and parallel processing of large-scale, data-intensive jobs. The default Hadoop scheduler assumes a homogeneous environment. This assumption does not always hold in practice, and it limits the performance of MapReduce. Data locality essentially means moving computation closer to the input data for faster access. Fundamentally, MapReduce does not always account for heterogeneity from a data locality perspective. Improving data locality in the MapReduce framework is therefore an important issue for the performance of large-scale Hadoop clusters. This paper proposes a novel data-locality-based scheduler that allocates input data blocks to nodes based on their processing capacity. It also schedules map and reduce tasks to nodes based on their computing ability in a heterogeneous Hadoop cluster. We evaluate the proposed scheduler using different workloads from the Hi-Bench benchmark suite. The experimental results show that the proposed scheduler enhances MapReduce performance in heterogeneous environments: it minimizes job execution time and improves data locality for different parameters compared with the Hadoop default scheduler, the Matchmaking scheduler, and the Delay scheduler.
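The idea of allocating input blocks according to node processing capacity can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes that each node's capacity is summarized by a single positive number and that blocks are split proportionally, with leftover blocks assigned by largest remainder. The function name `allocate_blocks` and the node names are hypothetical.

```python
def allocate_blocks(num_blocks, capacities):
    """Split num_blocks across nodes in proportion to their capacity.

    capacities maps a node name to a positive number (an assumed
    stand-in for measured processing capacity). Leftover blocks after
    rounding down go to the nodes with the largest fractional share.
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")

    # Ideal (fractional) share of blocks for each node.
    exact = {node: num_blocks * cap / total for node, cap in capacities.items()}

    # Start from the rounded-down share.
    alloc = {node: int(share) for node, share in exact.items()}

    # Hand out the remaining blocks, largest fractional remainder first.
    leftover = num_blocks - sum(alloc.values())
    by_remainder = sorted(
        capacities, key=lambda node: exact[node] - alloc[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        alloc[node] += 1

    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: one fast, one medium, and one slow node.
    nodes = {"fast": 4.0, "medium": 2.0, "slow": 1.0}
    print(allocate_blocks(14, nodes))
```

Under these assumptions a faster node holds more blocks locally, so more of its map tasks can read their input without a network transfer. A real scheduler would also have to respect HDFS replication and rack placement, which this sketch ignores.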