I am interested in the implementation of the MapReduce sort phase, which seems to be very efficient. Could someone please point me to some references about it? Thanks!
1 Answer
This points to ReduceTask.java as the place where the sort phase is coded; see lines 393-408 of ReduceTask.java. If you need more information, download the entire source and dig into it.
EDITED
The "sort" phase falls under ReduceTask, as shown in the figure below from the Hadoop book (page 163).
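Here is a minimal Python sketch (not Hadoop code) of what the reduce-side "sort" amounts to. It assumes that each map task's output for this reducer's partition is already a list of (key, value) pairs sorted by key. The names merge_segments and reduce_input are hypothetical, and heapq.merge stands in for Hadoop's own merge logic. Because the inputs are already sorted, a k-way merge that holds only one head record per segment in a heap is enough, and no full re-sort is needed.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterator, List, Tuple

Record = Tuple[str, int]  # (key, value) as emitted by map()

def merge_segments(segments: List[List[Record]]) -> Iterator[Record]:
    """k-way merge of map output segments that are each already sorted by key."""
    # heapq.merge keeps one head record per segment in a heap and yields
    # records in global key order; equal keys keep their segment order.
    return heapq.merge(*segments, key=itemgetter(0))

def reduce_input(segments: List[List[Record]]) -> Iterator[Tuple[str, List[int]]]:
    """Group the merged stream into (key, [values]) the way a reducer sees it."""
    for key, group in groupby(merge_segments(segments), key=itemgetter(0)):
        yield key, [value for _, value in group]

if __name__ == "__main__":
    # Hypothetical sorted outputs from three map tasks for one partition.
    map_outputs = [
        [("apple", 1), ("cherry", 1)],
        [("apple", 1), ("banana", 1)],
        [("banana", 1), ("cherry", 1), ("cherry", 1)],
    ]
    for key, values in reduce_input(map_outputs):
        print(key, sum(values))  # apple 2, banana 2, cherry 3
```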

4 Comments
Thomas Jungblut
Attention: the sort actually happens on the map side, after the mapper. The reducer just merges the sorted segments.
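Here is a rough Python sketch of the map-side half. Map output goes into an in-memory buffer, and each time the buffer fills, it is sorted by (partition, key) and spilled as a sorted run. In Hadoop the buffer is sized in bytes (mapreduce.task.io.sort.mb), with the spill trigger set by mapreduce.map.sort.spill.percent. The default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. In this sketch, spill_threshold is a hypothetical record count, and CRC32 is swapped in for hashCode() so the output is deterministic. The function names are made up for illustration.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, int]          # (key, value) emitted by map()
Buffered = Tuple[int, str, int]   # (partition, key, value)

def partition_for(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for Hadoop's default hash partitioner.

    CRC32 replaces Java's hashCode() because Python's built-in hash() of a
    str is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def _sorted_run(buffer: List[Buffered]) -> List[Buffered]:
    # Sort by partition first, then by key, so each reducer's slice is in key order.
    return sorted(buffer, key=lambda r: (r[0], r[1]))

def map_side_spills(records: List[Record], num_reducers: int,
                    spill_threshold: int) -> List[List[Buffered]]:
    """Buffer map output and emit a sorted run each time the buffer fills."""
    spills: List[List[Buffered]] = []
    buffer: List[Buffered] = []
    for key, value in records:
        buffer.append((partition_for(key, num_reducers), key, value))
        if len(buffer) >= spill_threshold:  # hypothetical record-count trigger
            spills.append(_sorted_run(buffer))
            buffer = []
    if buffer:  # flush whatever is left when the map task finishes
        spills.append(_sorted_run(buffer))
    return spills

if __name__ == "__main__":
    words = ["the", "quick", "brown", "fox", "jumps", "over", "the", "dog"]
    for i, spill in enumerate(map_side_spills([(w, 1) for w in words], 2, 3)):
        print(f"spill {i}:", spill)
```

In Hadoop, a map task's spills are also merged into a single partitioned, sorted output file before the task finishes. That is why each reducer fetches one already-sorted segment per map task and only has to merge them.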
Tejas Patil
@ThomasJungblut: see the figure above.
Tejas Patil
@ThomasJungblut Sorting happens over the output of the map() method and over the inputs that a reducer gets from several mappers. The latter is termed the "sort phase" (as shown in the picture).
Thomas Jungblut
The sorting runs in parallel with writing the map output, but the sort phase itself is still just a merge phase, so the naming is misleading. That is what I meant (a few months ago).
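The point that the reduce-side "sort phase" is really a merge can be illustrated with a bounded fan-in merge. This is a simplified Python sketch that assumes key-sorted segments and a merge factor analogous to Hadoop's mapreduce.task.io.sort.factor (default 10). The name multi_pass_merge is hypothetical. Hadoop's actual merger may size its rounds differently, and it typically feeds the final round straight into the reduce function instead of materializing it.

```python
import heapq
from operator import itemgetter
from typing import List, Tuple

Record = Tuple[str, int]  # (key, value)

def multi_pass_merge(segments: List[List[Record]],
                     factor: int) -> Tuple[List[Record], int]:
    """Merge key-sorted segments, never combining more than `factor` at once.

    Returns the fully merged run and the number of merge operations used.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    pending = [list(s) for s in segments]
    if not pending:
        return [], 0
    merges = 0
    while len(pending) > 1:
        # Take up to `factor` segments, merge them, and queue the merged run.
        batch, pending = pending[:factor], pending[factor:]
        pending.append(list(heapq.merge(*batch, key=itemgetter(0))))
        merges += 1
    return pending[0], merges

if __name__ == "__main__":
    # Hypothetical: 25 sorted map outputs arriving at one reducer.
    outputs = [[(f"k{i:02d}", 1)] for i in range(25)]
    merged, merges = multi_pass_merge(outputs, factor=10)
    print(len(merged), merges)  # prints "25 3"
```

No step ever sorts from scratch. Each operation only interleaves runs that are already sorted, which is why the phase behaves like a merge despite its name.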