I am interested in the implementation of the MapReduce sort phase; it seems to be very efficient. Could someone provide some references about it please? Thanks!

1 Answer

This points to ReduceTask.java as the place where the sort phase is coded; see lines 393-408 in ReduceTask.java. If you need more detail, download the full source and dig into it.

EDITED

The "Sort" phase falls under ReduceTask, as shown in the figure on page 163 of the Hadoop book.

4 Comments

Attention: the sort actually happens after the mapper. The reducer just merges the sorted segments.
@ThomasJungblut: see the figure above.
@ThomasJungblut Sorting happens both over the output of the map() method and over the inputs that a reducer receives from several mappers. The latter is termed the "Sort Phase" (as shown in the picture).
Sorting runs in parallel with the map output, but the "sort phase" itself is still just a merge phase, so the naming is misleading. That was my point (a few months ago).
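
To make the distinction in the comments concrete, here is a minimal sketch in Python (not Hadoop's actual Java code). It assumes each mapper sorts its own output by key and the reducer then does a k-way merge of those already-sorted segments with a min-heap. The names `map_side_sort` and `reduce_side_merge` are hypothetical and only illustrate the idea; the real implementation also deals with spills to disk, partitioning, and comparators.

```python
import heapq


def map_side_sort(records):
    """Sort one mapper's (key, value) output by key (illustrative only)."""
    return sorted(records, key=lambda kv: kv[0])


def reduce_side_merge(segments):
    """K-way merge of already-sorted segments; nothing is re-sorted here."""
    iterators = [iter(seg) for seg in segments]
    heap = []
    # Seed the heap with the first record of every non-empty segment.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first[0], idx, first[1]))
    heapq.heapify(heap)
    while heap:
        # The heap holds at most one record per segment, so (key, idx) is unique.
        key, idx, value = heapq.heappop(heap)
        yield key, value
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1]))


# Two hypothetical mapper outputs, unsorted as emitted by map().
mapper_outputs = [
    [("pear", 1), ("apple", 1)],
    [("fig", 1), ("apple", 1), ("zebra", 1)],
]
segments = [map_side_sort(out) for out in mapper_outputs]
merged = list(reduce_side_merge(segments))
print(merged)
```

The merge only ever compares the current head of each segment, which is why the reduce-side "sort" is cheap: with k segments and n records in total it costs roughly O(n log k) rather than a full re-sort.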
