My problem is sorting the values in a file. The keys and values are integers, and each value must keep its key after sorting.

key   value
1     24
3     4
4     12
5     23

output:

1     24
5     23
4     12
3     4

I am working with massive data and must run the code on a cluster of Hadoop machines. How can I do it with MapReduce?

  • So, what do you want to sort by? key or value? Can you provide an example showing the file and how it should be sorted? Commented Aug 9, 2013 at 20:32
  • @JtheRocker i edited. Commented Aug 9, 2013 at 21:03
  • So, your keys are unique? Commented Aug 9, 2013 at 21:06
  • possible duplicate of How to sort data in map reduce hadoop? Commented Aug 11, 2013 at 14:41

1 Answer

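You can probably do this (I'm assuming you are using Java here).

Emit from your mapper like this:

context.write(new LongWritable(24), new LongWritable(1));
context.write(new LongWritable(4), new LongWritable(3));
context.write(new LongWritable(12), new LongWritable(4));
context.write(new LongWritable(23), new LongWritable(5));

So, the values that need to be sorted should be the keys in your MapReduce job, with the original keys emitted as values. Hadoop sorts by key in ascending order by default.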

Hence, to sort in descending order, either do this:

job.setSortComparatorClass(LongWritable.DecreasingComparator.class);

Or set a custom descending sort comparator in your job, which goes something like this:

public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        super(LongWritable.class, true); // key class must match the cast in compare()
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;          
        return -1 * key1.compareTo(key2);
    }
}

The shuffle and sort phase in Hadoop will take care of sorting your keys in descending order: 24, 23, 12, 4.
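To see what swapping key and value plus a descending sort produces on the question's data, here is a minimal plain-Java simulation. It does not use Hadoop at all: the class name `SwapAndSortSketch` and its method are hypothetical, and a reverse-ordered `TreeMap` merely stands in for the shuffle and sort phase of a single reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SwapAndSortSketch {
    // Mimics the job: the "map" step swaps (key, value) to (value, key),
    // and a reverse-ordered TreeMap stands in for the descending shuffle sort.
    static List<String> sortByValueDescending(int[][] pairs) {
        TreeMap<Integer, List<Integer>> shuffled = new TreeMap<>(Comparator.reverseOrder());
        for (int[] pair : pairs) {
            int key = pair[0];
            int value = pair[1];
            // The value becomes the sort key; equal values are grouped together.
            shuffled.computeIfAbsent(value, v -> new ArrayList<>()).add(key);
        }
        // The "reduce" step swaps back so each line reads: original key, value.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : shuffled.entrySet()) {
            for (int originalKey : entry.getValue()) {
                lines.add(originalKey + "\t" + entry.getKey());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 24}, {3, 4}, {4, 12}, {5, 23}};
        sortByValueDescending(input).forEach(System.out::println);
    }
}
```

Run on the four pairs from the question, this prints the rows in the order 1/24, 5/23, 4/12, 3/4, which matches the requested output.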

After comment:

If you require a descending IntWritable comparator, you can create one (defined below) and use it like this:

job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

If you are using JobConf, set it like this instead:

jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

Define the following class next to your main() function, not inside it:

public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}

// This class is defined outside of main(), as a nested class of the driver.
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing Comparator optimized for IntWritable. */ 
    public static class DecreasingComparator extends Comparator {
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
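
The byte-array overload above compares keys in their serialized form, without building objects first. As a rough illustration of that idea in plain Java (no Hadoop involved), the sketch below assumes each key is a 4-byte big-endian int, which is what `DataOutput.writeInt` produces. The class `RawDescendingSketch` and its helpers are hypothetical names.

```java
import java.nio.ByteBuffer;

class RawDescendingSketch {
    // Serializes an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decodes a 4-byte big-endian int starting at the given offset.
    static int readInt(byte[] bytes, int start) {
        return ByteBuffer.wrap(bytes, start, 4).getInt();
    }

    // Descending order: compare the second key against the first,
    // which avoids negating the result of a comparison.
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b2, s2), readInt(b1, s1));
    }

    public static void main(String[] args) {
        byte[] a = serialize(24);
        byte[] b = serialize(4);
        // A negative result means 24 is ordered before 4.
        System.out.println(compareDescending(a, 0, b, 0));
    }
}
```

Swapping the argument order instead of negating is a small design choice: it stays correct even for a comparison routine that could return `Integer.MIN_VALUE`.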

11 Comments

If I have 5 computers running the code, does this code work, and is the final result absolutely correct? How many reducers do I need?
Yes, you can have any number of reducers. I'm also assuming you know how to write a MapReduce job. Please give it a shot and tell me if it solves your issue. I think it will with respect to the use case you have mentioned. Thank you.
I work with JobConf; it doesn't have a setSortComparatorClass method.
My keys are IntWritable. How do I use the DescendingKeyComparator class in my code?
Trying to use this DescendingIntWritableComparable to implement a descending sort instead of an ascending sort, but job.setSortComparatorClass() does not see DescendingIntComparable.class as a class that extends RawComparator, so it doesn't run. Any ideas how to modify this so it'll work?