Monday, October 31, 2016

How to work with the group comparator in MapReduce?

I've read these articles

What is the use of grouping comparator in hadoop map reduce

What is difference between Sort Comparator and Group Comparator in Hadoop?

so I somewhat understand how it works.

The problem is this: when testing without this line

driver.setKeyGroupingComparator(groupComparator);

I get the following output in the reducer:

0000000000_44137
902996760100000_44137
9029967602_44137
90299676030000_44137
9029967604_44137
905000_38704
9050000001_38702
9050000001_38704
9050000001_38705
9050000001_38706
9050000001_38714
9050000002_38704
9050000002_38706
9050000011_38704
9050000011_38706
9050000021_38702
9050000031_38704
9050000031_38705
9050000031_38714

With it, I get:

0000000000_44137
902996760100000_44137
9029967602_44137
90299676030000_44137
9029967604_44137
905000_38704
9050000001_38702
9050000002_38704
9050000011_38704
9050000021_38702
9050000031_38704

The reducer:

@Override
public void reduce(CompositeKey key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    System.out.println(key.getFirst() + "_" + key.getSecond());
}

I understand it should be something along the lines of

0000000000 44137 value1
9050000001 38702 value1 38704 value2 38705 value3 38706 value4 38714 value5

but instead it is

0000000000 44137 value1
9050000001 38702 value1 value2 value3 value4 value5

where every right half except the first one is lost. How do I iterate over every right part of the key? Is it possible?

My GroupComparator:

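import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite keys by their first (left) part only.
public class GroupComparator extends WritableComparator {
    public GroupComparator() {
        // true: let WritableComparator create key instances for deserialization
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey lhs = (CompositeKey) a;
        CompositeKey rhs = (CompositeKey) b;
        // The second (right) part is deliberately ignored here.
        return lhs.getFirst().compareTo(rhs.getFirst());
    }
}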
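
To make the grouping behaviour concrete, here is a minimal plain-Java sketch that runs without Hadoop. It only simulates what I believe the framework does: records are sorted by the full key, and consecutive records whose left halves compare equal are handed to a single reduce call. The names GroupingSketch, Rec and simulate are hypothetical and exist only for this illustration; they are not part of Hadoop or of my job.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {
    // Hypothetical stand-in for one map output record:
    // a composite key (first, second) plus its value.
    static final class Rec {
        final String first;
        final int second;
        final String value;

        Rec(String first, int second, String value) {
            this.first = first;
            this.second = second;
            this.value = value;
        }
    }

    // Same idea as the GroupComparator above: only the left half decides the group.
    static final Comparator<Rec> GROUP = Comparator.comparing((Rec r) -> r.first);

    // Assumed sort order: left half first, then right half.
    static final Comparator<Rec> SORT = GROUP.thenComparingInt(r -> r.second);

    // Returns one line per simulated reduce() call.
    static List<String> simulate(List<Rec> records) {
        List<Rec> sorted = new ArrayList<>(records);
        sorted.sort(SORT);

        List<String> lines = new ArrayList<>();
        StringBuilder line = null;
        Rec previous = null;
        for (Rec current : sorted) {
            // A new group starts whenever the grouping comparator reports a difference.
            if (previous == null || GROUP.compare(previous, current) != 0) {
                if (line != null) {
                    lines.add(line.toString());
                }
                line = new StringBuilder(current.first);
            }
            // Every record in the group still carries its own right half.
            line.append(' ').append(current.second).append(' ').append(current.value);
            previous = current;
        }
        if (line != null) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
            new Rec("9050000001", 38704, "value2"),
            new Rec("0000000000", 44137, "value1"),
            new Rec("9050000001", 38702, "value1"),
            new Rec("9050000001", 38705, "value3"));
        simulate(input).forEach(System.out::println);
    }
}
```

In the sketch each record keeps its own right half inside the group. As far as I can tell, the real reducer in the org.apache.hadoop.mapreduce API behaves similarly because the key object is reused and refreshed as the values are iterated, so reading key.getSecond() inside the loop over values, rather than once before it, would be the thing to try.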
