I've read these articles:
What is the use of grouping comparator in hadoop map reduce
What is difference between Sort Comparator and Group Comparator in Hadoop?
so I somewhat understand how it works.
The problem is this: when testing without this line
driver.setKeyGroupingComparator(groupComparator);
I get the following output in the reducer:
0000000000_44137
902996760100000_44137
9029967602_44137
90299676030000_44137
9029967604_44137
905000_38704
9050000001_38702
9050000001_38704
9050000001_38705
9050000001_38706
9050000001_38714
9050000002_38704
9050000002_38706
9050000011_38704
9050000011_38706
9050000021_38702
9050000031_38704
9050000031_38705
9050000031_38714
With it, I get:
0000000000_44137
902996760100000_44137
9029967602_44137
90299676030000_44137
9029967604_44137
905000_38704
9050000001_38702
9050000002_38704
9050000011_38704
9050000021_38702
9050000031_38704
The reducer:
@Override
public void reduce(CompositeKey key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    System.out.println(key.getFirst() + "_" + key.getSecond());
}
I understand it should be something along the lines of
0000000000 44137 value1
9050000001 38702 value1 38704 value2 38705 value3 38706 value4 38714 value5
but what I actually get is
0000000000 44137 value1
9050000001 38702 value1 value2 value3 value4 value5
In other words, every right half except the first one is lost. How do I iterate over every right part of the key? Is it possible?
My GroupComparator:
public class GroupComparator extends WritableComparator {
    public GroupComparator() {
        super(CompositeKey.class, true);
    }
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey lhs = (CompositeKey)a;
        CompositeKey rhs = (CompositeKey)b;
        return lhs.getFirst().compareTo(rhs.getFirst());
    }
}
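To make the expected result concrete, here is a minimal plain-Java sketch of the grouping I have in mind. It does not use Hadoop at all: GroupingSketch, Key and group are hypothetical names made up for this illustration, and Key is only a stand-in for my CompositeKey. It sorts the keys by both halves and then collects the right halves of all keys that share the same left half, which is what I would like to see inside a single reduce call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupingSketch {

    // Hypothetical stand-in for CompositeKey: first = left half, second = right half.
    static final class Key {
        final String first;
        final int second;

        Key(String first, int second) {
            this.first = first;
            this.second = second;
        }
    }

    // Sort by (first, second), as a sort comparator would, then put every
    // key with the same 'first' into one group, as a grouping comparator
    // on 'first' would. Each group keeps all of its right halves.
    static Map<String, List<Integer>> group(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing((Key k) -> k.first)
                              .thenComparingInt(k -> k.second));

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Key k : sorted) {
            groups.computeIfAbsent(k.first, f -> new ArrayList<>()).add(k.second);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
                new Key("9050000001", 38704),
                new Key("0000000000", 44137),
                new Key("9050000001", 38702));

        // prints {0000000000=[44137], 9050000001=[38702, 38704]}
        System.out.println(group(keys));
    }
}
```

As far as I understand (this is an assumption on my part, not something I have verified), Hadoop reuses the key object while the values of a group are iterated, so calling key.getSecond() inside the loop over values, rather than once before it, may be the way to see each right half.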