Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.1.0
- Fix Version/s: 1.3.0
- Component/s: Command-line Interface, Data Module
- Labels: None
Description
When key and value schemas are embedded in Avro's Pair schema (org.apache.avro.mapred.Pair) for serialization, any schema with a blank namespace picks up the Pair's namespace, which prevents unions from resolving (see the exception below). The workaround is to set the namespace explicitly on the affected schema. That works for keys, but probably not for values, so a complete fix might require a change in Avro.
org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"CustomerProcessKeySchema","namespace":"crunch","fields":[{"name":"customer","type":"string"},{"name":"process","type":"string"}]},"null"]: {"customer": "A", "process": "12345"}
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:561)
    at org.apache.avro.generic.GenericData.hashCode(GenericData.java:738)
    at org.apache.avro.generic.GenericData.hashCodeAdd(GenericData.java:752)
    at org.apache.avro.generic.GenericData.hashCode(GenericData.java:727)
    at org.apache.avro.generic.GenericData$Record.hashCode(GenericData.java:122)
    at org.apache.avro.mapred.AvroWrapper.hashCode(AvroWrapper.java:38)
    at org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.getPartition(HashPartitioner.java:29)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:601)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
    at org.apache.crunch.impl.mr.emit.OutputEmitter.emit(OutputEmitter.java:41)
    at org.apache.crunch.MapFn.process(MapFn.java:34)
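
To illustrate why a blank namespace is affected while an explicit one is not, here is a minimal, self-contained sketch of the name-resolution rule from the Avro specification: a named schema that declares no namespace takes the namespace of the most tightly enclosing schema. This is a simplified model for discussion, not Avro's own code; the class NamespaceResolution and the method fullName are hypothetical names introduced only for this example.

```java
// Simplified model of the Avro name-resolution rule. Hypothetical helper, not part of Avro.
final class NamespaceResolution {

    /**
     * Computes the full name of a named schema.
     *
     * @param name               the schema's "name" attribute
     * @param declaredNamespace  the schema's own "namespace" attribute, or null if none is declared
     * @param enclosingNamespace namespace of the most tightly enclosing named schema, or null
     */
    static String fullName(String name, String declaredNamespace, String enclosingNamespace) {
        if (name.contains(".")) {
            return name; // a dotted name is already a full name
        }
        // An undeclared namespace is inherited from the enclosing schema.
        String ns = (declaredNamespace != null) ? declaredNamespace : enclosingNamespace;
        return (ns == null || ns.isEmpty()) ? name : ns + "." + name;
    }

    public static void main(String[] args) {
        String pairNamespace = "org.apache.avro.mapred"; // namespace of org.apache.avro.mapred.Pair

        // Standalone record with no namespace: the full name is just the name.
        System.out.println(fullName("CustomerProcessKeySchema", null, null));

        // Same record embedded in the Pair schema: it inherits the Pair's namespace.
        System.out.println(fullName("CustomerProcessKeySchema", null, pairNamespace));

        // Workaround from the description: an explicit namespace is not overridden.
        System.out.println(fullName("CustomerProcessKeySchema", "crunch", pairNamespace));
    }
}
```

In this model the embedded record's full name changes from CustomerProcessKeySchema to org.apache.avro.mapred.CustomerProcessKeySchema, while the explicitly namespaced crunch.CustomerProcessKeySchema stays the same in both contexts. That is consistent with the workaround described above, though it does not explain why values may still fail.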