      Description

      When key and value schemas are embedded in Avro's Pair schema (org.apache.avro.mapred.Pair) for serialization, the pair's namespace is applied to any embedded schema whose namespace is blank. The affected names then no longer match, and unions fail to resolve with the UnresolvedUnionException shown in the trace. The workaround is to set the namespace explicitly. That works for keys, but probably not for values, so this might need a fix in Avro.

      org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"CustomerProcessKeySchema","namespace":"crunch","fields":[{"name":"customer","type":"string"},{"name":"process","type":"string"}]},"null"]: {"customer": "A", "process": "12345"}
      	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:561)
      	at org.apache.avro.generic.GenericData.hashCode(GenericData.java:738)
      	at org.apache.avro.generic.GenericData.hashCodeAdd(GenericData.java:752)
      	at org.apache.avro.generic.GenericData.hashCode(GenericData.java:727)
      	at org.apache.avro.generic.GenericData$Record.hashCode(GenericData.java:122)
      	at org.apache.avro.mapred.AvroWrapper.hashCode(AvroWrapper.java:38)
      	at org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.getPartition(HashPartitioner.java:29)
      	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:601)
      	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
      	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
      	at org.apache.crunch.impl.mr.emit.OutputEmitter.emit(OutputEmitter.java:41)
      	at org.apache.crunch.MapFn.process(MapFn.java:34)
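
A minimal sketch of why the explicit namespace helps, written in plain Java with no Avro dependency. It is a toy model only: `NamespaceSketch`, `fullName`, and `resolveUnion` are hypothetical names introduced for illustration, and the assumption that the pair's namespace is `org.apache.avro.mapred` is taken from the class name in the description. The real lookup happens inside Avro's `GenericData.resolveUnion`, which this does not reproduce.

```java
import java.util.Arrays;
import java.util.List;

public class NamespaceSketch {

    // Toy stand-in for Avro's naming rule (not Avro's API): a named schema
    // with a blank namespace takes the namespace of the schema enclosing it.
    static String fullName(String name, String ownNamespace, String enclosingNamespace) {
        boolean blank = ownNamespace == null || ownNamespace.isEmpty();
        String ns = blank ? enclosingNamespace : ownNamespace;
        return (ns == null || ns.isEmpty()) ? name : ns + "." + name;
    }

    // Toy union lookup by full name. Returns -1 in the case where Avro
    // would throw UnresolvedUnionException.
    static int resolveUnion(List<String> branchNames, String datumName) {
        return branchNames.indexOf(datumName);
    }

    public static void main(String[] args) {
        String pairNamespace = "org.apache.avro.mapred"; // assumed from the Pair class name
        String record = "CustomerProcessKeySchema";

        // Blank namespace: the standalone name and the name seen inside the pair differ.
        String standalone = fullName(record, "", null);
        String embedded = fullName(record, "", pairNamespace);
        System.out.println(resolveUnion(Arrays.asList(standalone, "null"), embedded)); // no match

        // Explicit namespace (the workaround): both sides keep the same full name.
        String fixedStandalone = fullName(record, "crunch", null);
        String fixedEmbedded = fullName(record, "crunch", pairNamespace);
        System.out.println(resolveUnion(Arrays.asList(fixedStandalone, "null"), fixedEmbedded)); // match
    }
}
```

With a blank namespace the two names diverge and the lookup fails; with `crunch` set explicitly, embedding the schema in the pair leaves the full name unchanged.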
      

            People

            • Assignee:
              Unassigned
              Reporter:
              Ryan Blue