Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: CDH 5.1.0, CDH 5.1.2
- Fix Version/s: None
- Component/s: Hive
- Labels: None
Description
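This is a nasty surprise for anyone who uses Hive tables with Avro and has in the past added a default null to a primitive field, thereby evolving its schema from a primitive type to a union. Files written before that change still carry the primitive type as their file schema, and the deserializer treats it as a union anyway, so all your mappers will fail with a "Not a union: String" error.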
See my comment on https://issues.apache.org/jira/browse/HIVE-5823?focusedCommentId=14142801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14142801
TL;DR: either the HIVE-6806 patch needs to be applied as well (although it might have other problems lurking), or AvroDeserializer.java needs a manual fix so that the deserializeNullableUnion() method looks like it does in the latest trunk:
    private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema recordSchema,
                                            TypeInfo columnType) throws AvroSerdeException {
      int tag = GenericData.get().resolveUnion(recordSchema, datum); // Determine index of value
      Schema schema = recordSchema.getTypes().get(tag);
      if (schema.getType().equals(Schema.Type.NULL)) {
        return null;
      }

      Schema currentFileSchema = null;
      if (fileSchema != null) {
        currentFileSchema =
            fileSchema.getType() == Type.UNION ? fileSchema.getTypes().get(tag) : fileSchema;
      }
      return worker(datum, currentFileSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema));
    }
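To make the key guard in that method easier to see in isolation, here is a minimal, dependency-free sketch. It does not use the Avro or Hive classes: `MiniSchema` and `pickFileBranch` are hypothetical stand-ins invented for illustration, and the sketch only models the decision "index into the file schema when it is a union, otherwise use it as-is".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnionBranchSketch {

    /** Hypothetical stand-in for an Avro schema: a primitive type name or a union of branches. */
    static final class MiniSchema {
        final String type;
        final List<MiniSchema> branches;

        private MiniSchema(String type, List<MiniSchema> branches) {
            this.type = type;
            this.branches = branches;
        }

        static MiniSchema primitive(String type) {
            return new MiniSchema(type, Collections.<MiniSchema>emptyList());
        }

        static MiniSchema union(MiniSchema... branches) {
            return new MiniSchema("union", Arrays.asList(branches));
        }

        boolean isUnion() {
            return "union".equals(type);
        }
    }

    /**
     * Models the guard from the description: only index into the file schema
     * when it is a union. A file written before the schema evolved has a plain
     * primitive file schema, which is returned unchanged.
     */
    static MiniSchema pickFileBranch(MiniSchema fileSchema, int tag) {
        if (fileSchema == null) {
            return null;
        }
        return fileSchema.isUnion() ? fileSchema.branches.get(tag) : fileSchema;
    }

    public static void main(String[] args) {
        // File written before the evolution: the field is a plain string.
        MiniSchema oldFileSchema = MiniSchema.primitive("string");
        // Schema after adding a default null: the field is ["null", "string"].
        MiniSchema evolvedSchema =
            MiniSchema.union(MiniSchema.primitive("null"), MiniSchema.primitive("string"));

        int tag = 1; // index of the "string" branch in the evolved union

        System.out.println(pickFileBranch(oldFileSchema, tag).type);  // prints "string"
        System.out.println(pickFileBranch(evolvedSchema, tag).type);  // prints "string"
    }
}
```

Without the union check, the old primitive file schema would be asked for its union branches, which is presumably where the "Not a union" failure comes from.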