[KITE-944] Reading parquet datasets with specific records fails after schema evolution - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.1.0
Component/s: None
Labels: None

Description

If you write records using a specific schema and later evolve that schema, you get errors when you read from a Parquet-formatted dataset. The problem is that Parquet internally instantiates objects based on the namespace and name in the stored Avro schema. When you evolve your schema and compile new specific classes, those objects are not compatible with the old schema if you're using the IndexedRecord interface (which addresses fields by position rather than by name), and Parquet does use it.

Currently, I think the only way you can safely evolve the schema of a Parquet dataset is if you're adding fields to the end of the schema,
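
To illustrate the positional mismatch described above, here is a minimal, self-contained sketch. It does not use Avro, Parquet, or Kite: `PositionalRecord`, `CompiledRecord`, and `readWithStoredSchema` are hypothetical stand-ins invented for this example, assuming only that the reader fills objects by field position (as Avro's IndexedRecord `put(int, Object)` does) using the order of the stored schema. Under that assumption, appending a field keeps old positions valid, while inserting a field earlier shifts them.

```java
import java.util.Arrays;
import java.util.List;

public class PositionalEvolutionSketch {

    /** Hypothetical stand-in for a position-addressed record interface. */
    interface PositionalRecord {
        void put(int position, Object value);
        Object get(int position);
    }

    /** Stand-in for a compiled specific class: its field order is fixed at compile time. */
    static final class CompiledRecord implements PositionalRecord {
        private final List<String> fieldNames;
        private final Object[] values;

        CompiledRecord(String... fieldNames) {
            this.fieldNames = Arrays.asList(fieldNames);
            this.values = new Object[fieldNames.length];
        }

        @Override public void put(int position, Object value) { values[position] = value; }
        @Override public Object get(int position) { return values[position]; }

        /** Name-based accessor, like a generated getter on the new class. */
        Object getByName(String name) { return values[fieldNames.indexOf(name)]; }
    }

    /** Simulates a reader that fills the target using positions from the stored (old) schema. */
    static CompiledRecord readWithStoredSchema(CompiledRecord target, Object... storedValues) {
        for (int i = 0; i < storedValues.length; i++) {
            target.put(i, storedValues[i]);
        }
        return target;
    }

    public static void main(String[] args) {
        // Data was written with the old schema: (id, name).
        Object[] stored = {42L, "alice"};

        // Evolved class with the new field appended: (id, name, email).
        CompiledRecord appended =
            readWithStoredSchema(new CompiledRecord("id", "name", "email"), stored);
        System.out.println("appended.name  = " + appended.getByName("name"));   // alice
        System.out.println("appended.email = " + appended.getByName("email"));  // null

        // Evolved class with the new field inserted in the middle: (id, email, name).
        CompiledRecord inserted =
            readWithStoredSchema(new CompiledRecord("id", "email", "name"), stored);
        System.out.println("inserted.name  = " + inserted.getByName("name"));   // null
        System.out.println("inserted.email = " + inserted.getByName("email"));  // alice
    }
}
```

In this sketch the misplaced value is silently stored in the wrong field; with real generated classes, whose fields are typed, a mismatch like this would more likely surface as a cast error at read time.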

People

Assignee:

Joey Echeverria

Reporter:

Joey Echeverria

Dates

Created:

27/Feb/15 10:11 PM

Updated:

05/Mar/15 7:24 PM

Resolved:

05/Mar/15 7:24 PM