Kite SDK (READ-ONLY) / KITE-944

Reading parquet datasets with specific records fails after schema evolution

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

      Description

      If you write records using a specific schema and later evolve that schema, you get errors when you read from a Parquet-formatted dataset. The problem is that Parquet internally instantiates objects based on the namespace and name in the stored Avro schema. When you evolve your schema and compile new specific classes, those objects are not compatible with the old schema if you're using the IndexedRecord interface, which Parquet does.
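
      The incompatibility can be illustrated with a minimal sketch in plain Java. It assumes the failure comes from position-based field access: values stored in the old schema's field order are pushed into a class compiled from the evolved schema. The Positional interface and the UserInserted and UserAppended classes are hypothetical stand-ins written for this illustration; they are not the Avro, Parquet, or Kite APIs, and the actual failure inside Parquet may surface differently.

```java
import java.util.Arrays;
import java.util.List;

public class PositionalMismatchSketch {

    // Hypothetical stand-in for position-based field access, in the spirit of
    // Avro's IndexedRecord.put(int, Object). Not the real Avro or Parquet API.
    interface Positional {
        void put(int position, Object value);
    }

    // Evolved class where "email" was inserted BEFORE the existing "age" field.
    static class UserInserted implements Positional {
        String name;
        String email;
        Integer age;

        @Override
        public void put(int position, Object value) {
            switch (position) {
                case 0: name = (String) value; break;
                case 1: email = (String) value; break;
                case 2: age = (Integer) value; break;
                default: throw new IndexOutOfBoundsException("position " + position);
            }
        }
    }

    // Evolved class where "email" was appended AFTER the existing fields.
    static class UserAppended implements Positional {
        String name;
        Integer age;
        String email;

        @Override
        public void put(int position, Object value) {
            switch (position) {
                case 0: name = (String) value; break;
                case 1: age = (Integer) value; break;
                case 2: email = (String) value; break;
                default: throw new IndexOutOfBoundsException("position " + position);
            }
        }
    }

    // Fills a record by walking the stored values in the old schema's field order.
    static void fill(Positional record, List<Object> storedValues) {
        for (int i = 0; i < storedValues.size(); i++) {
            record.put(i, storedValues.get(i));
        }
    }

    public static void main(String[] args) {
        // Values written with the old schema: (name, age).
        List<Object> stored = Arrays.<Object>asList("alice", 30);

        // Old positions still line up, so the new field simply stays unset.
        UserAppended appended = new UserAppended();
        fill(appended, stored);
        System.out.println("appended: name=" + appended.name
                + ", age=" + appended.age + ", email=" + appended.email);

        // Position 1 now means "email", but the stored value is the old "age".
        try {
            fill(new UserInserted(), stored);
        } catch (ClassCastException e) {
            System.out.println("inserted: read failed with ClassCastException");
        }
    }
}
```

      In this simplified model, the class with the appended field accepts the old data because every old position keeps its meaning, while the class with the inserted field fails as soon as a stored value lands in a position of a different type.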

      Currently, I think the only way you can safely evolve the schema of a Parquet dataset is if you're adding fields to the end of the schema,

            People

            • Assignee: Joey Echeverria
            • Reporter: Joey Echeverria