  1. Kite SDK (READ-ONLY)
  2. KITE-1018

Avoid unnecessary copying in DatasetKeyOutputFormat

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: Data Module
    • Labels: None

      Description

      DatasetKeyOutputFormat currently copies any records that aren't reflect records. No one seems to know why, but it was added along with the data model changes. There are a couple of possible reasons:

      • Through 1.0.0, the CLI buffered records in memory before writing
      • Might have been an attempt to make records match the outgoing schema

      Buffering records in memory no longer happens, so the CLI is safe for formats that reuse objects when reading from local files.
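      The interaction between buffering and object reuse can be illustrated with a minimal sketch. It is not Kite code: `ObjectReuseSketch`, `Record`, `bufferThenWrite`, and `streamWrite` are hypothetical names, and the single mutable `Record` stands in for a record object that a file reader reuses between reads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Kite.
class ObjectReuseSketch {
  // A mutable record, standing in for an object that a reader reuses.
  static final class Record {
    int value;

    Record copy() {
      Record r = new Record();
      r.value = value;
      return r;
    }
  }

  // Buffers every record before "writing" it. Without a copy, every
  // buffered entry is the same reused object, so only the last value survives.
  static List<Integer> bufferThenWrite(int count, boolean copy) {
    Record reused = new Record();
    List<Record> buffer = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      reused.value = i; // the reader overwrites the same object
      buffer.add(copy ? reused.copy() : reused);
    }
    List<Integer> written = new ArrayList<>();
    for (Record r : buffer) {
      written.add(r.value);
    }
    return written;
  }

  // Writes each record as soon as it is read, so no copy is needed.
  static List<Integer> streamWrite(int count) {
    Record reused = new Record();
    List<Integer> written = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      reused.value = i;
      written.add(reused.value); // consumed before the next overwrite
    }
    return written;
  }

  public static void main(String[] args) {
    System.out.println(bufferThenWrite(3, false)); // [2, 2, 2]
    System.out.println(bufferThenWrite(3, true));  // [0, 1, 2]
    System.out.println(streamWrite(3));            // [0, 1, 2]
  }
}
```

      Once records are written as they are read, the copy no longer protects against anything in this scenario, which is the case the description makes for dropping it.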

      Schema management has also been fixed in 1.1.0, and everything should now be written with the output dataset's schema. Kite should simply assume that records have the correct schema, or should verify the schema directly, instead of penalizing all writes with a copy.

      I'm going to add a property, kite.copyOutputRecords, that turns this behavior on, and change the default so that records are not copied. This should be a safe change given the other updates.
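      A minimal sketch of a property-gated copy, assuming the flag is read once when the writer is created. Only the property name kite.copyOutputRecords comes from this issue; `CopyOutputRecordsSketch`, `SketchWriter`, and `writesSameInstance` are hypothetical, and a plain `Map` stands in for the job configuration rather than the real Hadoop or Kite classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration; this is not the Kite implementation.
class CopyOutputRecordsSketch {
  // Property name taken from the issue description.
  static final String COPY_RECORDS = "kite.copyOutputRecords";

  // A stand-in writer that copies records only when the property is enabled.
  static final class SketchWriter<E> {
    private final boolean copyRecords;
    private final UnaryOperator<E> copier;
    private E lastWritten;

    SketchWriter(Map<String, String> conf, UnaryOperator<E> copier) {
      // Defaults to false: records are passed through unless the property is set.
      this.copyRecords =
          Boolean.parseBoolean(conf.getOrDefault(COPY_RECORDS, "false"));
      this.copier = copier;
    }

    void write(E record) {
      lastWritten = copyRecords ? copier.apply(record) : record;
    }

    E lastWritten() {
      return lastWritten;
    }
  }

  // Returns true when the writer passed the caller's instance through uncopied.
  static boolean writesSameInstance(Map<String, String> conf) {
    StringBuilder record = new StringBuilder("r1");
    SketchWriter<StringBuilder> writer =
        new SketchWriter<>(conf, sb -> new StringBuilder(sb));
    writer.write(record);
    return writer.lastWritten() == record;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new HashMap<>();
    System.out.println(writesSameInstance(defaults)); // true: no copy by default

    Map<String, String> copying = new HashMap<>();
    copying.put(COPY_RECORDS, "true");
    System.out.println(writesSameInstance(copying)); // false: record was copied
  }
}
```

      Reading the flag once in the constructor keeps the per-record cost of the default path to a single boolean check.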


              People

              • Assignee: Ryan Blue
              • Reporter: Ryan Blue