Kite SDK (READ-ONLY) / KITE-1013

Configurable Sample-Based Type Inferencing for JSON and CSV

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.17.1
    • Fix Version/s: None
    • Component/s: Command-line Interface
    • Labels: None

      Description

      JSON types are inferred by sampling several JSON objects and merging their value types into the most general type that fits all of them. No analogous row-sampling capability is implemented for CSV files: csv-schema takes the first non-empty value in each column and infers the column's type from that value alone. This can produce an inaccurate schema for the file as a whole, for example when a column's first value looks like an integer but later values are decimals.
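To illustrate what sample-based inference for CSV could look like, here is a minimal Java sketch. It is not Kite's implementation: the class name, the `ColumnType` set, the widening rule (int widens to long, then to double; incompatible kinds such as boolean and number fall back to string) and the choice to treat empty cells as carrying no type information are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of sample-based CSV type inference; not the Kite SDK code. */
class SampledCsvTypeInference {

    /** Candidate column types; INT, LONG and DOUBLE are declared in widening order. */
    enum ColumnType { NULL, BOOLEAN, INT, LONG, DOUBLE, STRING }

    /** Returns the narrowest type that can represent one raw CSV value. */
    static ColumnType typeOf(String raw) {
        if (raw == null || raw.isEmpty()) {
            return ColumnType.NULL; // empty cell: no type information
        }
        if (raw.equalsIgnoreCase("true") || raw.equalsIgnoreCase("false")) {
            return ColumnType.BOOLEAN;
        }
        try {
            Integer.parseInt(raw);
            return ColumnType.INT;
        } catch (NumberFormatException notInt) {
            // too large or not integral; try wider types
        }
        try {
            Long.parseLong(raw);
            return ColumnType.LONG;
        } catch (NumberFormatException notLong) {
            // not a 64-bit integer either
        }
        try {
            Double.parseDouble(raw); // note: permissive, also accepts "NaN" and "Infinity"
            return ColumnType.DOUBLE;
        } catch (NumberFormatException notDouble) {
            return ColumnType.STRING;
        }
    }

    private static boolean isNumeric(ColumnType t) {
        return t == ColumnType.INT || t == ColumnType.LONG || t == ColumnType.DOUBLE;
    }

    /** Merges two observed types into the most general type that covers both. */
    static ColumnType merge(ColumnType a, ColumnType b) {
        if (a == b) return a;
        if (a == ColumnType.NULL) return b; // empty values do not constrain the column
        if (b == ColumnType.NULL) return a;
        if (isNumeric(a) && isNumeric(b)) {
            return a.ordinal() > b.ordinal() ? a : b; // widen INT -> LONG -> DOUBLE
        }
        return ColumnType.STRING; // incompatible kinds fall back to string
    }

    /** Infers one type per column from the first {@code sampleSize} rows. */
    static List<ColumnType> inferColumnTypes(List<String[]> rows, int sampleSize) {
        List<ColumnType> types = new ArrayList<>();
        int limit = Math.min(sampleSize, rows.size());
        for (int r = 0; r < limit; r++) {
            String[] row = rows.get(r);
            for (int c = 0; c < row.length; c++) {
                if (c == types.size()) {
                    types.add(ColumnType.NULL); // first time this column is seen
                }
                types.set(c, merge(types.get(c), typeOf(row[c])));
            }
        }
        return types; // columns that were empty in every sampled row stay NULL
    }
}
```

For the rows `{"1", "true", "x"}`, `{"3000000000", "", "y"}`, `{"2.5", "false", "7"}`, a sample size of 3 yields `[DOUBLE, BOOLEAN, STRING]`, while a sample size of 1 yields `[INT, BOOLEAN, STRING]`. The first column shows how a small sample can under-type a column.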

      Also, for JSON, the number of objects sampled is hard-coded. For both JSON and CSV files, we could provide a command-line flag that lets the user specify how many objects or rows to sample when inferring types.
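A configurable sample size could be wired in roughly as follows. This is a hypothetical sketch: the `--sample-size` flag name, the default of 100, and the naive comma splitting (no quoted fields, per-line records) are assumptions for illustration, not existing Kite CLI behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a user-configurable sample size; not part of the Kite CLI. */
class SampleSizeOption {
    static final int DEFAULT_SAMPLE_SIZE = 100; // arbitrary default chosen for this sketch

    /** Reads "--sample-size N" from the arguments, falling back to the default. */
    static int parseSampleSize(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (!args[i].equals("--sample-size")) {
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("--sample-size requires a value");
            }
            int n = Integer.parseInt(args[i + 1]);
            if (n <= 0) {
                throw new IllegalArgumentException("--sample-size must be positive, got " + n);
            }
            return n;
        }
        return DEFAULT_SAMPLE_SIZE;
    }

    /** Reads at most sampleSize non-empty data rows, optionally skipping a header line. */
    static List<String[]> readSample(BufferedReader reader, int sampleSize, boolean skipHeader)
            throws IOException {
        List<String[]> rows = new ArrayList<>();
        if (skipHeader) {
            reader.readLine();
        }
        String line;
        // Check the count first so no line is read past the sample limit.
        while (rows.size() < sampleSize && (line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // blank lines do not count toward the sample
            }
            rows.add(line.split(",", -1)); // naive split; -1 keeps trailing empty cells
        }
        return rows;
    }
}
```

Stopping after the requested number of rows keeps the cost of inference proportional to the sample rather than the file. The user can trade speed for accuracy on large or irregular files.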

    People

    • Assignee: Unassigned
    • Reporter: Aleksander Eskilson (aeskilson)
    • Votes: 0
    • Watchers: 1