  1. Kite SDK (READ-ONLY)
  2. KITE-1116

CSV import should allow for defining NULL values

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Command-line Interface
    • Labels:
      None
    • Environment:
      Any.

      Description

      CSV files tend to have empty strings like so:

      val1,,val3

      Here the two adjacent commas produce an empty string when the record lands in Avro. Of course, two adjacent commas can be interpreted as either null or an empty string. The case for null is even stronger when an empty string is normally written as ,"", because ,, can then unambiguously be treated as NULL.

      The Kite CLI's CSV import should let the user define what represents null in the CSV files. Often an empty string should be treated as null. In other circumstances, the providers of the CSV file may agree to use \N for null fields, so that empty strings and nulls can be told apart.

      Hive already allows TBLPROPERTIES to determine what makes a null field, so I think it's important that, when importing to Avro or Parquet specifically, you can pre-define on import what it means to be null (since those formats contain nullable field values).
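
To illustrate the requested behaviour, here is a minimal sketch in Python rather than Kite's own code. The function name read_csv_with_nulls and its null_token parameter are hypothetical and are not part of the Kite CLI; they only show how a user-defined null marker could be applied to each parsed field.

```python
import csv
import io


def read_csv_with_nulls(text, null_token=""):
    """Parse CSV text and replace every field equal to null_token with None.

    null_token is a hypothetical, user-supplied marker: "" treats empty
    fields as null, while "\\N" keeps empty strings and nulls distinct.
    """
    reader = csv.reader(io.StringIO(text))
    return [
        [None if field == null_token else field for field in row]
        for row in reader
    ]


# Empty fields are treated as null.
rows_empty = read_csv_with_nulls("val1,,val3\n", null_token="")

# Only \N is treated as null; the trailing empty field stays an empty string.
rows_marker = read_csv_with_nulls("val1,\\N,\n", null_token="\\N")
```

Note that Python's csv reader does not report whether a field was quoted, so this sketch cannot distinguish ,"", from ,, on its own; an importer that wants that distinction would need a parser that exposes quoting.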

            People

            • Assignee:
              Unassigned
            • Reporter:
              Mladen Kovacevic (mladkov)
