Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.17.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Currently, csv-import assumes that every field is present in the CSV file and that the columns appear in the same order as the fields declared in the dataset's schema. It checks that the headers match the dataset, but if a field is missing or out of order, the import fails.

      For example, create:
      test2.csv:

      Id,Value2
      1,value!
      

      test2.avsc:

      {
        "type" : "record",
        "name" : "Test",
        "namespace" : "com.cloudera",
        "doc" : "Schema generated by Kite",
        "fields" : [ {
          "name" : "Id",
          "type" : "long"
        }, {
          "name" : "Value",
          "type" : [ "null", "long" ],
          "default" : null
        }, {
          "name" : "Value2",
          "type" : [ "null", "string" ],
          "default" : null
        } ]
      }
      

      Then run:

      $ ./kite-dataset create test_incomplete_csv -s test2.avsc 
      $ ./kite-dataset csv-import test2.csv test_incomplete_csv
      Argument error: Incompatible schema field order
      [prints schemas]
      

      csv-import should be able to determine that the second column corresponds to Value2 whenever the header names match fields in the dataset definition.
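
      The requested behaviour can be sketched as follows. This is only an illustration in Python, not Kite's implementation: the function name map_rows_by_header is hypothetical, the schema is a trimmed copy of test2.avsc, and the sketch assumes that a field missing from the CSV may be filled in only when the schema gives it a default. Type conversion (for example, turning Id into a long) is left out.

```python
import csv
import io
import json

# Trimmed copy of test2.avsc from the description above.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Test",
  "fields": [
    {"name": "Id", "type": "long"},
    {"name": "Value", "type": ["null", "long"], "default": null},
    {"name": "Value2", "type": ["null", "string"], "default": null}
  ]
}
"""

# Contents of test2.csv from the description above.
CSV_TEXT = "Id,Value2\n1,value!\n"


def map_rows_by_header(csv_text, schema):
    """Hypothetical helper: match CSV columns to schema fields by name.

    Columns may appear in any order. A schema field that is absent from
    the CSV is filled with its declared default; if it has no default,
    the import is rejected. Values are kept as strings (no type conversion).
    """
    fields = schema["fields"]
    known = {f["name"] for f in fields}
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Reject CSV columns that the dataset schema does not declare.
    unknown = [h for h in header if h not in known]
    if unknown:
        raise ValueError("CSV columns not in schema: %s" % unknown)

    # Reject the import if a field without a default is missing.
    missing = [f["name"] for f in fields
               if f["name"] not in header and "default" not in f]
    if missing:
        raise ValueError("Required fields missing from CSV: %s" % missing)

    records = []
    for row in reader:
        record = {}
        for f in fields:  # emit fields in schema order
            name = f["name"]
            record[name] = row[name] if name in header else f["default"]
        records.append(record)
    return records


records = map_rows_by_header(CSV_TEXT, json.loads(SCHEMA_JSON))
print(records)
```

      With the example files, the single row maps Id and Value2 by name and fills Value with its null default, so the header order and the missing column no longer matter.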

              People

              • Assignee:
                blue Ryan Blue
                Reporter:
                alanj Alan Jackoway
              • Votes:
                0
                Watchers:
                3