  1. Kite SDK (READ-ONLY)
  2. KITE-874

Kite CLI csv-import HDFS temp file path not multiuser safe

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.18.0
    • Component/s: Command-line Interface
    • Labels: None

      Description

      I'm running Kite CLI 0.17.1 and using csv-import for CSV-to-Avro conversion. I've hit a couple of transient failures that appear to be HDFS permission issues. They occur when other developers and I run import jobs on the same cluster, even though each job stays entirely within our own workspace (for example, source and target both under hdfs:///user/username). The reported errors look like "CopyTask:Kite(dataset:hdfs://nn1/tmp/account/.temp/...)", with a UUID appended, but the actual failure is that I can't write to /tmp/account once another user has created that directory by running a Kite job.

      The bug appears to be in the TemporaryFileSystemDatasetRepository setup in CSVImportCommand#run; the fix would be to create a per-user directory under /tmp.
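
      To illustrate the per-user layout suggested here, the following is a minimal sketch and not the actual Kite change. The class PerUserTempPath and its methods are hypothetical names; the sketch only assumes that the temporary root can be built as base directory, then user name, then dataset name, so that two users importing a dataset called "account" never share a directory.

```java
import java.util.Objects;

// Hypothetical helper, not part of Kite: builds a temp root scoped to one user.
final class PerUserTempPath {

  private PerUserTempPath() {}

  // Returns <baseTmp>/<user>/<datasetName>, for example /tmp/alice/account.
  static String forUser(String baseTmp, String user, String datasetName) {
    Objects.requireNonNull(baseTmp, "baseTmp");
    Objects.requireNonNull(user, "user");
    Objects.requireNonNull(datasetName, "datasetName");
    if (user.isEmpty() || user.contains("/")) {
      throw new IllegalArgumentException("invalid user name: " + user);
    }
    // Strip trailing slashes so "/tmp/" and "/tmp" give the same result.
    String base = baseTmp.replaceAll("/+$", "");
    return base + "/" + user + "/" + datasetName;
  }

  // Assumption: the local JVM user stands in for the submitting user here.
  // On a secured Hadoop cluster the effective user may differ from this value.
  static String forCurrentUser(String baseTmp, String datasetName) {
    return forUser(baseTmp, System.getProperty("user.name"), datasetName);
  }
}
```

      With this layout, a job run by one user would stage under /tmp/<that user>/account, so a directory created by someone else's job would no longer block the write.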


            People

            • Assignee: Tom White (tom)
            • Reporter: Mark Kidwell (mzkd)
            • Votes: 0
            • Watchers: 3
