Kite SDK / KITE-1073

Copy or import job submission fails due to wrong default FS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Data Module
    • Labels: None

      Description

      KITE-898 included a fix for using Crunch's support for adding jars to the distributed cache. Job submission using the local job runner was failing because jar paths had their FS scheme replaced by the scheme of the default FS. The exact cause was never identified.
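
      Whatever the root cause was, the general qualification behavior is easy to demonstrate: when a path carries no scheme, Hadoop resolves it against fs.defaultFS, which can silently turn a local path into an HDFS one. A minimal sketch of that mechanism (class and host names are hypothetical; this is not Kite code, and not necessarily the exact KITE-898 bug):

      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class QualifySketch {
          public static void main(String[] args) {
              Configuration conf = new Configuration();
              // Hypothetical cluster address, used only for path resolution.
              conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

              // A jar path with no explicit scheme...
              Path jar = new Path("/tmp/lib/dependency.jar");

              // ...is qualified against the default FS and comes back as
              // hdfs://namenode.example.com:8020/tmp/lib/dependency.jar
              URI defaultUri = FileSystem.getDefaultUri(conf);
              System.out.println(jar.makeQualified(defaultUri, new Path("/")));
          }
      }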

      As a work-around, Kite submitted local-to-HDFS jobs with the local FS instead of HDFS as the default FS. However, on more recent versions this work-around causes the following failure:

      [root@nightly-1 ~]# sudo -u hive JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/ ./kite-dataset -v csv-import test.csv test
      1 job failure(s) occurred:
      org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/cf635231-24b2-4c0f-8548-1a8b8ae663... ID=1 (1/1)(1): java.lang.IllegalArgumentException: Wrong FS: hdfs://nightly-1.vpc.cloudera.com:8020/user/yarn/mapreduce/mr-framework/2.6.0-cdh5.7.0-777/mr-framework.tar.gz, expected: file:///
              at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:648)
              at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:468)
              at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
              at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:458)
              at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
              at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
              at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
              at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
              at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
              at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
              at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
              at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
              at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
              at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
              at java.lang.Thread.run(Thread.java:745)
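
      The trace corresponds to JobSubmitter qualifying the MR framework tarball (the hdfs:// URI configured by mapreduce.application.framework.path) against the job's default FS, which the work-around had forced to file:///. A minimal sketch of the failing check, with placeholder host and class names:

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class WrongFsSketch {
          public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              conf.set("fs.defaultFS", "file:///"); // the work-around's default FS

              FileSystem localFs = FileSystem.get(conf); // LocalFileSystem

              // Placeholder for the framework tarball configured on the cluster.
              Path framework = new Path("hdfs://namenode.example.com:8020"
                  + "/user/yarn/mapreduce/mr-framework/mr-framework.tar.gz");

              // makeQualified() -> checkPath() rejects the foreign scheme:
              // java.lang.IllegalArgumentException: Wrong FS: hdfs://...,
              // expected: file:///
              localFs.makeQualified(framework);
          }
      }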
      

      Removing the work-around fixes this issue, and the original problem that required the work-around no longer occurs: both local-to-HDFS and HDFS-to-HDFS copies work.
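
      For contrast, a sketch of the same qualification once the work-around is removed and the cluster's default FS is left in place (host name again a placeholder):

      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class DefaultFsKeptSketch {
          public static void main(String[] args) {
              Configuration conf = new Configuration();
              // No override: the cluster's default FS is kept as-is.
              conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

              Path framework = new Path("hdfs://namenode.example.com:8020"
                  + "/user/yarn/mapreduce/mr-framework/mr-framework.tar.gz");

              // Scheme and authority already match the default FS, so
              // qualification is a no-op and job submission can proceed.
              URI defaultUri = FileSystem.getDefaultUri(conf);
              System.out.println(framework.makeQualified(defaultUri, new Path("/")));
          }
      }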


    People

    • Assignee: Ryan Blue
    • Reporter: Ryan Blue
