Kite SDK / KITE-1073

Copy or import job submission fails due to wrong default FS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Data Module
    • Labels: None

      Description

      KITE-898 included a fix for using Crunch's support for adding jars to the distributed cache. Job submission using the local job runner was failing because jar paths had their FS scheme replaced by the scheme of the default FS. The exact cause was never identified.
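
      Whatever the root cause was, the general qualification behavior is easy to demonstrate: when a path carries no scheme, Hadoop resolves it against fs.defaultFS, which can silently turn a local path into an HDFS one. A minimal sketch of that mechanism (class and host names are hypothetical; this is not Kite code, and not necessarily the exact KITE-898 bug):

      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class QualifySketch {
          public static void main(String[] args) {
              Configuration conf = new Configuration();
              // Hypothetical cluster address, used only for path resolution.
              conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

              // A jar path with no explicit scheme...
              Path jar = new Path("/tmp/lib/dependency.jar");

              // ...is qualified against the default FS and comes back as
              // hdfs://namenode.example.com:8020/tmp/lib/dependency.jar
              URI defaultUri = FileSystem.getDefaultUri(conf);
              System.out.println(jar.makeQualified(defaultUri, new Path("/")));
          }
      }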

      As a work-around, Kite submitted local-to-HDFS jobs with the local FS instead of HDFS as the default FS. However, on more recent versions this work-around causes the following failure:

      [root@nightly-1 ~]# sudo -u hive JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/ ./kite-dataset -v csv-import test.csv test
      1 job failure(s) occurred:
      org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/cf635231-24b2-4c0f-8548-1a8b8ae663... ID=1 (1/1)(1): java.lang.IllegalArgumentException: Wrong FS: hdfs://nightly-1.vpc.cloudera.com:8020/user/yarn/mapreduce/mr-framework/2.6.0-cdh5.7.0-777/mr-framework.tar.gz, expected: file:///
              at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:648)
              at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:468)
              at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
              at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:458)
              at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
              at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
              at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
              at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
              at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
              at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
              at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
              at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
              at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
              at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
              at java.lang.Thread.run(Thread.java:745)
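
      The trace corresponds to JobSubmitter qualifying the MR framework tarball (the hdfs:// URI configured by mapreduce.application.framework.path) against the job's default FS, which the work-around had forced to file:///. A minimal sketch of the failing check, with placeholder host and class names:

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class WrongFsSketch {
          public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              conf.set("fs.defaultFS", "file:///"); // the work-around's default FS

              FileSystem localFs = FileSystem.get(conf); // LocalFileSystem

              // Placeholder for the framework tarball configured on the cluster.
              Path framework = new Path("hdfs://namenode.example.com:8020"
                  + "/user/yarn/mapreduce/mr-framework/mr-framework.tar.gz");

              // makeQualified() -> checkPath() rejects the foreign scheme:
              // java.lang.IllegalArgumentException: Wrong FS: hdfs://...,
              // expected: file:///
              localFs.makeQualified(framework);
          }
      }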
      

      Removing the work-around fixes this issue, and the original problem that required the work-around no longer occurs: both local-to-HDFS and HDFS-to-HDFS copies work.
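
      For contrast, a sketch of the same qualification once the work-around is removed and the cluster's default FS is left in place (host name again a placeholder):

      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class DefaultFsKeptSketch {
          public static void main(String[] args) {
              Configuration conf = new Configuration();
              // No override: the cluster's default FS is kept as-is.
              conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

              Path framework = new Path("hdfs://namenode.example.com:8020"
                  + "/user/yarn/mapreduce/mr-framework/mr-framework.tar.gz");

              // Scheme and authority already match the default FS, so
              // qualification is a no-op and job submission can proceed.
              URI defaultUri = FileSystem.getDefaultUri(conf);
              System.out.println(framework.makeQualified(defaultUri, new Path("/")));
          }
      }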


    People

    • Assignee: Ryan Blue
    • Reporter: Ryan Blue
