Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.1.0
- Fix Version/s: 1.2.0
- Component/s: Data Module
- Labels: None
Description
KITE-898 included a fix for using Crunch's support for adding jars to the distributed cache. Job submission with the local job runner was failing because the FS scheme of each jar path was being replaced by the scheme of the default FS. The exact cause was never identified.
As a work-around, Kite submitted local-to-HDFS jobs with the local FS, rather than HDFS, as the default FS. However, on more recent Hadoop versions (the trace below is from 2.6.0-cdh5.7.0) this work-around makes job submission fail:
[root@nightly-1 ~]# sudo -u hive JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/ ./kite-dataset -v csv-import test.csv test
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/cf635231-24b2-4c0f-8548-1a8b8ae663... ID=1 (1/1)(1): java.lang.IllegalArgumentException: Wrong FS: hdfs://nightly-1.vpc.cloudera.com:8020/user/yarn/mapreduce/mr-framework/2.6.0-cdh5.7.0-777/mr-framework.tar.gz, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:648)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:468)
    at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
    at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:458)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
Removing the work-around fixes this issue. The original issue that required the work-around is no longer present: job submission works for both local-to-HDFS and HDFS-to-HDFS.
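To illustrate why the work-around triggers the "Wrong FS" error, the following is a minimal sketch of a scheme check of the kind reported by FileSystem.checkPath in the trace. It is a simplified stand-in written against java.net.URI only, not Hadoop's actual implementation; the class name WrongFsSketch, the methods checkPath and accepts, and the host name nn.example.com are hypothetical and exist only for this example.

```java
import java.net.URI;

public class WrongFsSketch {

    // Simplified stand-in (assumption): reject a path whose scheme or
    // authority does not match the file system it is being qualified against.
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            // No scheme: the path is interpreted relative to this file system.
            return;
        }
        if (scheme.equalsIgnoreCase(fsUri.getScheme())) {
            String pathAuthority = path.getAuthority();
            // A missing authority on the path is accepted in this sketch.
            if (pathAuthority == null
                    || pathAuthority.equalsIgnoreCase(fsUri.getAuthority())) {
                return;
            }
        }
        throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + fsUri);
    }

    // Convenience wrapper: true if the path passes the check, false otherwise.
    static boolean accepts(String fsUri, String path) {
        try {
            checkPath(URI.create(fsUri), URI.create(path));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical location of a framework archive stored on HDFS.
        String framework = "hdfs://nn.example.com:8020/user/yarn/mr-framework.tar.gz";

        // With the work-around, the default FS is the local FS, so an
        // HDFS path is qualified against file:/// and rejected.
        System.out.println("default FS file:///  accepts: "
                + accepts("file:///", framework));

        // Without the work-around, the default FS is HDFS and the check passes.
        System.out.println("default FS hdfs://.. accepts: "
                + accepts("hdfs://nn.example.com:8020", framework));
    }
}
```

In the trace, the rejected path is the MapReduce framework archive that JobSubmitter adds to the distributed cache. That archive lives on HDFS, so it cannot be qualified against a local default FS, which is consistent with the failure disappearing once the work-around is removed.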