Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 1.1.0
- Fix Version/s: None
- Component/s: Data Module
- Environment: Hadoop 2.6.0, HDP2.2
Description
I have a MapReduce job that reads and parses text and writes its results to a Hive table.
The job is configured (shortened) like this:
Configuration conf = new HiveConfiguration();
Job job = Job.getInstance(conf);
FileInputFormat.addInputPaths(job, inputPaths);
job.setInputFormatClass(TextInputFormat.class);
AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.LONG));
AvroJob.setMapOutputValueSchema(job, Tweet.getClassSchema());
DatasetKeyOutputFormat.ConfigBuilder configBuilder = DatasetKeyOutputFormat.configure(job);
configBuilder.overwrite("dataset:hive:mydataset");
configBuilder.withType(Tweet.class);
The job fails with the following exception:
15/07/17 00:57:56 INFO mapreduce.Job: Job job_1436989639392_0015 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Unknown repository URI pattern: dataset:hdfs://hdfs.XXX.com:8020/tmp/default/.temp/job_1436989639392_0015
at org.kitesdk.data.spi.Registration.lookupPatternByRepoUri(Registration.java:74)
at org.kitesdk.data.URIBuilder.<init>(URIBuilder.java:109)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:144)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:584)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:254)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I followed the stack trace down a few levels, but couldn't find where the hostname gets added to this dataset URI.
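One way to narrow this down is to take the URI from the exception message apart and compare it with the URI that was configured. The sketch below does that with plain java.net.URI and no Kite classes. The class name DatasetUriInspector and its two helper methods are made up for illustration and are not part of Kite. It only shows what the two strings contain; it does not reproduce the failure or show where Kite builds the temporary URI.

```java
import java.net.URI;

// Hypothetical diagnostic helper, not part of Kite: splits a "dataset:" URI
// into its outer scheme and the storage URI nested inside it.
class DatasetUriInspector {

    // Outer scheme of the full string, e.g. "dataset".
    static String outerScheme(String uri) {
        return URI.create(uri).getScheme();
    }

    // Everything after the outer scheme, parsed as a URI of its own.
    static URI innerUri(String uri) {
        return URI.create(URI.create(uri).getSchemeSpecificPart());
    }

    static void describe(String uri) {
        URI inner = innerUri(uri);
        System.out.println(uri);
        System.out.println("  outer scheme: " + outerScheme(uri));
        System.out.println("  inner scheme: " + inner.getScheme());
        System.out.println("  inner host:   " + inner.getHost());
        System.out.println("  inner path:   " + inner.getPath());
    }

    public static void main(String[] args) {
        // URI passed to configBuilder.overwrite(...) in the job setup.
        describe("dataset:hive:mydataset");
        // URI copied from the exception message.
        describe("dataset:hdfs://hdfs.XXX.com:8020/tmp/default/.temp/job_1436989639392_0015");
    }
}
```

The configured URI has no host or path at all, while the failing one wraps a full hdfs:// location under /tmp/default/.temp. So the hostname does not come from the job configuration shown above. It more likely comes from wherever the temporary job dataset location is resolved, possibly the cluster's default file system, though that is a guess. The exception is thrown from Registration.lookupPatternByRepoUri, which may mean a "dataset:" URI is being passed where a repository URI is expected, but the stack trace alone does not confirm that.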