Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.0, 0.9.1, 0.10.0, 0.10.1
Fix Version/s: 0.11.0
Component/s: Morphlines Module
Labels: None
Description
The readRCFile command expects the _attachment_body morphline record field to contain an FSDataInputStream (or at least a stream that implements both Seekable and PositionedReadable), even though the doProcess() interface says it can be any InputStream. In practice, the field contains a java.io.BufferedInputStream when the command is used with the Cloudera Search MapReduceIndexerTool. This leads to exceptions like this:
18382 [main] ERROR org.apache.solr.hadoop.morphline.MorphlineMapRunner - Unable to process file hdfs://search-testing-c5-1.ent.cloudera.com/tmp/wolfin/testRCFileRowWise.rc
java.lang.IllegalArgumentException: In is not an instance of Seekable or PositionedReadable
    at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:51)
    at org.kitesdk.morphline.hadoop.rcfile.SingleStreamFileSystem.<init>(SingleStreamFileSystem.java:43)
    at org.kitesdk.morphline.hadoop.rcfile.ReadRCFileBuilder$ReadRCFile.doProcess(ReadRCFileBuilder.java:112)
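The failure mode can be illustrated without Hadoop on the classpath. The sketch below is only an approximation of the kind of type check the exception message implies. `SeekableLike`, `PositionedReadableLike`, `SeekableBytes`, and `StreamCheckSketch` are hypothetical stand-ins invented for this illustration, not Hadoop or Kite classes. It shows that wrapping a capable stream in a java.io.BufferedInputStream hides the capabilities from an instanceof-style check.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-ins for the two Hadoop capability interfaces.
interface SeekableLike {
    void seek(long pos);
}

interface PositionedReadableLike {
    int read(long position, byte[] buffer, int offset, int length);
}

// A stream that offers both capabilities, analogous to what fs.open() returns.
class SeekableBytes extends ByteArrayInputStream
        implements SeekableLike, PositionedReadableLike {

    SeekableBytes(byte[] data) {
        super(data);
    }

    @Override
    public synchronized void seek(long newPos) {
        if (newPos < 0 || newPos > count) {
            throw new IllegalArgumentException("Cannot seek to " + newPos);
        }
        pos = (int) newPos;
    }

    @Override
    public synchronized int read(long position, byte[] buffer, int offset, int length) {
        if (position < 0 || position >= count) {
            return -1;
        }
        int n = (int) Math.min(length, count - position);
        System.arraycopy(buf, (int) position, buffer, offset, n);
        return n;
    }
}

class StreamCheckSketch {
    // True only if the stream object itself implements both interfaces.
    static boolean accepts(InputStream in) {
        return in instanceof SeekableLike && in instanceof PositionedReadableLike;
    }

    // Rejects unsuitable streams with the message seen in the stack trace.
    static void require(InputStream in) {
        if (!accepts(in)) {
            throw new IllegalArgumentException(
                "In is not an instance of Seekable or PositionedReadable");
        }
    }

    public static void main(String[] args) {
        InputStream raw = new SeekableBytes(new byte[] {1, 2, 3});
        require(raw);                          // passes
        require(new BufferedInputStream(raw)); // throws IllegalArgumentException
    }
}
```

The wrapper is a perfectly valid InputStream, so a caller relying only on the declared parameter type has no warning that the stream will be rejected at run time.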
Here is a corresponding test case. In ReadRCFileTest.java, change the following method:
private InputStream readPath(final Path inputFile) throws IOException {
  FileSystem fs = inputFile.getFileSystem(new Configuration());
  return fs.open(inputFile);
}
to now read as follows:
private InputStream readPath(final Path inputFile) throws IOException {
  FileSystem fs = inputFile.getFileSystem(new Configuration());
  return new java.io.BufferedInputStream(fs.open(inputFile));
}
With this change the ReadRCFileTest unit tests will fail.
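One conceivable way to accept an arbitrary InputStream is to copy it into memory and expose seek and positioned-read operations on the copy. The sketch below is an assumption-laden illustration of that idea, not necessarily what the fix in 0.11.0 does. `InMemorySeekableStream` is a hypothetical class. To be usable with FSDataInputStream it would additionally need to implement Hadoop's Seekable and PositionedReadable interfaces, which are omitted here so that the example compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical in-memory stream that supports seeking and positioned reads.
class InMemorySeekableStream extends ByteArrayInputStream {

    InMemorySeekableStream(byte[] data) {
        super(data);
    }

    // Drains any InputStream (for example a BufferedInputStream) into memory.
    static InMemorySeekableStream copyOf(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new InMemorySeekableStream(out.toByteArray());
    }

    // Moves the current read position to an absolute offset.
    public synchronized void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > count) {
            throw new IOException("Cannot seek to " + newPos);
        }
        pos = (int) newPos;
    }

    // Reports the current read position.
    public synchronized long getPos() {
        return pos;
    }

    // Positioned read: copies bytes from an absolute offset and
    // leaves the current read position unchanged.
    public synchronized int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        if (position < 0) {
            throw new IOException("Negative position: " + position);
        }
        if (length == 0) {
            return 0;
        }
        if (position >= count) {
            return -1;
        }
        int n = (int) Math.min(length, count - position);
        System.arraycopy(buf, (int) position, buffer, offset, n);
        return n;
    }
}
```

This approach holds the entire file in memory, so it is only reasonable for inputs that comfortably fit in the heap. An alternative design would pass the stream through unchanged when it already implements the required interfaces and fall back to buffering only otherwise.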