Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.6.0, 0.7.0, 0.8.0, 0.8.1
- Fix Version/s: 0.9.0
- Component/s: Morphlines Module
- Labels: None
Description
The readSequenceFile morphline command should not reuse the "key" and "value" Hadoop Writable objects across rows.
Downstream commands such as loadSolr or the HBase indexer buffer multiple records before sending them to Solr. If the buffered records all hold a reference to the same Hadoop Writable object as their primary key id, the behaviour becomes nonsensical: every buffered record suddenly appears to be the same record (same id).
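The aliasing problem can be reproduced without Hadoop. The following is a minimal sketch, not the actual readSequenceFile code: `MutableKey` is a hypothetical stand-in for a mutable Writable such as Text, and `bufferWithReuse` is a hypothetical stand-in for a reader that reuses one key object while a downstream command buffers the records.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedKeyDemo {

    // Hypothetical stand-in for a mutable Hadoop Writable (assumption, not a Hadoop class).
    static final class MutableKey {
        private String value = "";

        void set(String newValue) {
            this.value = newValue;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Simulates a reader that overwrites a single key object for every row
    // while a downstream consumer buffers a reference to it.
    static List<Object> bufferWithReuse(String[] ids) {
        MutableKey key = new MutableKey();   // one object shared by all rows
        List<Object> buffer = new ArrayList<>();
        for (String id : ids) {
            key.set(id);                     // overwrites the previous row's id
            buffer.add(key);                 // stores the same reference again
        }
        return buffer;
    }

    public static void main(String[] args) {
        List<Object> buffer = bufferWithReuse(new String[] {"a", "b", "c"});
        for (Object id : buffer) {
            System.out.println(id);          // prints "c" three times
        }
    }
}
```

Because the buffer holds three references to one object, every buffered id reflects whichever row was read last.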
A work-around is to insert the commands
toString { field : key }
toString { field : value }
immediately after the readSequenceFile command in your morphline. This converts the key and value from Hadoop Writable objects to distinct String objects, so each row holds its own key and value rather than a reference to an object shared with every other row.
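The effect of the work-around can be sketched in plain Java. This is an illustration under assumptions, not the morphline implementation: `MutableKey` is a hypothetical stand-in for a reused Writable, and `bufferWithCopy` is a hypothetical method that snapshots the key to an immutable String before buffering, which is what the toString command achieves per row.

```java
import java.util.ArrayList;
import java.util.List;

public class CopiedKeyDemo {

    // Hypothetical stand-in for a mutable Hadoop Writable (assumption, not a Hadoop class).
    static final class MutableKey {
        private String value = "";

        void set(String newValue) {
            this.value = newValue;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // The reader still reuses one key object, but each row is converted to a
    // String before it is buffered, so later overwrites cannot affect it.
    static List<String> bufferWithCopy(String[] ids) {
        MutableKey key = new MutableKey();
        List<String> buffer = new ArrayList<>();
        for (String id : ids) {
            key.set(id);
            buffer.add(key.toString());      // immutable snapshot of this row's id
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Prints [a, b, c]
        System.out.println(bufferWithCopy(new String[] {"a", "b", "c"}));
    }
}
```

Strings are immutable in Java, so the buffered ids stay distinct even though the underlying key object keeps changing.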