  1. Kite SDK (READ-ONLY)
  2. KITE-228

readSequenceFile command should not reuse the identity of Hadoop Writable objects

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0, 0.7.0, 0.8.0, 0.8.1
    • Fix Version/s: 0.9.0
    • Component/s: Morphlines Module
    • Labels:
      None

      Description

      The readSequenceFile morphline command should not reuse the "key" and "value" Hadoop Writable objects across rows.

      Downstream commands such as loadSolr or the HBase indexer buffer several records before sending them to Solr. If every buffered record holds a reference to the same Hadoop Writable object as its primary key id, the result is nonsensical: once the reader overwrites that object for the next row, all buffered records suddenly appear to be the same record (same id).

      A work-around is to insert the commands

      toString { field : key }
      toString { field : value }
      

      immediately after the readSequenceFile command in your morphline. This converts the key and value from Hadoop Writable objects to distinct String objects, so each row carries its own key and value rather than a reference shared with every other row.
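
      The aliasing problem and the effect of the toString work-around can be sketched in plain Java. This is a minimal illustration, not the Kite or Hadoop code: MutableText is a hypothetical stand-in for a mutable Hadoop Writable such as Text, and the two buffer methods are hypothetical stand-ins for a downstream command that batches records.

```java
import java.util.ArrayList;
import java.util.List;

public class WritableReuseSketch {

    /** Hypothetical stand-in for a mutable Hadoop Writable (e.g. Text). */
    static final class MutableText {
        private String content = "";
        void set(String s) { content = s; }
        @Override public String toString() { return content; }
    }

    /** Buffers the single reused key object: every entry aliases it. */
    static List<Object> bufferReused(String[] rows) {
        MutableText key = new MutableText();
        List<Object> buffer = new ArrayList<>();
        for (String row : rows) {
            key.set(row);     // the reader overwrites the same object in place
            buffer.add(key);  // the buffer stores a reference, not a copy
        }
        return buffer;
    }

    /** Copies the key to a String per row, analogous to the toString work-around. */
    static List<Object> bufferCopied(String[] rows) {
        MutableText key = new MutableText();
        List<Object> buffer = new ArrayList<>();
        for (String row : rows) {
            key.set(row);
            buffer.add(key.toString()); // a distinct immutable String per row
        }
        return buffer;
    }

    public static void main(String[] args) {
        String[] rows = {"id-1", "id-2", "id-3"};
        System.out.println(bufferReused(rows)); // [id-3, id-3, id-3]
        System.out.println(bufferCopied(rows)); // [id-1, id-2, id-3]
    }
}
```

      In the first method all buffered entries end up showing the last row's id, which matches the symptom described above; in the second, each entry keeps its own value because the String is created before the holder is overwritten.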

            People

            • Assignee:
              whoschek Wolfgang Hoschek
              Reporter:
              whoschek Wolfgang Hoschek