Flume (READ-ONLY) / FLUME-205

TailSource reads lines using a method (readLine) that performs character set interpretation, which breaks all my UTF-8 characters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: v0.9.1
    • Fix Version/s: v0.9.2
    • Component/s: Node
    • Labels:
      None
    • Environment:
      Debian Lenny, with all files in UTF-8 Encoding

    Description

    Flume tails a file that is encoded in UTF-8, and opening that file shows ä, ö, ü and other special characters correctly. When I open the sequence files in Hadoop that Flume transmitted and stored through the collectorSink in raw format, all special characters like ä, ö, ü are garbled. It seems that somewhere the text is converted between UTF-8 and another encoding, or is the raw output format the problem?
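The symptom is consistent with a byte-to-character round trip: the UTF-8 bytes of each character are decoded one byte per char and the resulting String is later encoded back as UTF-8, doubling every non-ASCII character. The minimal sketch below reproduces that round trip with plain JDK calls. Using ISO-8859-1 as the wrong decoding step is an assumption for illustration, not a confirmed detail of Flume's code, and the class name MojibakeDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {
    // Hypothetical reproduction of the suspected bug: the UTF-8 bytes are
    // decoded as ISO-8859-1 (one char per byte, an assumed stand-in for the
    // faulty step) and the resulting String is encoded back to UTF-8.
    static byte[] roundTripThroughLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        return misdecoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E4"; // LATIN SMALL LETTER A WITH DIAERESIS
        byte[] stored = roundTripThroughLatin1(original);

        // The source file holds 2 bytes for this character; after the round trip there are 4.
        System.out.println(Arrays.toString(original.getBytes(StandardCharsets.UTF_8))); // [-61, -92]
        System.out.println(Arrays.toString(stored)); // [-61, -125, -62, -92]

        // Reading the stored bytes as UTF-8 yields two characters, U+00C3 and U+00A4.
        String shown = new String(stored, StandardCharsets.UTF_8);
        System.out.println(shown.equals("\u00C3\u00A4")); // true
    }
}
```

Pure ASCII text passes through this round trip unchanged, which would explain why only characters such as ä, ö, ü appear broken.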

    From Jon:
    "I think the bug is in TailSource: it reads lines using a method (readLine) which does character set interpretation."
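For reference, the JDK documents java.io.RandomAccessFile.readLine as widening each byte into a char without decoding, so it does not support the full Unicode character set; whether TailSource used that exact method is not stated here. One way to avoid character set interpretation altogether is to split lines at the byte level and keep each line as raw bytes for the event body. The sketch below assumes '\n' line endings, and RawLineReader and readLineBytes are hypothetical names, not Flume's API or the actual fix for this issue. Splitting on the 0x0A byte is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence is 0x80 or higher.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RawLineReader {
    private final InputStream in;

    public RawLineReader(InputStream in) {
        // Buffer the stream, since readLineBytes() pulls one byte at a time.
        this.in = new BufferedInputStream(in);
    }

    /**
     * Returns the next line as raw bytes without the trailing newline, or null
     * at end of stream. No charset decoding is performed, so UTF-8 data passes
     * through byte for byte. A final line without a newline is returned as-is
     * (a real tailer would instead wait for more data; that is outside this sketch).
     */
    public byte[] readLineBytes() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                return line.toByteArray();
            }
            line.write(b);
        }
        return sawAny ? line.toByteArray() : null;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "\u00E4 \u00F6 \u00FC\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        RawLineReader reader = new RawLineReader(new ByteArrayInputStream(file));
        byte[] line;
        while ((line = reader.readLineBytes()) != null) {
            // The bytes would go into the event body unchanged; here we only count them.
            System.out.println(line.length + " bytes"); // prints "8 bytes", then "11 bytes"
        }
    }
}
```

The design choice is to defer any decoding to whoever consumes the event body, so the bytes written by the collector match the bytes in the tailed file.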

    Original discussion:
    https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/20231a0f98569d8a#


    People

    • Assignee:
      Jonathan Hsieh (jon)
    • Reporter:
      Daniel Boekhoff (dboek)