Description
Flume tails a file that is encoded in UTF-8; opening the file directly shows ä, ö, ü and other special characters correctly. When I open the seq files in Hadoop, which Flume transmitted and stored through the collectorSink in raw format, all special characters like ä, ö, ü are broken (for example, ä appears as ä). Is there a conversion between UTF-8 and another encoding somewhere along the way, or is the raw output format the problem?
From Jon:
"I think the bug is in TailSource – it reads lines using a method (readLine) which does character set interpretation."
Original discussion:
https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/20231a0f98569d8a#
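The symptom in the description is consistent with UTF-8 bytes being decoded one byte per character, as an ISO-8859-1 style reader would do, and then written out again as UTF-8. The following is a minimal Java sketch of that effect only; it is not the TailSource code, and the class and method names (MojibakeDemo, misdecode, decodeCorrectly) are hypothetical. Whether TailSource's readLine behaves exactly this way is an assumption based on Jon's comment.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: take the UTF-8 bytes of the text and decode them
    // one byte per char, the way an ISO-8859-1 (Latin-1) reader would.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: decode the same bytes with the charset they were
    // actually written in, which round-trips the text unchanged.
    static String decodeCorrectly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String original = "\u00e4\u00f6\u00fc"; // a-umlaut, o-umlaut, u-umlaut

        String broken = misdecode(original);

        // Each 2-byte UTF-8 sequence became two separate chars.
        System.out.println("original chars: " + original.length());
        System.out.println("broken chars:   " + broken.length());

        // Writing the broken string back out as UTF-8 doubles the byte count.
        System.out.println("original bytes: "
                + original.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("broken bytes:   "
                + broken.getBytes(StandardCharsets.UTF_8).length);

        // Decoding with the right charset preserves the text.
        System.out.println("round trip ok:  "
                + decodeCorrectly(original).equals(original));
    }
}
```

In this sketch the UTF-8 bytes of ä (0xC3 0xA4) turn into the two characters Ã and ¤, which matches the broken output reported above. A fix along these lines would either pass the raw bytes through untouched or decode them with an explicit UTF-8 charset.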
Issue Links
- relates to FLUME-252 Update Tail to get rid of races and truncation problems. (Closed)