Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: v0.9.3
Fix Version/s: None
Component/s: Sinks+Sources
Labels: None
Description
Log messages with UTF-8 characters like äöü end up broken in Hadoop when logging via Scribe. We used a simple setup:
exec config scribe_input scribe "scribe(1463)" "collectorSink("hdfs://localhost/testing/", "test",1000)"
exec spawn testserver scribe_input
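For reference, a minimal sketch of the kind of client that reproduces this. It assumes the scribe.Client and LogEntry classes that Thrift generates from Scribe's scribe.thrift (adjust the package to your build); they are not part of Flume. Thrift encodes strings as UTF-8 on the wire, so the umlauts should reach the scribe source intact:

import java.util.Collections;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Hypothetical reproduction client, not part of Flume. Scribe expects a
// framed binary Thrift transport on its listen port (1463 here).
public class ScribeRepro {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("localhost", 1463));
        transport.open();
        scribe.Client client = new scribe.Client(new TBinaryProtocol(transport));
        // Send one log entry containing multi-byte UTF-8 characters.
        client.Log(Collections.singletonList(new LogEntry("test", "äöü")));
        transport.close();
    }
}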
We usually use avrojson as the collector output format and gzip for compression, but the characters are broken even when we deactivate both.
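Our guess at the failure mode is a conversion with the JVM's default charset somewhere on the write path. A minimal sketch of that effect (purely illustrative, not Flume's actual sink code; the ISO-8859-1 decode below just simulates a non-UTF-8 default charset):

import java.nio.charset.StandardCharsets;

// Illustrative only: shows how UTF-8 bytes turn into mojibake when they
// are re-decoded with a single-byte charset instead of UTF-8.
public class CharsetMangleDemo {
    public static void main(String[] args) {
        byte[] body = "äöü".getBytes(StandardCharsets.UTF_8); // 6 bytes on the wire

        // Correct: decode the body with the charset it was encoded with.
        System.out.println(new String(body, StandardCharsets.UTF_8));      // äöü

        // Broken: decoding with a Latin-1 style default charset yields
        // typical mojibake instead of the original characters.
        System.out.println(new String(body, StandardCharsets.ISO_8859_1)); // Ã¤Ã¶Ã¼
    }
}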
The problem seems to occur when Flume writes the files into Hadoop, since in a more complicated setup like:
exec config scribe_input scribe "scribe(1463)" autoDFOChain
exec config hdfs scribe autoCollectorSource "collectorSink("hdfs://localhost/testing/", "test",1000)"
exec spawn testserver1 scribe_input
exec spawn testserver2 hdfs
the characters are still intact in the DFO logs on testserver1.
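To narrow down which hop corrupts the bytes, one can run a small UTF-8 validity check against the DFO log on testserver1 and against a local copy of the HDFS output (fetched with hadoop fs -get); a sketch:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flume: reports whether a local file
// decodes cleanly as UTF-8. A strict decoder throws on the first
// malformed byte sequence instead of silently replacing it.
public class Utf8Check {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            System.out.println(args[0] + ": valid UTF-8");
        } catch (CharacterCodingException e) {
            System.out.println(args[0] + ": NOT valid UTF-8 (" + e + ")");
        }
    }
}

If the DFO log passes and the HDFS copy fails, that points at the collector sink's write path rather than the Scribe source.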