Description
We have noticed that while the Flume agent is running, its disk space consumption keeps growing. When the Flume agent is stopped, a large amount of disk space is suddenly freed. We observed this on our Amazon EC2 instance. We were using tailSource and autoCollectorSink, with compressed output set to true and the output format set to raw.
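One plausible reading of this symptom, not confirmed in this report, is the standard POSIX behavior for a file that is deleted while a process still holds it open. The directory entry disappears, but the filesystem reclaims the blocks only when the last open descriptor is closed, which would also explain why stopping the agent frees the space at once. The minimal sketch below illustrates that behavior on a POSIX host. It uses a temporary file in place of Flume's log files, and the function name `unlinked_file_still_holds_data` is hypothetical.

```python
import os
import tempfile


def unlinked_file_still_holds_data(payload: bytes = b"x" * 4096) -> tuple[int, int]:
    """Write payload to a temp file, delete the file while it is still open,
    and return (link_count, size) as seen through the still-open descriptor.
    Hypothetical helper for illustration only."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)      # the name is removed from the directory...
        st = os.fstat(fd)    # ...but the inode and its data remain reachable via fd
        return st.st_nlink, st.st_size
    finally:
        os.close(fd)         # only after this close can the blocks be reclaimed
```

Calling `unlinked_file_still_holds_data()` should report a link count of 0 alongside the full 4096-byte size: the data still occupies disk space even though no path refers to it.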
Here is what Dan (danieltm@gmail.com) posted on the mailing list, in case it helps:
If it helps, I saw this behavior as well over the weekend. Both of the agent processes I was running crashed because they had too many open files; each had ~950 open "(deleted)" files.
Sample lsof output (there are a lot of these):
java 6899 flume 44w REG 8,5 18129953 196636 /tmp/flume/agent/hostname/done/log.00000023.20100920-130741010+0000.50686619333190425.seq (deleted)
Config-wise, flume-site.xml has collector.output.format set to default, collector.dfs.compress.gzip set to true, and flume.master.server set. Everything else is default.
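The lsof listing above can be approximated without lsof on Linux by reading the `/proc/<pid>/fd` symlinks of the agent's JVM. When a descriptor refers to an unlinked file, the kernel appends " (deleted)" to the link target. The sketch below is Linux-only, and the function name `deleted_open_files` is hypothetical. Inspecting another user's process, such as one running as `flume`, may require matching privileges.

```python
import os


def deleted_open_files(pid: int) -> list[str]:
    """Return the link targets of open descriptors of process `pid` whose
    underlying file has been unlinked. Hypothetical helper; Linux-only,
    since it relies on /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    result = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink()
            # (e.g. the one listdir() itself used); skip it.
            continue
        if target.endswith(" (deleted)"):
            result.append(target)
    return result
```

Polling `len(deleted_open_files(agent_pid))` periodically, where `agent_pid` is the agent JVM's process ID, would show whether the count climbs toward the open-file limit in the way Dan describes.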