Description
I noticed that the Flume agent node does not always close 'done' log files after they have been deleted. This has happened multiple times on two different nodes. On both nodes it caused Flume to crash because the process had too many open files. Each node had approximately 960 open deleted files. lsof reports the following:
java 16591 flume 167r REG 8,5 19497160 1572905 /tmp/flume/agent/host/done/log.00000160.20100920-145031196+0000.50692789519690425.seq (deleted)
java 16591 flume 169w REG 8,5 19497160 1572905 /tmp/flume/agent/host/done/log.00000160.20100920-145031196+0000.50692789519690425.seq (deleted)
java 16591 flume 171w REG 8,5 19501368 1572907 /tmp/flume/agent/host/done/log.00000171.20100920-145051248+0000.50692809571710425.seq (deleted)
<ad infinitum>
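To make the pattern in the listing above easier to quantify, here is a minimal Python sketch that groups deleted-but-still-open files by path and lists the descriptors holding each one. It is not part of Flume. The function name `deleted_open_files` is hypothetical, and the parser assumes the nine-column lsof layout shown above (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME) followed by the `(deleted)` marker. Other lsof versions or options may print different columns.

```python
from collections import defaultdict


def deleted_open_files(lsof_text):
    """Map each deleted-but-open path to the FD entries that still hold it.

    Assumes the column layout seen in the report:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (deleted)
    """
    held = defaultdict(list)
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip anything that is not a complete "(deleted)" entry.
        if len(fields) < 10 or fields[-1] != "(deleted)":
            continue
        fd = fields[3]                    # e.g. "167r" or "169w"
        path = " ".join(fields[8:-1])     # NAME column, minus the marker
        held[path].append(fd)
    return dict(held)


# The three lines quoted in the report, used here as sample input.
SAMPLE = """\
java 16591 flume 167r REG 8,5 19497160 1572905 /tmp/flume/agent/host/done/log.00000160.20100920-145031196+0000.50692789519690425.seq (deleted)
java 16591 flume 169w REG 8,5 19497160 1572905 /tmp/flume/agent/host/done/log.00000160.20100920-145031196+0000.50692789519690425.seq (deleted)
java 16591 flume 171w REG 8,5 19501368 1572907 /tmp/flume/agent/host/done/log.00000171.20100920-145051248+0000.50692809571710425.seq (deleted)
"""

for path, fds in deleted_open_files(SAMPLE).items():
    print(len(fds), ",".join(fds), path)
```

On the sample, the first file is held by both a read descriptor and a write descriptor, which suggests that more than one handle per file may be left open. Feeding the script the full output of `lsof -p <pid>` for the agent process would give the total count per file.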
flume-site.xml has the following options set:
<property>
  <name>flume.master.servers</name>
  <value>master</value>
</property>
<property>
  <name>flume.collector.output.format</name>
  <value>default</value>
</property>
<property>
  <name>flume.collector.dfs.compress.gzip</name>
  <value>true</value>
</property>
The collector options shouldn't have an impact, since both nodes that crashed run only as agents (the collectors are on different physical machines).