Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: v0.9.3
- Fix Version/s: None
- Component/s: Master, Node, Sinks+Sources
- Environment: Ubuntu 10.10 Maverick Meerkat
Description
You can reproduce this problem by following these steps:
Set up:
- Master
- Agent: rpcSource(35092) | agent*(...) # agent*Sink and agent*Chain all have this problem
- Collector: collectorSource(...) | collectorSink(...)
Start sending events to the agent using Thrift. Then, while events are still being sent, use the flume shell on the master to configure the agent; you can even use exactly the same config the agent had in the first place. Once the agent receives its configuration, it closes its source server for some reason and thereafter becomes unresponsive to new configurations. This is sample output from the agent logs:
2011-06-15 07:29:04,086 INFO com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port 35853 closed
2011-06-15 07:29:05,088 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35092...
2011-06-15 07:29:05,088 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 4 elements ...
Because the source server is closed, the application that is sending events sees many errors like the following:
Thrift::TransportException: Broken pipe
Thrift::TransportException: Could not connect to localhost:35092: Connection refused - connect(2)
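To pin down when the source server goes away relative to the configuration push, it can help to poll the agent's source port from a separate process. The following is only a minimal sketch, not part of Flume: the function names are hypothetical, and it assumes the agent's rpcSource listens on localhost:35092 as in the setup above.

```python
import socket
import time


def source_port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" and timeouts alike.
        return False


def watch_source_port(host: str, port: int, checks: int, interval: float = 1.0) -> list:
    """Probe the port `checks` times and return (timestamp, is_open) pairs."""
    results = []
    for _ in range(checks):
        is_open = source_port_is_open(host, port)
        results.append((time.strftime("%Y-%m-%d %H:%M:%S"), is_open))
        print(results[-1][0], f"port {port} is", "open" if is_open else "CLOSED")
        time.sleep(interval)
    return results
```

Running something like `watch_source_port("localhost", 35092, checks=60)` while pushing the config from the flume shell gives timestamps that can be lined up against the "Closed server on port ..." entries in the agent log.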
Another way to reproduce this type of error is to bring the master down and then bring it back up, at which point it sends its configuration to the agent node. Upon receiving the new configuration, the agent closes its source server and becomes unresponsive to new configurations. The following output is from an agent that was configured with two logical nodes, one that was rpcSource(35090) | agentE2EChain(...) and one that was rpcSource(35092) | agentBEChain(...):
2011-06-15 05:37:46,731 INFO com.cloudera.flume.agent.ThriftMasterRPC: Connected to master at flume-master:35872
2011-06-15 05:37:51,770 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...
2011-06-15 05:37:51,771 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 0 elements ...
2011-06-15 05:37:51,787 INFO com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port 35853 closed
2011-06-15 05:37:51,868 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...
2011-06-15 05:37:51,868 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 0 elements ...
2011-06-15 05:37:51,868 WARN com.cloudera.flume.handlers.debug.LazyOpenDecorator: Closing a lazy sink that was not logically opened
2011-06-15 05:37:51,868 INFO com.cloudera.flume.agent.LogicalNode: flume-agent: Connector stopped: LazyOpenSource | LazyOpenDecorator
2011-06-15 05:37:51,875 INFO com.cloudera.flume.agent.LogicalNode: Node config successfully set to com.cloudera.flume.conf.FlumeConfigData@42143753
2011-06-15 05:37:51,880 INFO com.cloudera.flume.agent.LogicalNode: Connector started: LazyOpenSource | LazyOpenDecorator
2011-06-15 05:37:51,881 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35090...
2011-06-15 05:37:52,788 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35092...
2011-06-15 05:37:52,788 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 6 elements ...
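In the log above, port 35090 is closed and then restarted ("Starting blocking thread pool server"), while port 35092 is closed and never reopened. A small script can flag that pattern in a longer agent log. This is a rough sketch: the function name is hypothetical, and it assumes the two message formats shown above are the only ones that mark a source server closing and starting.

```python
import re

# Message formats copied from the agent log excerpts in this report.
CLOSED = re.compile(r"ThriftEventSource: Closed server on port (\d+)")
STARTED = re.compile(
    r"ThriftEventSource: Starting blocking thread pool server on port (\d+)"
)


def ports_left_closed(log_lines):
    """Return the sorted ports whose last logged state is 'closed'."""
    state = {}
    for line in log_lines:
        match = CLOSED.search(line)
        if match:
            state[int(match.group(1))] = "closed"
            continue
        match = STARTED.search(line)
        if match:
            state[int(match.group(1))] = "open"
    return sorted(port for port, s in state.items() if s == "closed")


SAMPLE = [
    "2011-06-15 05:37:51,770 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...",
    "2011-06-15 05:37:51,868 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...",
    "2011-06-15 05:37:51,881 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35090...",
    "2011-06-15 05:37:52,788 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35092...",
]

print(ports_left_closed(SAMPLE))  # [35092]
```

For a real log file, passing the open file object (which iterates line by line) to `ports_left_closed` should work the same way.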
I once produced an exception using this master-down/master-up procedure:
2011-06-15 04:50:45,543 ERROR com.cloudera.flume.core.connector.DirectDriver: Driving src/sink failed! LazyOpenSource | LazyOpenDecorator because NaiveFileWALDeco not open for append
java.lang.IllegalStateException: NaiveFileWALDeco not open for append
    at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
    at com.cloudera.flume.agent.durability.NaiveFileWALDeco.append(NaiveFileWALDeco.java:133)
    at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
    at com.cloudera.flume.agent.AgentFailChainSink.append(AgentFailChainSink.java:103)
    at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
    at com.cloudera.flume.handlers.debug.LazyOpenDecorator.append(LazyOpenDecorator.java:75)
    at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:93)
2011-06-15 04:50:45,544 INFO com.cloudera.flume.agent.LogicalNode: Connector xxxxxxxx.internal-E2E exited with error NaiveFileWALDeco not open for append
2011-06-15 04:50:46,544 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...
2011-06-15 04:50:46,545 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 6 elements ...
2011-06-15 04:50:50,443 INFO com.cloudera.flume.agent.AgentFailChainSink: Setting e2e failover chain to { ackedWriteAhead => { stubbornAppend =>
} }
2011-06-15 04:50:50,443 INFO com.cloudera.flume.agent.AgentFailChainSink: Setting failover chain to { ackedWriteAhead => { stubbornAppend =>
} }