Description
This problem was caused by an uncaught configuration error in the flume-site.xml file.
Here's the root of the problem.
The property flume.master.servers takes a list of servers. As a user, I assumed that its entries were host:port pairs that Flume nodes use to talk to masters.
Example:
—
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>flume.master.servers</name>
<value>ec2-184-72-143-244.compute-1.amazonaws.com:35872</value>
<description> This is the address for the master config server's
status and configuration server (http)
</description>
</property>
</configuration>
The problem is that a name such as 'master:35872' is interpreted, port and all, as a master name. If flume.master.zk.servers is not set, Flume takes the master server list and appends the default ZK ports ':2181:3181'. What ends up happening is that 'master:35872' + ':2181:3181' becomes 'master:35872:2181:3181', which causes the FlumeMaster ZKClient to try to talk to ZK on port 35872.
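To make the failure concrete, here is a minimal sketch of the defaulting step described above. It is not Flume's actual code: the class name ZkDefaultingSketch, the method deriveZkServers, and the assumption that the list is comma-separated with the suffix appended to each entry are all mine, for illustration only.

```java
// Hypothetical illustration of the defaulting behavior described in this
// report; this is NOT Flume's actual implementation.
final class ZkDefaultingSketch {
    // Default ZK port suffix quoted in the report.
    static final String DEFAULT_ZK_PORTS = ":2181:3181";

    // Builds a ZK server list from the value of flume.master.servers,
    // assuming a comma-separated list and a blind append to each entry.
    static String deriveZkServers(String masterServers) {
        StringBuilder sb = new StringBuilder();
        for (String entry : masterServers.split(",")) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            // No check that the entry is a bare hostname: an entry that
            // already carries a port gets the ZK ports stacked after it.
            sb.append(entry.trim()).append(DEFAULT_ZK_PORTS);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(deriveZkServers("master"));       // master:2181:3181
        System.out.println(deriveZkServers("master:35872")); // master:35872:2181:3181
    }
}
```

With a bare hostname the result is what the ZK client expects. With 'master:35872' the first port after the host is 35872, which matches the behavior reported here.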
The key point is that the value
<value>ec2-184-72-143-244.compute-1.amazonaws.com:35872</value>
should be
<value>ec2-184-72-143-244.compute-1.amazonaws.com</value>
Some possible solutions would be to:
- drop anything after a colon for entries in the flume.master.servers list.
- outright refuse to start if the flume.master.servers list is incorrect.
- add a check when getting flume.master.zk.servers and truncate there.
- separate the parameters so that ports don't have to be parsed from string arguments.
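The first two options could look roughly like the sketch below. It is only an illustration under stated assumptions: MasterServersCheck, stripPorts, and requireHostsOnly are hypothetical names that do not exist in Flume, and the list is assumed to be comma-separated.

```java
// Hypothetical helper sketching two of the options above; the names are
// illustrative and not part of Flume. Assumes hostnames or IPv4 addresses;
// IPv6 literals contain colons and would need different handling.
final class MasterServersCheck {
    // Lenient option: drop anything after the first colon in each entry.
    static String stripPorts(String masterServers) {
        StringBuilder sb = new StringBuilder();
        for (String entry : masterServers.split(",")) {
            String host = entry.trim();
            int colon = host.indexOf(':');
            if (colon >= 0) {
                host = host.substring(0, colon);
            }
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(host);
        }
        return sb.toString();
    }

    // Strict option: refuse the configuration if any entry is empty or
    // carries a port, so the process can fail fast at startup.
    static void requireHostsOnly(String masterServers) {
        for (String entry : masterServers.split(",")) {
            String host = entry.trim();
            if (host.isEmpty() || host.indexOf(':') >= 0) {
                throw new IllegalArgumentException(
                    "flume.master.servers entries must be bare hostnames, got: '"
                        + host + "'");
            }
        }
    }
}
```

The lenient variant silently repairs the value, which hides the mistake from the user. The strict variant surfaces it immediately, at the cost of refusing to start on configurations that previously appeared to work.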