  1. CDH (READ-ONLY)
  2. DISTRO-703

Accumulo tserver(s) unable to connect to tracer in multinode cluster

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: CDH 5.1.0
    • Fix Version/s: None
    • Component/s: Accumulo
    • Labels:
    • Environment:
      RHEL 6.5

      Description

      I installed CDH5 Hadoop, Accumulo, and ZooKeeper on a 4-node cluster (1 master, 3 slaves). The Accumulo master, tracer, gc, and monitor run on the master node (along with the NameNode); the Accumulo tservers run on the slaves (along with the DataNodes). On startup the Accumulo database works, but the tservers continually log "Connection refused" when attempting to connect to the Accumulo tracer, and the trace service does not function correctly.

      Below are excerpts from commands run on the master and 1 slave node to illustrate the issue.

      1. accumulo tracer running on accumulo master node (master):

      [root@npsmaster bin]# service accumulo-tracer status
      Accumulo Tracer is running [ OK ]

      1. Listening on port 12234 for all adapters (0.0.0.0)
        [root@npsmaster bin]# netstat -pantu |grep 12234
        tcp 0 0 0.0.0.0:12234 0.0.0.0:* LISTEN 17976/java
        tcp 0 0 192.168.63.1:50925 192.168.63.1:12234 ESTABLISHED 17875/java
        tcp 0 0 192.168.63.1:50921 192.168.63.1:12234 ESTABLISHED 17652/java
        tcp 0 0 192.168.63.1:12234 192.168.63.1:50921 ESTABLISHED 17976/java
        tcp 0 0 192.168.63.1:12234 192.168.63.1:50925 ESTABLISHED 17976/java
      1. npsmaster is configured as the tracer
        [root@npsmaster bin]# cat /etc/accumulo/conf/tracers
        npsmaster
      1. Tracer info from ZooKeeper for the Accumulo instance. NOTE: the advertised trace address is 0.0.0.0:12234
        [zk: localhost:2181(CONNECTED) 2] get /accumulo/041cfce5-3198-456f-b81a-e9ed14fdcadc/tracers/trace-0000000000
        0.0.0.0:12234
        cZxid = 0x900095025
        ctime = Fri Mar 20 17:14:00 UTC 2015
        mZxid = 0x900095025
        mtime = Fri Mar 20 17:14:00 UTC 2015
        pZxid = 0x900095025
        cversion = 0
        dataVersion = 0
        aclVersion = 0
        ephemeralOwner = 0x34c13dcb155000c
        dataLength = 13
        numChildren = 0
        [zk: localhost:2181(CONNECTED) 3]
      1. tserver running on slave node
        [root@npsslave1 mira]# service accumulo-tserver status
        Accumulo Tablet Server is running [ OK ]
      1. tracer config on slave also points to npsmaster
        [root@npsslave1 mira]# cat /etc/accumulo/conf/tracers
        npsmaster
      1. No connections to port 12234!
        [root@npsslave1 mira]# netstat -pantu |grep :12234
        [root@npsslave1 mira]#
      1. Exceptions are logged once a second indefinitely and will eventually fill the shared log volume; in addition, the tserver cannot send data to the Trace Server

      2015-03-24 19:07:52,779 [receivers.SendSpansViaThrift] DEBUG: Connecting to 0.0.0.0:12234
      2015-03-24 19:07:52,780 [receivers.SendSpansViaThrift] ERROR: java.net.ConnectException: Connection refused
      java.net.ConnectException: Connection refused
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
      at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
      at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
      at java.net.SocksSocketImpl.connect(Unknown Source)
      at java.net.Socket.connect(Unknown Source)
      at java.net.Socket.connect(Unknown Source)
      at org.apache.accumulo.trace.instrument.receivers.SendSpansViaThrift.createDestination(SendSpansViaThrift.java:55)
      at org.apache.accumulo.trace.instrument.receivers.SendSpansViaThrift.createDestination(SendSpansViaThrift.java:34)
      at org.apache.accumulo.trace.instrument.receivers.AsyncSpanReceiver.sendSpans(AsyncSpanReceiver.java:87)
      at org.apache.accumulo.trace.instrument.receivers.AsyncSpanReceiver$1.run(AsyncSpanReceiver.java:63)
      at java.util.TimerThread.mainLoop(Unknown Source)
      at java.util.TimerThread.run(Unknown Source)
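
      The DEBUG line above shows the tserver connecting to 0.0.0.0:12234, the same value stored in the tracer znode. As far as I know, on Linux a connect() to 0.0.0.0 is treated as a connection to the local host, so on a slave with no tracer it would be refused. The following is only a sketch of a check for that condition. It assumes the advertised value always has the host:port form shown above; classify_advert is a hypothetical helper name, not part of Accumulo.

```shell
#!/bin/bash
# Hypothetical helper (not part of Accumulo): classify an advertised
# "host:port" string, assuming the form seen in the znode and DEBUG line.
classify_advert() {
    local advert="$1"
    local host="${advert%:*}"    # drop the trailing ":port"
    case "$host" in
        ""|0.0.0.0)      echo "wildcard" ;;   # bind-only address, not a usable destination
        localhost|127.*) echo "loopback" ;;   # only reachable from the tracer host itself
        *)               echo "specific" ;;   # a concrete host name or address
    esac
}

# Example:
#   classify_advert "0.0.0.0:12234"     # prints: wildcard
#   classify_advert "npsmaster:12234"   # prints: specific
```

      Anything other than "specific" in the znode would suggest that remote tservers cannot use the advertised address.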

      1. The cause seems to be that the -a/--address parameter is not passed to accumulo-tracer on startup (see the ServerOpts class, called from the main method of TraceServer.java). I made a fairly ugly change to the accumulo-tracer start script in /etc/init.d that pulls the address from the tracers file and passes it to the Accumulo Trace Server. With that change tracing appears to work and the log errors stop. However, a more elegant solution should be researched: one that lets the init scripts work without breaking the old-school startup (start-all.sh), which allows multiple tracers to coexist peacefully.

      diff accumulo-tracer accumulo-tracer.orig
      70d69
      < TRACER_HOST_FILE="$CONF_DIR/tracers"
      89,98d87
      < # find the tracer address to bind to from the tracers file
      < #
      < # first check to make sure that there is only one tracer configured
      < #
      < NUMTRACERS="$(/usr/bin/wc -l < $TRACER_HOST_FILE)"
      < if [ $NUMTRACERS -eq 1 ]
      < then
      < TRACERADDRESS="$(/bin/cat $TRACER_HOST_FILE)"
      < DAEMON_FLAGS="$DAEMON_FLAGS -a $TRACERADDRESS"
      < fi
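
      One weakness of the change in the diff is that wc -l counts newline characters, so a tracers file with a trailing blank line, or with no final newline, would not yield a count of 1 even though it names a single host. The sketch below is one possible way to make the lookup more tolerant. It assumes blank lines and '#' comments in the tracers file should be ignored; tracer_bind_address is a hypothetical name and nothing here is part of the shipped init script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the shipped accumulo-tracer init script.
# Prints the tracer host if the given file names exactly one host, ignoring
# blank lines and '#' comments (an assumption about the file format).
# Prints nothing and returns 1 otherwise.
tracer_bind_address() {
    local hosts_file="$1" line host="" count=0
    [ -r "$hosts_file" ] || return 1
    # The "|| [ -n ... ]" keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%%#*}"                                  # strip comments
        line="$(printf '%s' "$line" | tr -d '[:space:]')"   # strip whitespace
        [ -n "$line" ] || continue                          # skip empty lines
        host="$line"
        count=$((count + 1))
    done < "$hosts_file"
    [ "$count" -eq 1 ] || return 1
    printf '%s\n' "$host"
}

# Possible use in the init script (CONF_DIR and DAEMON_FLAGS as in the diff):
#   if TRACERADDRESS="$(tracer_bind_address "$CONF_DIR/tracers")"; then
#       DAEMON_FLAGS="$DAEMON_FLAGS -a $TRACERADDRESS"
#   fi
```

      As in the original change, the flag is added only when a single tracer is configured, so multi-tracer setups keep the default behaviour.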

            People

            • Assignee:
              Unassigned
              Reporter:
              brentoleary Brent O'Leary