CDH (READ-ONLY) / DISTRO-412

ZKFC: Exception handling the winning of election

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: CDH4.0.0
    • Fix Version/s: None
    • Component/s: HDFS
    • Labels:
      None
    • Environment:
      CentOS 6.2

      Description

      After fixing an edit log corruption in an HA setup (due to HDFS-3626), the ZKFC failed to elect an active NameNode, leaving both NameNodes in standby.

      Got the following exceptions in the hadoop-cmf-hdfs1-FAILOVERCONTROLLER-hadoop-106.log.out:

      2012-07-11 11:52:36,768 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop-106/10.196.68.149:2181, sessionid = 0x138756f4e7a0000, negotiated timeout = 5000
      2012-07-11 11:52:36,771 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
      2012-07-11 11:52:36,776 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
      2012-07-11 11:52:36,785 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
      java.lang.NullPointerException
          at org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
          at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:855)
          at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
          at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
          at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
          at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
      2012-07-11 11:52:36,785 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
      2012-07-11 11:52:36,790 INFO org.apache.zookeeper.ZooKeeper: Session: 0x138756f4e7a0000 closed
      

      And in hadoop-cmf-hdfs1-FAILOVERCONTROLLER-hadoop-108.log.out:

      2012-07-11 11:52:36,719 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop-108/10.196.68.150:8020 entered state: SERVICE_HEALTHY
      2012-07-11 11:52:36,739 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
      2012-07-11 11:52:36,754 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
      java.lang.NullPointerException
              at org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
              at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:855)
              at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
              at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
      2012-07-11 11:52:36,755 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
      2012-07-11 11:52:36,760 INFO org.apache.zookeeper.ZooKeeper: Session: 0x238756f4e850000 closed
      

      I was able to recover from this "no brain" state (neither NameNode active) by running hdfs zkfc -formatZK.
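
      Both traces end in StringUtils.byteToHexString, called from ActiveStandbyElector.fenceOldActive, which suggests the helper was handed a null byte array (for example, a znode that holds no data). That reading is an assumption; this issue was closed as Cannot Reproduce. The sketch below is not Hadoop's source: byteToHexString here is a simplified stand-in and safeHex is a hypothetical null-safe wrapper, shown only to illustrate how such an NPE can arise and how a guard would avoid it.

```java
class ZnodeHexDemo {
    // Simplified stand-in for a hex-dump helper (NOT Hadoop's actual StringUtils code).
    // Dereferences the array immediately, so a null argument throws NullPointerException.
    static String byteToHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Hypothetical guard: treat missing znode data as a printable placeholder.
    static String safeHex(byte[] bytes) {
        return bytes == null ? "(null)" : byteToHexString(bytes);
    }

    public static void main(String[] args) {
        byte[] missing = null; // assumed shape of the data read back from ZooKeeper
        System.out.println("guarded: " + safeHex(missing));
        try {
            System.out.println("unguarded: " + byteToHexString(missing));
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }
    }
}
```

      A guard like this would only make the log statement safe; whether the elector should fence, skip fencing, or retry when the data is missing is a separate decision that this sketch does not address.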

            People

            • Assignee: Todd Lipcon (todd)
            • Reporter: Joris Bontje (jbontje)
            • Votes: 0
            • Watchers: 4