[DISTRO-412] ZKFC: Exception handling the winning of election - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: CDH4.0.0
Fix Version/s: None
Component/s: HDFS
Labels:
None
Environment:
CentOS 6.2

Description

After fixing an edit log corruption in an HA setup (caused by HDFS-3626), the ZKFC failed to elect an active NameNode, leaving both NNs in standby.

The following exceptions appeared in hadoop-cmf-hdfs1-FAILOVERCONTROLLER-hadoop-106.log.out:

2012-07-11 11:52:36,768 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop-106/10.196.68.149:2181, sessionid = 0x138756f4e7a0000, negotiated timeout = 5000
2012-07-11 11:52:36,771 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2012-07-11 11:52:36,776 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2012-07-11 11:52:36,785 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.NullPointerException
    at org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:855)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2012-07-11 11:52:36,785 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2012-07-11 11:52:36,790 INFO org.apache.zookeeper.ZooKeeper: Session: 0x138756f4e7a0000 closed

And in hadoop-cmf-hdfs1-FAILOVERCONTROLLER-hadoop-108.log.out:

2012-07-11 11:52:36,719 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop-108/10.196.68.150:8020 entered state: SERVICE_HEALTHY
2012-07-11 11:52:36,739 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2012-07-11 11:52:36,754 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.NullPointerException
        at org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:855)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2012-07-11 11:52:36,755 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2012-07-11 11:52:36,760 INFO org.apache.zookeeper.ZooKeeper: Session: 0x238756f4e850000 closed

I was able to get out of this "no brain" state (no active NameNode) by running hdfs zkfc -formatZK.
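In both traces the NullPointerException is raised inside StringUtils.byteToHexString while fenceOldActive is processing the previous active's data. One plausible trigger, which this report does not confirm (the issue was closed as Cannot Reproduce), is a null byte array reaching the hex encoder. The minimal Java sketch below assumes that scenario. The class and method names (BreadcrumbHex, naiveHex, safeHex) are hypothetical and are not Hadoop's API. It shows how a naive encoder fails on null input and how a null check avoids the exception.

```java
/**
 * Hypothetical illustration (not Hadoop code): a hex encoder that
 * dereferences its argument, plus a null-tolerant wrapper suitable
 * for logging data that may be missing.
 */
public class BreadcrumbHex {

    // Naive encoder: iterating over a null array throws NullPointerException,
    // the same exception type seen in the ZKFC stack traces in this report.
    static String naiveHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Null-tolerant variant: returns a placeholder instead of throwing,
    // so missing data cannot abort the surrounding election handling.
    static String safeHex(byte[] bytes) {
        if (bytes == null) {
            return "<null>";
        }
        return naiveHex(bytes);
    }

    public static void main(String[] args) {
        // Assumption: the data read for the old active came back as null.
        byte[] oldActiveData = null;
        System.out.println(safeHex(oldActiveData));
        System.out.println(safeHex(new byte[] {0x0a, (byte) 0xff}));
        try {
            naiveHex(oldActiveData);
        } catch (NullPointerException e) {
            System.out.println("naiveHex threw NullPointerException");
        }
    }
}
```

Running main prints "<null>", then "0aff", then "naiveHex threw NullPointerException". Guarding at the logging call keeps a diagnostic message from turning into a failed election, though it says nothing about why the data would be null in the first place.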

Attachments

zookeeper-cmf-zookeeper1-SERVER-hadoop-108.log
632 kB
19/Jul/12 10:00 AM
zookeeper-cmf-zookeeper1-SERVER-hadoop-106.log
655 kB
19/Jul/12 9:59 AM
failover-108-20120701_0820_0830.log
26 kB
19/Jul/12 9:50 AM
failover_106.log
744 kB
14/Jul/12 10:43 AM

People

Assignee:

Todd Lipcon

Reporter:

Joris Bontje

Votes:

0

Watchers:

4

Dates

Created:

11/Jul/12 1:36 PM

Updated:

21/Sep/15 11:54 PM

Resolved:

21/Sep/15 11:54 PM