Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: CDH4.0.0
Fix Version/s: None
Component/s: HDFS
Labels: None
Environment: CentOS 6.2
Description
After I fixed an edit log corruption in an HA setup (caused by HDFS-3626), the ZKFC failed to elect a master, leaving two standby NNs.
Got the following exceptions in the hadoop-cmf-hdfs1-FAILOVERCONTROLLER-hadoop-106.log.out:
2012-07-11 11:52:36,768 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop-106/10.196.68.149:2181, sessionid = 0x138756f4e7a0000, negotiated timeout = 5000
2012-07-11 11:52:36,771 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2012-07-11 11:52:36,776 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2012-07-11 11:52:36,785 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.NullPointerException
    at org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:855)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2012-07-11 11:52:36,785 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2012-07-11 11:52:36,790 INFO org.apache.zookeeper.ZooKeeper: Session: 0x138756f4e7a0000 closed
And in hadoop-cmf-hdfs1-FAILOVERCONTROLLER-hadoop-108.log.out:
2012-07-11 11:52:36,719 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop-108/10.196.68.150:8020 entered state: SERVICE_HEALTHY
2012-07-11 11:52:36,739 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2012-07-11 11:52:36,754 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.NullPointerException
    at org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:855)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2012-07-11 11:52:36,755 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2012-07-11 11:52:36,760 INFO org.apache.zookeeper.ZooKeeper: Session: 0x238756f4e850000 closed
I was able to get out of this No Brain Syndrome by running hdfs zkfc -formatZK.
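Both traces fail at the same frame, byteToHexString called from fenceOldActive, which would be consistent with a null byte array being passed to the hex formatter. That reading is an assumption; the report does not confirm where the null came from. The standalone sketch below only illustrates the failure shape and one possible guard. The class HexGuardSketch and the methods byteToHex and describeOldActive are hypothetical names for illustration, not Hadoop code.

```java
public class HexGuardSketch {

    // Hypothetical stand-in for a hex formatter: it dereferences the array,
    // so a null argument throws NullPointerException.
    static String byteToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            // Mask to an int so negative bytes print as two hex digits.
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Hypothetical guard: describe the data without throwing when it is null.
    static String describeOldActive(byte[] data) {
        if (data == null) {
            return "<no data>";
        }
        return byteToHex(data);
    }

    public static void main(String[] args) {
        // Non-null data is formatted as hex.
        System.out.println(describeOldActive(new byte[] {0x0a, (byte) 0xff}));
        // Null data is reported instead of throwing.
        System.out.println(describeOldActive(null));
        // The unguarded formatter throws on null input.
        try {
            byteToHex(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }
    }
}
```

With a guard of this kind, an elector would report the missing data rather than abort the election, though whether that is the right behavior for fencing is a separate design question.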