Description
The following command results in a corrupt NN editlog (note the double slash and reading from stdin):
$ cat /usr/share/dict/words | hadoop fs -put - hdfs://localhost:8020//path/file
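The double slash is not collapsed by URI parsing, so the malformed path reaches the NameNode verbatim. A minimal illustration with plain java.net.URI (not the Hadoop Path class itself, just a sketch of the parsing behavior):

```java
import java.net.URI;

public class DoubleSlashDemo {
    public static void main(String[] args) {
        // The authority ends at the first slash after "hdfs://"; the
        // duplicate slash that follows is kept verbatim in the path.
        URI uri = URI.create("hdfs://localhost:8020//path/file");
        System.out.println(uri.getAuthority()); // localhost:8020
        System.out.println(uri.getPath());      // //path/file
    }
}
```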
After this, restarting the namenode results in the following fatal exception:
2012-07-10 06:29:19,910 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_0000000000000000173-0000000000000000188 expecting start txid #173
2012-07-10 06:29:19,912 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation MkdirOp [length=0, path=/, timestamp=1341915658216, permissions=cloudera:supergroup:rwxr-xr-x, opCode=OP_MKDIR, txid=182]
java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1728)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1743)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1562)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1549)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:377)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:178)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
This exception is triggered by the following entries in the editlog (note the mkdir of / at txid 182, immediately followed by a mkdir of //path at txid 183):
<RECORD>
  <OPCODE>OP_MKDIR</OPCODE>
  <DATA>
    <TXID>182</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/</PATH>
    <TIMESTAMP>1341915658216</TIMESTAMP>
    <PERMISSION_STATUS>
      <USERNAME>cloudera</USERNAME>
      <GROUPNAME>supergroup</GROUPNAME>
      <MODE>493</MODE>
    </PERMISSION_STATUS>
  </DATA>
</RECORD>
<RECORD>
  <OPCODE>OP_MKDIR</OPCODE>
  <DATA>
    <TXID>183</TXID>
    <LENGTH>0</LENGTH>
    <PATH>//path</PATH>
    <TIMESTAMP>1341915658216</TIMESTAMP>
    <PERMISSION_STATUS>
      <USERNAME>cloudera</USERNAME>
      <GROUPNAME>supergroup</GROUPNAME>
      <MODE>493</MODE>
    </PERMISSION_STATUS>
  </DATA>
</RECORD>
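A plausible reading of why the double slash is dangerous (a sketch of the general hazard, not the actual FSDirectory code): splitting //path on / leaves an empty component between the two slashes, and any loader that looks up components by name can compute an index of -1 for the empty string, matching the ArrayIndexOutOfBoundsException above.

```java
import java.util.Arrays;

public class PathSplitDemo {
    public static void main(String[] args) {
        // A well-formed absolute path has one leading empty component.
        String[] good = "/path/file".split("/");
        // The malformed path has an extra empty component between the
        // two slashes, which name-based lookups will fail to resolve.
        String[] bad = "//path/file".split("/");
        System.out.println(Arrays.toString(good)); // [, path, file]
        System.out.println(Arrays.toString(bad));  // [, , path, file]
    }
}
```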
This initially happened on a client's HA setup, but I can reproduce it on a fresh CDH4 VM.
Locally I can fix it with hdfs namenode -recover.
I haven't yet tried fixing it on the HA setup.