CDH (READ-ONLY) / DISTRO-411

HDFS put to hdfs://namenode:8020//path causes edit log corruption

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: CDH4.0.0
    • Fix Version/s: CDH4.1.0
    • Component/s: HDFS
    • Labels: None
    • Environment: CentOS 6.2

      Description

      The following command corrupts the NameNode edit log (note the double slash in the target path and that the input is read from stdin):
      $ cat /usr/share/dict/words | hadoop fs -put - hdfs://localhost:8020//path/file
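      The doubled slash matters because a path such as //path/file contains empty components when it is split on "/"; the edit log entries further down show an OP_MKDIR for / itself followed by one for //path. One possible client-side mitigation is to collapse repeated slashes before the path is sent to the NameNode. The sketch below assumes absolute paths, and PathNormalizer.normalize is a hypothetical helper for illustration only: it is not an HDFS API and not necessarily how the CDH4.1.0 fix works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HDFS: illustrates slash normalization only.
class PathNormalizer {
    // Collapses runs of '/' and drops empty components, so "//path/file"
    // becomes "/path/file". Only absolute paths are accepted.
    static String normalize(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        List<String> parts = new ArrayList<>();
        for (String component : path.split("/")) {
            // Empty strings come from the leading slash and from doubled slashes.
            if (!component.isEmpty()) {
                parts.add(component);
            }
        }
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        String raw = "//path/file";
        // A naive split keeps the empty components produced by the doubled slash.
        System.out.println(Arrays.toString(raw.split("/"))); // [, , path, file]
        System.out.println(normalize(raw));                  // /path/file
        System.out.println(normalize("/"));                  // /
    }
}
```

      Collapsing the slashes before the request is made means the NameNode never sees the empty component; a server-side check that rejects or normalizes such paths would be the more robust place for a real fix.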

      After this, restarting the NameNode fails with the following fatal exception:

      2012-07-10 06:29:19,910 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_0000000000000000173-0000000000000000188 expecting start txid #173
      2012-07-10 06:29:19,912 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation MkdirOp [length=0, path=/, timestamp=1341915658216, permissions=cloudera:supergroup:rwxr-xr-x, opCode=OP_MKDIR, txid=182]
      java.lang.ArrayIndexOutOfBoundsException: -1
              at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1728)
              at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1743)
              at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1562)
              at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1549)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:377)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:178)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
      

      This exception is triggered by the following entry in the editlog:

        <RECORD>
          <OPCODE>OP_MKDIR</OPCODE>
          <DATA>
            <TXID>182</TXID>
            <LENGTH>0</LENGTH>
            <PATH>/</PATH>
            <TIMESTAMP>1341915658216</TIMESTAMP>
            <PERMISSION_STATUS>
              <USERNAME>cloudera</USERNAME>
              <GROUPNAME>supergroup</GROUPNAME>
              <MODE>493</MODE>
            </PERMISSION_STATUS>
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_MKDIR</OPCODE>
          <DATA>
            <TXID>183</TXID>
            <LENGTH>0</LENGTH>
            <PATH>//path</PATH>
            <TIMESTAMP>1341915658216</TIMESTAMP>
            <PERMISSION_STATUS>
              <USERNAME>cloudera</USERNAME>
              <GROUPNAME>supergroup</GROUPNAME>
              <MODE>493</MODE>
            </PERMISSION_STATUS>
          </DATA>
        </RECORD>
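      The <MODE> element stores the permission bits as a decimal integer: 493 is 0755 in octal, which is the rwxr-xr-x shown in the exception message above. The small decoder below (ModeDecoder is a hypothetical name used only for illustration) makes that conversion explicit when reading edit log dumps like this one.

```java
// Hypothetical helper for reading edit log dumps; not an HDFS class.
class ModeDecoder {
    // Converts a numeric permission mode (as stored in <MODE>) into the
    // familiar rwx string. Only the lower 9 permission bits are considered.
    static String toSymbolic(int mode) {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        // Walk from the owner-read bit (bit 8) down to the other-execute bit (bit 0).
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int mode = 493;
        System.out.println(Integer.toOctalString(mode)); // 755
        System.out.println(toSymbolic(mode));            // rwxr-xr-x
    }
}
```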
      

      This first happened on a client's HA setup, but I can reproduce it on a fresh CDH4 VM.

      Locally I can repair it with hdfs namenode -recover.
      I haven't yet tried this on the HA setup.


            People

            • Assignee: Todd Lipcon (todd)
            • Reporter: Joris Bontje (jbontje)
            • Votes: 0
            • Watchers: 3
