  1. CDH (READ-ONLY)
  2. DISTRO-470

There is a race condition in FSEditLog when removing an error edit stream

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: CDH3u5
    • Fix Version/s: CDH4.0.0
    • Component/s: HDFS
    • Labels:
      None

      Description

      In our cluster, we configure the NameNode to write to both a local directory and an NFS-mounted directory. When the NFS-mounted directory is inaccessible, the NameNode should keep running without error, but our NameNode crashed with the following stack trace.

      2013-04-02 23:35:21,536 FATAL org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to find edits stream with IO error
      java.lang.Exception: Unable to find edits stream with IO error
      at org.apache.hadoop.hdfs.server.namenode.FSEditLog.fatalExit(FSEditLog.java:430)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsStreamsAndStorageDirs(FSEditLog.java:519)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1139)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1641)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:689)
      at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

      According to the stack trace, when the NameNode tries to sync the edit log, it does detect that the NFS-mounted directory is inaccessible and attempts to remove the corresponding stream from FSEditLog#editStreams. However, it finds that the edit stream for the NFS mount has already been removed. In this situation the NameNode simply aborts.

      After looking through the HDFS source code, I found another code path that removes edit streams from FSEditLog#editStreams, which can cause the race condition above. It is in the method FSEditLog#getEditLogSize:

      synchronized long getEditLogSize() throws IOException {
        assert getNumStorageDirs() == editStreams.size();
        long size = 0;
        for (int idx = 0; idx < editStreams.size(); idx++) {
          EditLogOutputStream es = editStreams.get(idx);
          try {
            long curSize = es.length();
            assert (size == 0 || size == curSize);
          } catch (IOException ioe) {
            FSNamesystem.LOG.warn(
                "Unable to determine edit log length. Removing log.", ioe);
            removeEditsAndStorageDir(idx);
          }
        }
        return size;
      }

      The cause of this race condition lies in the FSEditLog#logSync method, which has two steps:

      1. Perform the sync operation; if any edit stream is inaccessible, add it to the error stream list (unsynchronized).
      2. Remove the streams in that error list from FSEditLog#editStreams (synchronized).

      Step #1 isn't synchronized, so after step #1 and before step #2 another thread may already have removed the error stream by invoking FSEditLog#getEditLogSize.
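
The interleaving described above can be replayed deterministically with a minimal sketch. Everything here is a simplified stand-in: EditLogRaceSketch, its methods, and the use of plain strings as streams are hypothetical names chosen for illustration, not the real FSEditLog code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for FSEditLog; not the real HDFS class. */
class EditLogRaceSketch {
    private final List<String> editStreams = new ArrayList<>();

    EditLogRaceSketch(List<String> streams) {
        editStreams.addAll(streams);
    }

    /** Step 1 of logSync: the failing stream is only recorded, outside the removal lock. */
    List<String> syncAndCollectErrors(String failingStream) {
        List<String> errors = new ArrayList<>();
        for (String s : snapshot()) {
            if (s.equals(failingStream)) {
                errors.add(s);
            }
        }
        return errors;
    }

    /**
     * Step 2 of logSync: runs under the lock. Returns false when an error
     * stream is no longer in the list, the case where the reported
     * NameNode ends up in fatalExit.
     */
    synchronized boolean removeErrorStreams(List<String> errors) {
        for (String s : errors) {
            if (!editStreams.remove(s)) {
                return false;
            }
        }
        return true;
    }

    /** Models the getEditLogSize path that removes a failing stream on its own. */
    synchronized void getEditLogSizeRemoving(String failingStream) {
        editStreams.remove(failingStream);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(editStreams);
    }

    /** Replays the two steps, optionally with the secondary's call in between. */
    static boolean replayInterleaving(boolean secondaryIntervenes) {
        EditLogRaceSketch log = new EditLogRaceSketch(Arrays.asList("local", "nfs"));
        List<String> errors = log.syncAndCollectErrors("nfs");   // handler thread, step 1
        if (secondaryIntervenes) {
            log.getEditLogSizeRemoving("nfs");                   // Secondary NameNode RPC
        }
        return log.removeErrorStreams(errors);                   // handler thread, step 2
    }
}
```

Calling replayInterleaving(false) models the normal case where step 2 finds the stream it expects; replayInterleaving(true) models the reported case where the stream has already gone by the time step 2 runs.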

      The attached NameNode log confirms this analysis: the Secondary NameNode makes an RPC call to NameNode#getEditLogSize, which in turn calls FSEditLog#getEditLogSize and removes the error edit stream.

      We can fix the bug the same way Apache Hadoop branch 1.x did: throw an exception from FSEditLog#getEditLogSize instead of trying to remove the error edit stream. The Secondary NameNode, on receiving this exception, will simply retry.
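
A rough sketch of the shape of that fix, under stated assumptions: EditLogSizeFixSketch, the Stream interface, streamCount, and sizeWithRetry (with its maxAttempts parameter) are hypothetical names for illustration only, and the retry loop merely stands in for whatever the Secondary NameNode does on failure. The point is that the size query becomes read-only, so only logSync ever removes streams.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a read-only getEditLogSize; not the actual patch. */
class EditLogSizeFixSketch {
    /** Minimal stand-in for EditLogOutputStream. */
    interface Stream {
        long length() throws IOException;
    }

    private final List<Stream> editStreams;

    EditLogSizeFixSketch(List<Stream> streams) {
        this.editStreams = new ArrayList<>(streams);
    }

    /** A failing stream surfaces as an exception; the stream list is never modified here. */
    synchronized long getEditLogSize() throws IOException {
        long size = 0;
        for (Stream es : editStreams) {
            size = es.length(); // may throw; propagated to the caller
        }
        return size;
    }

    synchronized int streamCount() {
        return editStreams.size();
    }

    /** Caller-side retry, standing in for the Secondary NameNode. */
    static long sizeWithRetry(EditLogSizeFixSketch log, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return log.getEditLogSize();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With this shape, a stream that keeps failing makes sizeWithRetry give up with the last IOException, while the stream list stays intact for logSync to clean up under its own lock.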

      The fix is minor. Can I submit a patch for this?

            People

            • Assignee:
              Unassigned
              Reporter:
              antyrao Anty.Rao