  1. CDH (READ-ONLY)
  2. DISTRO-495

Standby namenode is stalled during checkpointing

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: CDH4.2.1
    • Fix Version/s: None
    • Component/s: HDFS
    • Labels:
      None
    • Environment:
      CDH4.2.1, HA

      Description

      During checkpointing on the standby NN, the checkpointer thread holds a lock that prevents almost anything else from running.
      This is very uncool, especially because the lock is held during image compression and writeback to disk, and these operations take a long time on non-trivial setups.
      As a reminder, fresh clients will connect to the standby and expect it either to fail the connection or to redirect them to the active NN.
      In this state, which can last for tens of seconds, the client is stalled waiting for an answer, which slows down operations for newly started tasks.
      A JMX thread dump showing the problem is attached.
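
The stall described above can be reproduced in miniature. The following is only a sketch, assuming the lock behaves like a read/write lock whose write side is held for the whole checkpoint. It is not the actual NameNode code: the class `CheckpointLockSketch`, the field `NAMESYSTEM_LOCK`, and the method `stalledMillis` are hypothetical names, and the sleep merely stands in for image compression and writeback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {

    // Hypothetical stand-in for the namesystem lock (assumption, not the real class).
    private static final ReentrantReadWriteLock NAMESYSTEM_LOCK = new ReentrantReadWriteLock();

    /** Returns how long a "client" read waited while a "checkpointer" held the write lock. */
    public static long stalledMillis(long checkpointMillis) {
        CountDownLatch lockTaken = new CountDownLatch(1);

        Thread checkpointer = new Thread(() -> {
            NAMESYSTEM_LOCK.writeLock().lock();
            try {
                lockTaken.countDown();          // signal that the lock is now held
                Thread.sleep(checkpointMillis); // stands in for compression + disk writeback
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                NAMESYSTEM_LOCK.writeLock().unlock();
            }
        }, "checkpointer-sketch");
        checkpointer.start();

        try {
            lockTaken.await();                  // wait until the checkpoint has started
            long start = System.nanoTime();
            NAMESYSTEM_LOCK.readLock().lock();  // the "client request" blocks here
            try {
                return (System.nanoTime() - start) / 1_000_000;
            } finally {
                NAMESYSTEM_LOCK.readLock().unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client stalled for about " + stalledMillis(500) + " ms");
    }
}
```

In this sketch the client request waits for roughly as long as the simulated checkpoint lasts, which mirrors the tens-of-seconds stall reported here when the real work is compressing and writing a large image.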

        Attachments

            People

            • Assignee:
              atm Aaron T. Myers
              Reporter:
              jbnote Jean-Baptiste Note
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: