  1. CDH (READ-ONLY)
  2. DISTRO-356

Cloudera Manager 3.7 manages 'Balancer Threshold' incorrectly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: CDH3u2
    • Fix Version/s: None
    • Component/s: HDFS
    • Labels:
      None
    • Environment:
      CDH3u2 with Cloudera Manager/SCM 3.7

      Description

      Hello,

      I'm trying out Cloudera Manager 3.7 and have found an issue with the "balancer threshold" configuration setting. The SCM interface is off by two orders of magnitude on this value: to the balancer, a threshold of "5" means a 5% difference between all nodes, but SCM treats a threshold of "0.05" as 5%, which is incorrect. Through the UI I tried to set my balancer threshold to 5%, which is "0.05" in the UI, and the UI prevents any value above 1 from being used. The balancer process then fails with an OOM error every time it is run (I've tried heap sizes up to 8G):

      2011-12-14 00:56:03,013 WARN org.apache.hadoop.hdfs.server.balancer.Balancer: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.OutOfMemoryError: unable to create new native thread
      at java.lang.Thread.start0(Native Method)
      at java.lang.Thread.start(Thread.java:640)
      at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:103)
      at java.lang.ProcessImpl.start(ProcessImpl.java:65)
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
      at org.apache.hadoop.util.Shell.run(Shell.java:182)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
      at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
      at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
      at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:66)
      at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:43)
      at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
      at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1034)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:50)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:71)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:5151)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocks(FSNamesystem.java:719)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.getBlocks(NameNode.java:527)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
      at org.apache.hadoop.ipc.Client.call(Client.java:1107)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
      at $Proxy0.getBlocks(Unknown Source)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at $Proxy0.getBlocks(Unknown Source)
      at org.apache.hadoop.hdfs.server.balancer.Balancer$Source.getBlockList(Balancer.java:639)
      at org.apache.hadoop.hdfs.server.balancer.Balancer$Source.dispatchBlocks(Balancer.java:763)
      at org.apache.hadoop.hdfs.server.balancer.Balancer$Source.access$2300(Balancer.java:597)
      at org.apache.hadoop.hdfs.server.balancer.Balancer$Source$BlockMoveDispatcher.run(Balancer.java:603)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

      The command executed was:
      exec /usr/lib/hadoop/bin/hadoop --config /var/run/cloudera-scm-agent/process/119-hdfs-BALANCER balancer -threshold 0.05

      The root cause is that a threshold of 0.05 is simply too small, and SCM's handling of the threshold is off by two orders of magnitude. SCM prevents you from specifying a threshold greater than 1% because it clamps the value between 0 and 1, when it should clamp it between 0 and 100 (or 1 and 100). The balancer is well documented to

      So what should have been executed to balance with a 5% threshold is:
      exec /usr/lib/hadoop/bin/hadoop --config /var/run/cloudera-scm-agent/process/119-hdfs-BALANCER balancer -threshold 5
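
      As a rough illustration of the unit conversion described above, here is a minimal sketch of how a UI value stored as a fraction could be turned into the percent value that "-threshold" expects. The function names and the 1 to 100 validation range are assumptions made for this example (the range follows the "1 to 100" suggestion above); this is not Cloudera Manager's actual code.

```python
def to_balancer_threshold(ui_fraction, min_pct=1.0, max_pct=100.0):
    """Convert a UI fraction (0.05 meaning 5%) to a percent string for -threshold.

    Hypothetical helper: the name and the default 1..100 range are assumptions.
    """
    # Scale the fraction to percent; round to hide floating-point noise.
    percent = round(ui_fraction * 100.0, 6)
    if not (min_pct <= percent <= max_pct):
        raise ValueError(
            "threshold %g%% is outside the allowed range %g..%g"
            % (percent, min_pct, max_pct)
        )
    # "g" formatting drops a trailing ".0", so 5.0 becomes "5".
    return format(percent, "g")


def balancer_command(ui_fraction):
    """Build the argument list for the balancer invocation (illustrative only)."""
    return ["hadoop", "balancer", "-threshold", to_balancer_threshold(ui_fraction)]


if __name__ == "__main__":
    print(" ".join(balancer_command(0.05)))
```

      With this kind of conversion, a UI value of 0.05 yields "-threshold 5", and a value that would produce a sub-1% threshold is rejected instead of being passed to the balancer.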

      The default threshold is 10, not 0.1; all of this is explicitly documented: https://issues.apache.org/jira/secure/attachment/12368261/RebalanceDesign6.pdf
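
      To make the difference between the two values concrete, the sketch below applies the balancer's documented balance criterion (a datanode counts as balanced when its utilization differs from the overall cluster utilization by no more than the threshold, in percentage points) to a small cluster. The node names and usage figures are invented for illustration, and the function is a simplified model, not the balancer's implementation.

```python
def unbalanced_nodes(used, capacity, threshold_pct):
    """Return {node: deviation} for nodes outside the threshold.

    used / capacity map node name -> bytes (any consistent unit).
    threshold_pct is in percentage points, as passed to -threshold.
    """
    # Cluster utilization: total used space over total capacity, in percent.
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    result = {}
    for node in used:
        node_util = 100.0 * used[node] / capacity[node]
        deviation = node_util - cluster_util
        if abs(deviation) > threshold_pct:
            result[node] = round(deviation, 2)
    return result


if __name__ == "__main__":
    # Made-up example cluster: three nodes of equal capacity.
    capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
    used = {"dn1": 52, "dn2": 50, "dn3": 48}
    print(unbalanced_nodes(used, capacity, 5))     # threshold of 5%
    print(unbalanced_nodes(used, capacity, 0.05))  # threshold of 0.05%
```

      In this example the cluster is already balanced at a threshold of 5, while at 0.05 two of the three nodes are still considered out of balance, so the balancer would keep trying to move blocks to close a gap of only a couple of percentage points.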

      In summary: Cloudera Manager incorrectly configures and manages the balancer threshold.

            People

            • Assignee: Unassigned
            • Reporter: jre J. Ryan Earl