CDH (READ-ONLY) / DISTRO-374

Debian Wheezy cannot run MR jobs when hadoop-0.20-native_0.20.2+923.194-1~squeeze-cdh3_amd64.deb is installed and activated

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: CDH3u3
    • Fix Version/s: CDH3u6, CDH4.4.0
    • Component/s: Hadoop Common
    • Labels:
      None
    • Environment:

      Description

      After installing CDH3u3, every map task in a MapReduce job fails with errors like the following:

      [exec] 12/02/03 09:50:58 INFO mapred.JobClient: Task Id : attempt_201202030949_0001_m_000000_0, Status : FAILED
      [exec] Map output lost, rescheduling: getMapOutput(attempt_201202030949_0001_m_000000_0,0) failed :
      [exec] EINVAL: Invalid argument
      [exec] at org.apache.hadoop.io.nativeio.NativeIO.posix_fadvise(Native Method)
      [exec] at org.apache.hadoop.io.nativeio.NativeIO.posixFadviseIfPossible(NativeIO.java:177)
      [exec] at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:4026)
      [exec] at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
      [exec] at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
      [exec] at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
      [exec] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
      [exec] at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:829)
      [exec] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
      [exec] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
      [exec] at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
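      To tell this failure apart from other shuffle errors in a long JobClient log, look for a failed attempt whose trace contains the "EINVAL: Invalid argument" line followed by a NativeIO.posix_fadvise frame. The helper below is a minimal sketch of that check. The function name and the log-line patterns it matches are assumptions based only on the output quoted above. It is not part of Hadoop.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical helper (not a Hadoop API): matches the JobClient line that
# announces a failed attempt, e.g. "Task Id : attempt_..._0, Status : FAILED".
_ATTEMPT_RE = re.compile(r"Task Id : (attempt_\w+), Status : FAILED")


def find_fadvise_failures(lines: Iterable[str]) -> List[str]:
    """Return IDs of failed attempts whose trace shows EINVAL followed by a
    NativeIO.posix_fadvise frame, in order of first appearance."""
    hits: List[str] = []
    current: Optional[str] = None  # attempt whose trace is being scanned
    saw_einval = False
    for line in lines:
        m = _ATTEMPT_RE.search(line)
        if m:
            # A new failed attempt starts; reset the per-attempt state.
            current, saw_einval = m.group(1), False
            continue
        if current is None:
            continue
        if "EINVAL: Invalid argument" in line:
            saw_einval = True
        elif saw_einval and "NativeIO.posix_fadvise" in line:
            if current not in hits:
                hits.append(current)
            current = None  # matched; ignore the rest of this trace
    return hits
```

      To use it, pass any iterable of lines, such as an open file containing the job client output.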

      Hadoop was installed with the following apt-get command:

      apt-get install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker

      This command pulls in hadoop-0.20-native automatically, so the problem could affect many users running Wheezy. Wheezy is not supported at the moment, but the issue is probably worth investigating because other distributions could also be affected.

      On IRC, harshj/QwertyM suggested that this package could be the cause and proposed a workaround.

      Workarounds include:

      • Removing the native package and restarting Hadoop ("apt-get remove hadoop-0.20-native" / "stop-all.sh ; start-all.sh")
      • Setting mapred.tasktracker.shuffle.fadvise to false in mapred-site.xml (suggested by harshj)

      Both workarounds work on my installation.
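      The second workaround amounts to adding one property to mapred-site.xml. The sketch below applies that edit to the file's text with Python's standard ElementTree. The property name and the value false come from this report. The function name is hypothetical, and the location of mapred-site.xml depends on the installation. ElementTree drops XML comments, so review the output before replacing the real file. The TaskTrackers will most likely need a restart to pick up the change.

```python
import xml.etree.ElementTree as ET

# Property named in the workaround above.
PROP = "mapred.tasktracker.shuffle.fadvise"


def disable_shuffle_fadvise(xml_text: str) -> str:
    """Hypothetical helper: return mapred-site.xml content with PROP set to
    false, updating an existing <property> or appending a new one."""
    root = ET.fromstring(xml_text)  # expects a <configuration> root element
    for prop in root.findall("property"):
        name = prop.find("name")
        if name is not None and (name.text or "").strip() == PROP:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = "false"
            break
    else:
        # No existing entry: append <property><name/><value/></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROP
        ET.SubElement(prop, "value").text = "false"
    # Note: comments in the input are not preserved by ElementTree.
    return ET.tostring(root, encoding="unicode")
```

      The function is idempotent. Running it twice leaves a single entry with the value false.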

      HDFS is not affected: "hadoop fs" commands such as put, ls, cat, and rmr all work correctly. Only MapReduce is affected.


            People

            • Assignee:
              tucu Alejandro Abdelnur
              Reporter:
              timmattison Tim Mattison
            • Votes:
              0
              Watchers:
              2
