  1. CDH (READ-ONLY)
  2. DISTRO-614

MapReduce (v1) jobs using just 1 map-slot

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: CDH 5.0.2
    • Fix Version/s: None
    • Component/s: MapReduce
    • Labels: None
    • Environment: CentOS 6.5

      Description

      Hello,
      we've run into a strange problem: MapReduce v1 runs our jobs with only one mapper (1 map slot). This can be overridden by providing an XML config file with the following content:

      <?xml version="1.0" encoding="UTF-8"?>
      <configuration>
      <property>
      <name>mapred.max.split.size</name>
      <value>100000000</value>
      </property>
      </configuration>

      or by setting that property using the "MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml" option in Cloudera Manager (in the MR -> Gateway -> Advanced settings).
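For reference, a minimal sketch of the first approach: generating the override file and passing it to a single job. The helper name `write_split_conf`, the jar, the driver class, and the paths are all placeholders. It also assumes the job's driver goes through ToolRunner/GenericOptionsParser, which is what handles the generic `-conf` option.

```shell
# write_split_conf FILE BYTES: write a one-property Hadoop config file.
# (write_split_conf is a hypothetical helper, not part of Hadoop or CDH.)
write_split_conf() {
  cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.max.split.size</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# Usage (jar, class, and paths are placeholders):
#   write_split_conf split-override.xml 100000000
#   hadoop jar my-job.jar MyDriver -conf split-override.xml /input /output
```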

      I compared the two job configurations, one with this config snippet and one without, and it seems that the snippet affects the mapred.map.tasks option.
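A sketch of how such a comparison can be done mechanically, assuming each job configuration has been dumped to a text file with one `name value` pair per line and no whitespace inside values. `conf_diff` is a hypothetical helper name.

```shell
# conf_diff OLD NEW: list properties whose value differs between two dumps.
# Assumes one "name value" pair per line, with no whitespace inside the value.
conf_diff() {
  awk '
    NR == FNR { old[$1] = $2; next }   # first file: remember every value
    {
      seen[$1] = 1
      if (!($1 in old)) {
        print $1 ": (unset) -> " $2    # property only in the second file
      } else if (old[$1] != $2) {
        print $1 ": " old[$1] " -> " $2
      }
    }
    END {
      # properties that exist only in the first file
      for (k in old) if (!(k in seen)) print k ": " old[k] " -> (unset)"
    }
  ' "$1" "$2"
}

# Usage: conf_diff bad good
```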

      Also, the mapred.max.split.size property (which is the one I'm overriding) should be equivalent to mapreduce.input.fileinputformat.split.maxsize according to this page:
      http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
      but it seems that in CDH 5.0.2 it isn't.

      This is what baffles me (the "bad" file contains the default job configuration; the "good" file contains the configuration with my mapred.max.split.size override):

      $ grep mapred.max.split.size bad good
      bad:mapred.max.split.size 100000000
      good:mapred.max.split.size 100000000

      $ grep mapreduce.input.fileinputformat.split.maxsize bad good
      bad:mapreduce.input.fileinputformat.split.maxsize 1000000
      good:mapreduce.input.fileinputformat.split.maxsize 1000000

      $ grep mapred.map.tasks bad good
      bad:mapred.map.tasks.speculative.execution false
      bad:mapred.map.tasks 1
      good:mapred.map.tasks.speculative.execution false
      good:mapred.map.tasks 4260
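The last grep also matches mapred.map.tasks.speculative.execution because the pattern is an unanchored regular expression. A small sketch that prints only the exact property, again assuming one whitespace-separated `name value` pair per line; `show_prop` is a hypothetical helper name.

```shell
# show_prop NAME FILE...: print only lines whose first field is exactly NAME,
# prefixed with the file name in the same style as grep's output.
show_prop() {
  name=$1
  shift
  awk -v name="$name" '$1 == name { print FILENAME ":" $0 }' "$@"
}

# Usage: show_prop mapred.map.tasks bad good
```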

      Note that at first glance my override appears to change nothing, since both files show the same values for the two split-size properties, yet it changed mapred.map.tasks for some reason.
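As a rough sanity check on the 4260 figure: if the input format in use caps every split at mapred.max.split.size, the number of map tasks should be roughly the total input size divided by that cap, rounded up. The sketch below is only that arithmetic. The input size is a made-up value chosen for illustration, not a number from this report, and real split counts also depend on file boundaries and block sizes.

```shell
# estimate_splits TOTAL_BYTES MAX_SPLIT_BYTES: ceiling division.
# (estimate_splits is a hypothetical helper; it ignores per-file boundaries.)
estimate_splits() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical input of 426,000,000,000 bytes with a 100,000,000-byte cap:
estimate_splits 426000000000 100000000
```

With those illustrative numbers the estimate is 4260 splits, and without any cap the same arithmetic places no lower bound on the number of splits.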

      Is this a bug or am I doing something wrong? Is there a way to properly configure MR to run jobs with more map-slots without this hack?

      Thanks!

            People

            • Assignee: Unassigned
            • Reporter: David Watzke (dwatzke)