CDH (READ-ONLY) / DISTRO-586

CDH version of Pig does not run multiple jobs simultaneously from Python

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: CDH5.0.0
    • Fix Version/s: None
    • Component/s: Pig
    • Labels:

      Description

      Running the attached Python script from within Pig via "pig -x local reproduction.py" results in this (abridged) cryptic stack trace:

      2014-04-29 14:05:57,972 [main] ERROR org.apache.pig.scripting.BoundScript - Pig pipeline failed to complete
      java.util.concurrent.ExecutionException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
      at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      at java.util.concurrent.FutureTask.get(FutureTask.java:188)
      at org.apache.pig.scripting.BoundScript.run(BoundScript.java:176)
      at org.apache.pig.scripting.BoundScript.run(BoundScript.java:134)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:186)
      at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:204)
      at org.python.core.PyObject.__call__(PyObject.java:387)
      at org.python.core.PyObject.__call__(PyObject.java:391)
      at org.python.core.PyMethod.__call__(PyMethod.java:109)
      at org.python.pycode._pyx0.f$0(/home/bjacobs/repo/waremanpro/reproduction.py:74)
      at org.python.pycode._pyx0.call_function(/home/bjacobs/repo/waremanpro/reproduction.py)
      at org.python.core.PyTableCode.call(PyTableCode.java:165)
      at org.python.core.PyCode.call(PyCode.java:18)
      at org.python.core.Py.runCode(Py.java:1275)
      at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:235)
      at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
      at org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:438)
      at org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:422)
      at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:302)
      at org.apache.pig.Main.runEmbeddedScript(Main.java:1020)
      at org.apache.pig.Main.run(Main.java:561)
      at org.apache.pig.Main.main(Main.java:156)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
      Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:307)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:190)
      at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
      at org.apache.pig.PigServer.execute(PigServer.java:1297)
      at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
      at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
      at org.apache.pig.scripting.BoundScript$MyCallable.call(BoundScript.java:346)
      at org.apache.pig.scripting.BoundScript$MyCallable.call(BoundScript.java:318)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.NullPointerException
      at org.apache.pig.tools.pigstats.ScriptState.getScriptHash(ScriptState.java:362)
      at org.apache.pig.tools.pigstats.ScriptState.addSettingsToConf(ScriptState.java:301)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:462)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)

      This happens whenever more than one job is executing at a time via run() (as opposed to runSingle()).
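      For reference, the failing pattern boils down to something like the sketch below. This is not the attached reproduction.py; the file names and the Pig Latin are placeholders, and the script is assumed to be launched with "pig -x local" as above.

        from org.apache.pig.scripting import Pig

        # Compile a parameterized Pig Latin pipeline.
        P = Pig.compile("""
        A = LOAD '$input' AS (line:chararray);
        STORE A INTO '$output';
        """)

        # Binding more than one parameter set and calling run() makes Pig
        # launch the jobs concurrently, which is the code path that hits the
        # NullPointerException in ScriptState.getScriptHash on CDH5.0.0.
        bound = P.bind([
            {'input': 'in1.txt', 'output': 'out1'},
            {'input': 'in2.txt', 'output': 'out2'},
        ])
        results = bound.run()    # fails with ERROR 2017 on CDH5.0.0

        # The same pipeline bound to a single parameter set and executed
        # with runSingle() completes normally.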

      The relevant line refers to a function, getScriptHash, which exists only in CDH and not in upstream Apache Pig. This function, introduced in the blanket commit f57d1992b77f96a6d70b9b2a0fcd801c01e74bbc, apparently supports a Cloudera Navigator pipeline-exploration feature.

      A one-line change like the one in the attached patch fixes the issue. I do not know what this breaks in Cloudera Navigator, but it seems evident to me that the attached Python code should work on CDH5 just as it works on Apache Pig.
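      In the meantime, a workaround on the script side (inferred from the run()/runSingle() behaviour described above, not taken from the attached patch) is to fall back to sequential execution, binding and running one parameter set at a time:

        # Sequential workaround sketch; P and the parameter dictionaries are
        # the placeholders from the reproduction sketch above.
        for params in [{'input': 'in1.txt', 'output': 'out1'},
                       {'input': 'in2.txt', 'output': 'out2'}]:
            stats = P.bind(params).runSingle()   # one job at a time avoids the NPE
            if not stats.isSuccessful():
                raise RuntimeError('job failed for parameters %s' % params)

      This loses the parallelism that run() provides, so it is only a stopgap until the underlying NullPointerException is fixed.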

            People

            • Assignee: Unassigned
            • Reporter: bjacobs Bryan Jacobs
            • Votes: 1
            • Watchers: 1
