Description
Running the attached Python script from within Pig via "pig -x local reproduction.py" results in this (abridged) cryptic stack trace:
2014-04-29 14:05:57,972 [main] ERROR org.apache.pig.scripting.BoundScript - Pig pipeline failed to complete
java.util.concurrent.ExecutionException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.pig.scripting.BoundScript.run(BoundScript.java:176)
at org.apache.pig.scripting.BoundScript.run(BoundScript.java:134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:186)
at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:204)
at org.python.core.PyObject.__call__(PyObject.java:387)
at org.python.core.PyObject.__call__(PyObject.java:391)
at org.python.core.PyMethod.__call__(PyMethod.java:109)
at org.python.pycode._pyx0.f$0(/home/bjacobs/repo/waremanpro/reproduction.py:74)
at org.python.pycode._pyx0.call_function(/home/bjacobs/repo/waremanpro/reproduction.py)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1275)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:235)
at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
at org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:438)
at org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:422)
at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:302)
at org.apache.pig.Main.runEmbeddedScript(Main.java:1020)
at org.apache.pig.Main.run(Main.java:561)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:307)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:190)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
at org.apache.pig.PigServer.execute(PigServer.java:1297)
at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
at org.apache.pig.scripting.BoundScript$MyCallable.call(BoundScript.java:346)
at org.apache.pig.scripting.BoundScript$MyCallable.call(BoundScript.java:318)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at org.apache.pig.tools.pigstats.ScriptState.getScriptHash(ScriptState.java:362)
at org.apache.pig.tools.pigstats.ScriptState.addSettingsToConf(ScriptState.java:301)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:462)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
This happens whenever more than one job is executing at a time via run() (as opposed to runSingle()).
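For illustration, a minimal embedded script along the following lines exercises that code path. This is only a sketch, not the attached reproduction.py: the Pig Latin, input/output names, and parameters are placeholders, but the shape (one compiled pipeline bound to several parameter sets and launched with run()) matches the failing case.

from org.apache.pig.scripting import Pig

# Compile one parameterized Pig Latin pipeline.
P = Pig.compile("""
data = LOAD '$input' AS (x:int);
filtered = FILTER data BY x > 0;
STORE filtered INTO '$output';
""")

# Binding a list of parameter maps produces several bound queries;
# run() launches them concurrently, which is the multi-job case that
# dies in ScriptState.getScriptHash. runSingle() with a single binding
# does not trigger the NullPointerException.
bound = P.bind([
    {'input': 'in1.txt', 'output': 'out1'},
    {'input': 'in2.txt', 'output': 'out2'},
])
stats = bound.run()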
The relevant line refers to a function, getScriptHash, which exists only in CDH and not in upstream Apache Pig. The function was introduced in the blanket commit f57d1992b77f96a6d70b9b2a0fcd801c01e74bbc and apparently supports a Cloudera Navigator pipeline-exploration feature.
A one-line change like the attached patch fixes the issue. I do not know whether it breaks anything in Cloudera Navigator, but it seems evident that the attached Python code should work in CDH5 just as it works in Apache Pig.