Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: CDH 5.5.0
- Fix Version/s: None
- Labels:
- Environment: Ubuntu server 14.04.3
Description
I have a Spark script written in PySpark and I want to submit it via the Oozie spark action.
Something like this:
<action name="myapp">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>local[*]</master>
        <name>myapp</name>
        <jar>${my_script}</jar>
        <spark-opts>--executor-memory 4G --num-executors 4</spark-opts>
        <arg>${arg1}</arg>
    </spark>
    <ok to="hive_import"/>
    <error to="send_email"/>
</action>
The script imports the pyspark module:
#!/usr/bin/spark-submit
from pyspark import SparkContext
from pyspark import SparkFiles

sc = SparkContext()
However, Oozie throws a "Can not import pyspark module" exception.
This started happening after I upgraded from CDH 5.4.6 to CDH 5.5.1.
A workaround would be to use the shell action instead (see the sketch below), but I think the spark action describes the Spark task better.
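Roughly, the shell-action workaround would look like the following. This is only a sketch: submit_myapp.sh is a placeholder name for a wrapper script that calls spark-submit on the PySpark script, and the file names and paths are assumptions rather than values from my actual workflow.
<action name="myapp">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <!-- submit_myapp.sh is a hypothetical wrapper that would run something like:
             spark-submit --master local[*] --executor-memory 4G --num-executors 4 my_script.py "$1" -->
        <exec>submit_myapp.sh</exec>
        <argument>${arg1}</argument>
        <!-- ship the wrapper and the PySpark script with the action -->
        <file>submit_myapp.sh#submit_myapp.sh</file>
        <file>${my_script}#my_script.py</file>
    </shell>
    <ok to="hive_import"/>
    <error to="send_email"/>
</action>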
Any suggestions?