CDH (READ-ONLY) / DISTRO-785

Spark action via Oozie cannot find pyspark module

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: CDH 5.5.0
    • Fix Version/s: None
    • Component/s: Oozie, Spark
    • Environment:
      Ubuntu server 14.04.3

      Description

      I have a Spark script written in PySpark and I want to submit it via the Oozie Spark action.

      Something like this:

        <action name="myapp">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${job_tracker}</job-tracker>
                <name-node>${name_node}</name-node>
                <master>local[*]</master>
                <name>myapp</name>
                <jar>${my_script}</jar>
                <spark-opts>--executor-memory 4G --num-executors 4</spark-opts>
                <arg>${arg1}</arg>
            </spark>
            <ok to="hive_import"/>
            <error to="send_email"/>
        </action>
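For reference, one workaround sometimes suggested for this class of failure is to ship the PySpark libraries with the action via `--py-files`. This is a hedged sketch only: the zip file names and their presence in the workflow's lib/ directory are assumptions about the Spark installation, not something confirmed for this CDH build.

```xml
<!-- Hedged sketch, not verified on CDH 5.5: ship the PySpark and Py4J
     zips (file names/versions below are assumptions) alongside the
     action so the launcher's Python can import pyspark. -->
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${job_tracker}</job-tracker>
    <name-node>${name_node}</name-node>
    <master>local[*]</master>
    <name>myapp</name>
    <jar>${my_script}</jar>
    <spark-opts>--executor-memory 4G --num-executors 4
        --py-files pyspark.zip,py4j-src.zip</spark-opts>
    <arg>${arg1}</arg>
</spark>
```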
      

      The script imports the pyspark module:

      #!/usr/bin/spark-submit
      from pyspark import SparkContext
      from pyspark import SparkFiles
      sc = SparkContext()
      

      However, Oozie throws a "Can not import pyspark module" exception.
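As a quick diagnostic (a hypothetical helper, not part of the report), one can check from the launcher environment whether pyspark is resolvable at all before the job tries to create a SparkContext. The reliance on SPARK_HOME is an assumption; on CDH it is typically set by the Spark parcel.

```python
import os
import sys

# Hypothetical diagnostic: report whether the current Python
# environment can resolve the pyspark package.
spark_home = os.environ.get("SPARK_HOME")  # assumption: set by the Spark parcel
print("SPARK_HOME:", spark_home)

if spark_home:
    # PySpark normally lives under $SPARK_HOME/python; Py4J ships as a
    # zip under $SPARK_HOME/python/lib.
    sys.path.insert(0, os.path.join(spark_home, "python"))

try:
    import pyspark  # noqa: F401
    pyspark_ok = True
except ImportError:
    pyspark_ok = False

print("pyspark importable:", pyspark_ok)
```

Running this inside the Oozie launcher (e.g. via a shell action) shows whether the failure is a missing package or a missing path entry.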

      This started happening after I upgraded from CDH 5.4.6 to CDH 5.5.1.

      A workaround would be to use the shell action instead, but I think the Spark action describes a Spark task better.
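A minimal sketch of that shell-action workaround follows. The wrapper script name run_myapp.sh is hypothetical; the idea is that it calls spark-submit directly, so pyspark is resolved from the driver's own environment rather than the Oozie launcher's classpath.

```xml
<action name="myapp">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <!-- run_myapp.sh (hypothetical) would invoke:
             spark-submit --master local[*] myapp.py "$1" -->
        <exec>run_myapp.sh</exec>
        <argument>${arg1}</argument>
        <file>run_myapp.sh</file>
        <file>${my_script}</file>
    </shell>
    <ok to="hive_import"/>
    <error to="send_email"/>
</action>
```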

      Any suggestions?

      People

      • Assignee: Unassigned
      • Reporter: alec5566 Ming Hsuan Tu
      • Votes: 0
      • Watchers: 1
