CDH (READ-ONLY) / DISTRO-645

Issue with spark-shell / spark-submit in yarn cluster mode

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: CDH 5.1.2, CDH 5.1.3
    • Fix Version/s: None
    • Component/s: Spark
    • Environment:
      Debian 7, CM 5.1.x, CDH parcel installation

      Description

      Hello,

      I've just updated to CDH 5.1.2 (and then CDH 5.1.3) and I'm having trouble with Spark. I used a parcel installation to install Spark on the cluster, and the failure occurs when I run the simple wordcount described in the documentation.
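
      For reference, here is a minimal sketch of the wordcount I run from spark-shell (the HDFS paths are illustrative placeholders, not my actual input):

          // Standard wordcount on the Spark 1.0 RDD API; paths are placeholders.
          val lines  = sc.textFile("hdfs:///tmp/input.txt")
          val counts = lines.flatMap(line => line.split(" "))
                            .map(word => (word, 1))
                            .reduceByKey(_ + _)
          counts.saveAsTextFile("hdfs:///tmp/wordcount-output")

      Running it produces the following errors on the executors: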

          Spark Executor Command: "/usr/lib/jvm/java-7-oracle-cloudera/bin/java" "-cp" "::/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/assembly/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/examples/lib/*:/etc/hadoop/conf:/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/./:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/.//*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/lib/jline.jar" "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@master2.stg.ps:38649/user/CoarseGrainedScheduler" "0" "slave01.stg.ps" "8" "akka.tcp://sparkWorker@slave01.stg.ps:7078/user/Worker" "8" "app-20140924094454-0002"
          ========================================
      
          14/09/24 09:44:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          14/09/24 09:44:55 INFO SecurityManager: Changing view acls to: spark,root
          14/09/24 09:44:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root)
          14/09/24 09:44:55 INFO Slf4jLogger: Slf4jLogger started
          14/09/24 09:44:55 INFO Remoting: Starting remoting
          14/09/24 09:44:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@slave01.stg.ps:36088]
          14/09/24 09:44:55 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@slave01.stg.ps:36088]
          14/09/24 09:44:55 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@master2.stg.ps:38649/user/CoarseGrainedScheduler
          14/09/24 09:44:55 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@slave01.stg.ps:7078/user/Worker
          14/09/24 09:44:56 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@slave01.stg.ps:7078/user/Worker
          14/09/24 09:44:56 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
          14/09/24 09:44:56 INFO SecurityManager: Changing view acls to: spark,root
          14/09/24 09:44:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root)
          14/09/24 09:44:56 INFO Slf4jLogger: Slf4jLogger started
          14/09/24 09:44:56 INFO Remoting: Starting remoting
          14/09/24 09:44:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@slave01.stg.ps:46489]
          14/09/24 09:44:56 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@slave01.stg.ps:46489]
          14/09/24 09:44:56 INFO SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@master2.stg.ps:38649/user/MapOutputTracker
          14/09/24 09:44:56 INFO SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@master2.stg.ps:38649/user/BlockManagerMaster
          14/09/24 09:44:56 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140924094456-12e0
          14/09/24 09:44:56 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
          14/09/24 09:44:56 INFO ConnectionManager: Bound socket to port 55188 with id = ConnectionManagerId(slave01.stg.ps,55188)
          14/09/24 09:44:56 INFO BlockManagerMaster: Trying to register BlockManager
          14/09/24 09:44:56 INFO BlockManagerMaster: Registered BlockManager
          14/09/24 09:44:56 INFO HttpFileServer: HTTP File server directory is /tmp/spark-66e286b1-c392-4525-a999-78c7690a91c8
          14/09/24 09:44:56 INFO HttpServer: Starting HTTP Server
          14/09/24 09:44:56 INFO Executor: Using REPL class URI: http://master2.stg.ps:50248
          14/09/24 09:45:28 INFO CoarseGrainedExecutorBackend: Got assigned task 2
          14/09/24 09:45:28 INFO Executor: Running task ID 2
          14/09/24 09:45:28 INFO CoarseGrainedExecutorBackend: Got assigned task 5
          14/09/24 09:45:28 INFO Executor: Running task ID 5
          14/09/24 09:45:28 INFO HttpBroadcast: Started reading broadcast variable 0
          14/09/24 09:45:28 INFO MemoryStore: ensureFreeSpace(330072) called with curMem=0, maxMem=309225062
          14/09/24 09:45:28 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 322.3 KB, free 294.6 MB)
          14/09/24 09:45:28 INFO HttpBroadcast: Reading broadcast variable 0 took 0.171247662 s
          14/09/24 09:45:28 INFO BlockManager: Found block broadcast_0 locally
          14/09/24 09:45:28 ERROR Executor: Exception in task ID 2
          java.io.EOFException
          	at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
          	at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
          	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
          	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
          	at org.apache.hadoop.io.UTF8.readChars(UTF8.java:260)
          	at org.apache.hadoop.io.UTF8.readString(UTF8.java:252)
          	at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
          	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
          	at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
          	at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:606)
          	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
          	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
          	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
          	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
          	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
          	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
          	at org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:140)
          	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
          	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
          	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
          	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
          	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:745)
          14/09/24 09:45:28 ERROR Executor: Exception in task ID 5
          java.io.EOFException
          	at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
          	at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
          	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
          	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
          	at org.apache.hadoop.io.UTF8.readChars(UTF8.java:260)
          	at org.apache.hadoop.io.UTF8.readString(UTF8.java:252)
          	at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
          	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
          	at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
          	at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:606)
          	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
          	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
          	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
          	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
          	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
          	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
          	at org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:140)
          	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
          	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
          	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
          	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
          	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:745)
      

      Has anyone encountered this issue?

      I don't know if this helps, but when I start spark-shell, it goes like this:

          14/09/24 10:00:15 INFO SecurityManager: Changing view acls to: root
          14/09/24 10:00:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
          14/09/24 10:00:15 INFO HttpServer: Starting HTTP Server
          Welcome to
                ____              __
               / __/__  ___ _____/ /__
              _\ \/ _ \/ _ `/ __/  '_/
             /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
                /_/
      
          Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
          Type in expressions to have them evaluated.
          Type :help for more information.
          14/09/24 10:00:18 INFO SecurityManager: Changing view acls to: root
          14/09/24 10:00:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
          14/09/24 10:00:18 INFO Slf4jLogger: Slf4jLogger started
          14/09/24 10:00:18 INFO Remoting: Starting remoting
          14/09/24 10:00:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@master2.stg.ps:39453]
          14/09/24 10:00:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@master2.stg.ps:39453]
          14/09/24 10:00:18 INFO SparkEnv: Registering MapOutputTracker
          14/09/24 10:00:18 INFO SparkEnv: Registering BlockManagerMaster
          14/09/24 10:00:18 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140924100018-86d1
          14/09/24 10:00:18 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
          14/09/24 10:00:18 INFO ConnectionManager: Bound socket to port 44309 with id = ConnectionManagerId(master2.stg.ps,44309)
          14/09/24 10:00:18 INFO BlockManagerMaster: Trying to register BlockManager
          14/09/24 10:00:18 INFO BlockManagerInfo: Registering block manager master2.stg.ps:44309 with 294.9 MB RAM
          14/09/24 10:00:18 INFO BlockManagerMaster: Registered BlockManager
          14/09/24 10:00:18 INFO HttpServer: Starting HTTP Server
          14/09/24 10:00:18 INFO HttpBroadcast: Broadcast server started at http://master2.stg.ps:53222
          14/09/24 10:00:18 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d3e9000f-9f93-4414-a423-39a08337ea95
          14/09/24 10:00:18 INFO HttpServer: Starting HTTP Server
          14/09/24 10:00:18 INFO SparkUI: Started SparkUI at http://master2.stg.ps:4040
          14/09/24 10:00:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          14/09/24 10:00:19 INFO EventLoggingListener: Logging events to /user/spark/applicationHistory/spark-shell-1411545618959
          14/09/24 10:00:19 INFO AppClient$ClientActor: Connecting to master spark://master2.stg.ps:7077...
          14/09/24 10:00:19 INFO SparkILoop: Created spark context..
          14/09/24 10:00:19 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140924100019-0003
          14/09/24 10:00:19 INFO AppClient$ClientActor: Executor added: app-20140924100019-0003/0 on worker-20140924093200-slave01.stg.ps-7078 (slave01.stg.ps:7078) with 8 cores
          14/09/24 10:00:19 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140924100019-0003/0 on hostPort slave01.stg.ps:7078 with 8 cores, 512.0 MB RAM
          14/09/24 10:00:19 INFO AppClient$ClientActor: Executor added: app-20140924100019-0003/1 on worker-20140924093214-slave02.stg.ps-7078 (slave02.stg.ps:7078) with 8 cores
          14/09/24 10:00:19 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140924100019-0003/1 on hostPort slave02.stg.ps:7078 with 8 cores, 512.0 MB RAM
          14/09/24 10:00:19 INFO AppClient$ClientActor: Executor added: app-20140924100019-0003/2 on worker-20140924093159-slave03.stg.ps-7078 (slave03.stg.ps:7078) with 8 cores
          14/09/24 10:00:19 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140924100019-0003/2 on hostPort slave03.stg.ps:7078 with 8 cores, 512.0 MB RAM
          14/09/24 10:00:19 INFO AppClient$ClientActor: Executor updated: app-20140924100019-0003/2 is now RUNNING
          14/09/24 10:00:19 INFO AppClient$ClientActor: Executor updated: app-20140924100019-0003/1 is now RUNNING
          14/09/24 10:00:19 INFO AppClient$ClientActor: Executor updated: app-20140924100019-0003/0 is now RUNNING
          Spark context available as sc.
      
          scala> 14/09/24 10:00:20 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@slave02.stg.ps:49723/user/Executor#65841499] with ID 1
          14/09/24 10:00:21 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@slave01.stg.ps:35921/user/Executor#1881060459] with ID 0
          14/09/24 10:00:21 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@slave03.stg.ps:39931/user/Executor#-523951647] with ID 2
          14/09/24 10:00:21 INFO BlockManagerInfo: Registering block manager slave02.stg.ps:34499 with 294.9 MB RAM
          14/09/24 10:00:21 INFO BlockManagerInfo: Registering block manager slave03.stg.ps:58294 with 294.9 MB RAM
          14/09/24 10:00:21 INFO BlockManagerInfo: Registering block manager slave01.stg.ps:46946 with 294.9 MB RAM
      

      The same wordcount worked fine in CDH 5.1.0.

      I have tried the wordcount via PySpark and it works fine in cluster mode, so I think the issue is specific to spark-shell / spark-submit on the Scala side, starting with the CDH 5.1.2 and CDH 5.1.3 parcels.

    People

    • Assignee: Unassigned
    • Reporter: Nicolas Phung (nphung)
    • Votes: 0
    • Watchers: 1
