Description
data = sc.textFile("/user/romain/bikes/201408_weather_data.csv")
data = data.map(lambda n: 1/0)
data.collect()
Prints a traceback that does not match this code (a ValueError raised in clean_row):
15/05/20 13:33:20 INFO executor.Executor: Executor is trying to kill task 1.0 in stage 35.0 (TID 64)
ERROR:fake_shell:execute_reply
Traceback (most recent call last):
  File "/tmp/2342492701486506595", line 53, in execute
    exec code in global_dict
  File "<stdin>", line 40, in <module>
  File "/usr/lib/spark/lib/spark-assembly-1.3.0-cdh5.5.0-SNAPSHOT-hadoop2.6.0-cdh5.5.0-SNAPSHOT.jar/pyspark/rdd.py", line 701, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/usr/lib/spark/lib/spark-assembly-1.3.0-cdh5.5.0-SNAPSHOT-hadoop2.6.0-cdh5.5.0-SNAPSHOT.jar/py4j/java_gateway.py", line 538, in __call__
    self.target_id, self.name)
  File "/usr/lib/spark/lib/spark-assembly-1.3.0-cdh5.5.0-SNAPSHOT-hadoop2.6.0-cdh5.5.0-SNAPSHOT.jar/py4j/protocol.py", line 300, in get_return_value
    format(target_id, '.', name), value)
Py4JJavaError: An error occurred while calling o322.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 35.0 failed 1 times, most recent failure: Lost task 0.0 in stage 35.0 (TID 63, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/worker.py", line 101, in main
    process()
  File "/usr/lib/spark/python/pyspark/worker.py", line 96, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 236, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 35, in clean_row
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data 'PDT' does not match format '%m/%d/%Y'

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Instead of the more useful traceback, which points at the lambda:
15/05/20 13:37:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 38.0, whose tasks have all completed, from pool
ERROR:fake_shell:execute_reply
Traceback (most recent call last):
  File "/tmp/2342492701486506595", line 58, in execute
    exec code in global_dict
  File "<stdin>", line 5, in <module>
  File "/usr/lib/spark/lib/spark-assembly-1.3.0-cdh5.5.0-SNAPSHOT-hadoop2.6.0-cdh5.5.0-SNAPSHOT.jar/pyspark/rdd.py", line 701, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/usr/lib/spark/lib/spark-assembly-1.3.0-cdh5.5.0-SNAPSHOT-hadoop2.6.0-cdh5.5.0-SNAPSHOT.jar/py4j/java_gateway.py", line 538, in __call__
    self.target_id, self.name)
  File "/usr/lib/spark/lib/spark-assembly-1.3.0-cdh5.5.0-SNAPSHOT-hadoop2.6.0-cdh5.5.0-SNAPSHOT.jar/py4j/protocol.py", line 300, in get_return_value
    format(target_id, '.', name), value)
Py4JJavaError: An error occurred while calling o347.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 38.0 failed 1 times, most recent failure: Lost task 1.0 in stage 38.0 (TID 70, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/worker.py", line 101, in main
    process()
  File "/usr/lib/spark/python/pyspark/worker.py", line 96, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 236, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 3, in <lambda>
ZeroDivisionError: integer division or modulo by zero

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
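In both traces the line that actually identifies the failure is the worker-side Python exception, buried between the Py4J frames and the JVM stack. As a rough illustration only (not how Hue or Spark handle this), the sketch below pulls that line out of the trace text. The helper name innermost_python_error and the regular expression are assumptions made for this example, and SAMPLE is an abridged fragment of the second trace above.

```python
import re

TRACEBACK_HEADER = "Traceback (most recent call last):"

# Hypothetical pattern: an exception name ending in Error/Exception, then its
# message, stopping at the first JVM frame (" at pkg.Class.method("), at a
# newline, or at the end of the text.
_ERROR_RE = re.compile(
    r"\b(\w+(?:Error|Exception)):\s+(.*?)(?=\s+at\s+[\w.$]+\(|\n|$)"
)


def innermost_python_error(trace_text):
    """Return (exception_name, message) for the last Python traceback in
    trace_text, or None when no traceback or exception line is found."""
    # The worker-side traceback is the last one in the combined output.
    _, found, tail = trace_text.rpartition(TRACEBACK_HEADER)
    if not found:
        return None
    match = _ERROR_RE.search(tail)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


# Abridged fragment of the second trace in this report.
SAMPLE = (
    'org.apache.spark.api.python.PythonException: '
    'Traceback (most recent call last): '
    'File "<stdin>", line 3, in <lambda> '
    'ZeroDivisionError: integer division or modulo by zero '
    'at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)'
)

print(innermost_python_error(SAMPLE))
```

Run against the first trace, the same helper would report the ValueError from clean_row, which makes the mismatch with the submitted lambda easy to spot.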
Issue Links
- relates to HUE-2641 [spark] Display errors line (Resolved)