Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: CDH 5.7.0, CDH 5.7.1
    • Fix Version/s: None
    • Component/s: Spark
    • Labels:
      None

      Description

      I'm using CDH 5.7.1 and am trying to use a Kafka DStream from spark-streaming-kafka_2.10 in my application, but I'm seeing an exception about a Kafka class not being found:

      java.lang.NoClassDefFoundError: kafka/cluster/BrokerEndPoint
      at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6$$anonfun$apply$7.apply(KafkaCluster.scala:90)
      ~[spark-assembly.jar:na]
      at scala.Option.map(Option.scala:145) ~[spark-assembly.jar:na]
      at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:90)
      ~[spark-assembly.jar:na]
      at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:87)
      ~[spark-assembly.jar:na]
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
      ~[spark-assembly.jar:na]
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
      ~[spark-assembly.jar:na]
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      ~[spark-assembly.jar:na]
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
      ~[spark-assembly.jar:na]
      at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
      ~[spark-assembly.jar:na]
      at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
      ~[spark-assembly.jar:na]
      at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3.apply(KafkaCluster.scala:87)
      ~[spark-assembly.jar:na]
      at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3.apply(KafkaCluster.scala:86)
      ~[spark-assembly.jar:na]
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
      ~[spark-assembly.jar:na]
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
      ~[spark-assembly.jar:na]
      at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
      ~[spark-assembly.jar:na]
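      For reference, the stream is created through the standard direct-stream API in spark-streaming-kafka_2.10. A minimal sketch of the kind of code that triggers the error (app name, broker address, and topic are placeholders):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-dstream-repro")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Broker address and topic name are placeholders
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    // The NoClassDefFoundError surfaces once KafkaCluster contacts the brokers
    stream.foreachRDD(rdd => println(rdd.count()))
    ssc.start()
    ssc.awaitTermination()
  }
}
```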

      On closer inspection, it looks like the spark-assembly jar in CDH 5.7 contains the KafkaUtils class and all the other Kafka streaming classes, but it does not contain the kafka broker classes they depend on. Was this intentional? It seems odd that the assembly jar contains this library at all (I see the Flume and Twitter streams in there as well), but if it does, shouldn't it contain the dependencies as well?
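      One way to check this is to list the assembly's entries directly. A minimal sketch, assuming a typical parcel path for the jar (adjust for your install):

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Jar path is an assumption for a typical CDH parcel install -- adjust as needed
val assembly = new JarFile("/opt/cloudera/parcels/CDH/jars/spark-assembly.jar")
val entries  = assembly.entries.asScala.map(_.getName).toSet

// The streaming wrapper classes are present...
println(entries("org/apache/spark/streaming/kafka/KafkaUtils.class"))
// ...but the kafka broker class from the stack trace is not
println(entries("kafka/cluster/BrokerEndPoint.class"))
```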

      This is causing me additional trouble because I need to run my app on different versions of CDH, and older versions of spark-assembly don't include the Kafka streaming classes. So in one environment I need a jar that has spark-streaming-kafka as a compile-scope dependency, while for 5.7 I need one that has it in provided scope but also includes kafka itself as a compile-scope dependency.
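      In sbt terms, the CDH 5.7 build would need something like the following (version strings and coordinates are illustrative, not verified):

```scala
// build.sbt sketch -- versions are assumptions, adjust to your CDH release
val sparkVersion = "1.6.0-cdh5.7.1"

libraryDependencies ++= Seq(
  // On CDH 5.7 the streaming-kafka classes ship inside spark-assembly,
  // so they can be provided scope...
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
  // ...but the kafka broker classes are missing from the assembly,
  // so kafka itself must be bundled at compile scope
  "org.apache.kafka" %% "kafka" % "0.8.2.1"
)
```

      For older CDH releases the same build would instead need spark-streaming-kafka at compile scope, which is what makes shipping a single artifact awkward.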

      On a separate note, the line numbers in the exception don't seem to match up with what's in the GitHub repository. KafkaCluster (https://github.com/cloudera/spark/blob/cdh5.7.1-release/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala) doesn't even reference BrokerEndPoint. Is something else being packaged in CDH?


            People

            • Assignee:
              Unassigned
            • Reporter:
              ashau (Albert Shau)
            • Votes:
              0
            • Watchers:
              2
