Kite SDK / KITE-536

AvroSerialization.setDataModelClass is not available on CDH4

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.0
    • Component/s: Data Module
    • Labels:
      None

      Description

      I've updated the demo example to no longer use the partition API, but I'm getting a failure when configuring Avro:

      2014-07-10 15:38:44,817 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoSuchMethodError: org.apache.avro.hadoop.io.AvroSerialization.setDataModelClass(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/Class;)V
      	at org.kitesdk.data.spi.filesystem.FileSystemViewKeyInputFormat.<init>(FileSystemViewKeyInputFormat.java:61)
      	at org.kitesdk.data.spi.filesystem.FileSystemViewKeyInputFormat.<init>(FileSystemViewKeyInputFormat.java:67)
      	at org.kitesdk.data.spi.filesystem.FileSystemView.getInputFormat(FileSystemView.java:117)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.getDelegateInputFormat(DatasetKeyInputFormat.java:233)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.getDelegateInputFormatForView(DatasetKeyInputFormat.java:263)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.setConf(DatasetKeyInputFormat.java:224)
      	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
      	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:635)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
      	at org.apache.hadoop.mapred.Child.main(Child.java:262)
      

      setDataModelClass is not available in Avro 1.7.4 (CDH4). Adding Avro 1.7.5 to the dependencies doesn't fix the problem, even though it is added to the distributed cache. This is probably because 1.7.4 is found first in the ClassLoader chain.
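
      One way to confirm which jar wins in the ClassLoader chain is to print where a class was actually loaded from. This is a minimal sketch, not part of Kite: the ClassOrigin helper is a hypothetical name, and in a task you would pass AvroSerialization.class instead of the classes used here so that the example runs without Avro on the classpath.

```java
import java.security.CodeSource;

class ClassOrigin {

  /**
   * Returns the location (jar or directory) a class was loaded from,
   * or a placeholder when the JVM does not report one, which is the
   * case for classes loaded by the bootstrap loader.
   */
  static String locate(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      return "(bootstrap or unknown)";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) {
    // In a map task, the interesting call would be
    // locate(org.apache.avro.hadoop.io.AvroSerialization.class).
    System.out.println("ClassOrigin loaded from: " + locate(ClassOrigin.class));
    System.out.println("String loaded from: " + locate(String.class));
  }
}
```

      Logging this from the task would show whether the 1.7.4 or the 1.7.5 jar supplied the class.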

      I think the solution is to call setDataModelClass dynamically if it is available and skip it if it is not. Avro 1.7.4 uses a reflect reader by default, so the Avro behavior will be controlled by whether or not the class is present on the worker nodes.
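
      The dynamic call could look roughly like the sketch below. It is an illustration under assumptions, not the actual fix: OptionalStaticCall and invokeIfAvailable are hypothetical names, and the example uses JDK classes so it runs without Avro or Hadoop. For this issue the target would be AvroSerialization.class, the method name "setDataModelClass", and the parameter types Configuration.class and Class.class, as shown in the NoSuchMethodError signature.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class OptionalStaticCall {

  /**
   * Invokes a public static method only if the loaded version of the
   * class declares it. Returns true when the method was found and
   * invoked, false when it is missing (for example, an older jar).
   */
  static boolean invokeIfAvailable(Class<?> target, String name,
                                   Class<?>[] paramTypes, Object... args) {
    Method method;
    try {
      method = target.getMethod(name, paramTypes);
    } catch (NoSuchMethodException e) {
      // Older library version: skip the call instead of failing the task.
      return false;
    }
    if (!Modifier.isStatic(method.getModifiers())) {
      return false;
    }
    try {
      method.invoke(null, args);
      return true;
    } catch (IllegalAccessException e) {
      throw new IllegalStateException("Cannot access " + name, e);
    } catch (InvocationTargetException e) {
      // The method exists but threw; surface the underlying cause.
      throw new IllegalStateException(name + " failed", e.getCause());
    }
  }

  public static void main(String[] args) {
    // A method that exists in every JDK.
    boolean present = invokeIfAvailable(System.class, "setProperty",
        new Class<?>[] {String.class, String.class}, "kite.example", "1");
    // A method that does not exist, standing in for the CDH4 case.
    boolean missing = invokeIfAvailable(System.class, "noSuchMethodHere",
        new Class<?>[] {String.class}, "x");
    System.out.println("present=" + present + ", missing=" + missing);
  }
}
```

      Returning a boolean lets the caller log a warning when the data model cannot be set, rather than silently falling back to the reflect reader.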

      This should be called out in the release notes because the new support for data models will not work with CDH4 clusters.

              People

              • Assignee: Unassigned
              • Reporter: Ryan Blue