  1. Kite SDK (READ-ONLY)
  2. KITE-1090

DatasetKeyInputFormat not honoring job config regression

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: Data Module
    • Labels:
      None

      Description

      While testing whether I could remove a workaround now that KITE-976 is fixed, I started getting a similar error. However, the stack trace shows a different code path in the current master branch than the one previously reported:

      2015-11-12 08:32:49,608 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: java.net.UnknownHostException: fakedev
      	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
      	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
      	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
      	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:664)
      	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:608)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
      	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
      	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
      	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
      	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
      	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
      	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:689)
      	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
      	at org.kitesdk.data.Datasets.load(Datasets.java:108)
      	at org.kitesdk.data.Datasets.load(Datasets.java:165)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.load(DatasetKeyInputFormat.java:305)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.setConf(DatasetKeyInputFormat.java:241)
      	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
      	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
      	at org.apache.crunch.impl.mr.run.CrunchRecordReader.initNextRecordReader(CrunchRecordReader.java:70)
      	at org.apache.crunch.impl.mr.run.CrunchRecordReader.<init>(CrunchRecordReader.java:49)
      	at org.apache.crunch.impl.mr.run.CrunchInputFormat.createRecordReader(CrunchInputFormat.java:77)
      	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: java.net.UnknownHostException: fakedev
      	... 31 more
      

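      The key thing to note is that the calls come through Crunch classes (CrunchInputFormat and CrunchRecordReader) rather than directly through DatasetKeyInputFormat: Crunch instantiates DatasetKeyInputFormat via ReflectionUtils.newInstance, and the dataset is loaded from setConf(...). So this path likely never reaches the "createRecordReader(...)"[1] method that needs to be called for the fix to take effect.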
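
One possible reading of the trace, sketched as a toy model in plain Java. This is an assumption, not something confirmed in this report: the dataset is loaded inside setConf(...), and the load resolves the filesystem against a configuration that does not contain the job's extra nameservice. The sketch uses no Hadoop, Crunch, or Kite classes; every name in it (SetConfSketch, IgnoresJobConf, UsesJobConf, DEFAULTS) is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SetConfSketch {

    /** Stand-in for a "configurable" object: the framework hands it the job config. */
    interface Configurable {
        void setConf(Map<String, String> conf);
    }

    /** Hypothetical process-wide defaults that only know the cluster's own nameservice. */
    static final Map<String, String> DEFAULTS =
            Collections.singletonMap("dfs.nameservices", "ingestiondev");

    /** True if the comma-separated dfs.nameservices value in conf lists ns. */
    static boolean knowsNameservice(Map<String, String> conf, String ns) {
        String list = conf.getOrDefault("dfs.nameservices", "");
        for (String s : list.split(",")) {
            if (s.trim().equals(ns)) {
                return true;
            }
        }
        return false;
    }

    /** Loads during setConf but ignores the supplied config (the suspected failure mode). */
    static final class IgnoresJobConf implements Configurable {
        boolean loaded;

        @Override
        public void setConf(Map<String, String> conf) {
            loaded = knowsNameservice(DEFAULTS, "fakedev");
        }
    }

    /** Loads during setConf using the config it was given. */
    static final class UsesJobConf implements Configurable {
        boolean loaded;

        @Override
        public void setConf(Map<String, String> conf) {
            loaded = knowsNameservice(conf, "fakedev");
        }
    }

    /** Mimics reflective instantiation followed immediately by setConf. */
    static <T extends Configurable> T newInstance(Class<T> type, Map<String, String> conf) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            obj.setConf(conf);
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("dfs.nameservices", "ingestiondev,fakedev");

        System.out.println("ignores job conf: "
                + newInstance(IgnoresJobConf.class, jobConf).loaded);
        System.out.println("uses job conf:    "
                + newInstance(UsesJobConf.class, jobConf).loaded);
    }
}
```

In this model the first instance never sees fakedev even though the job config lists it, while the second does. Whether the real setConf(...) path behaves like the first case is exactly what the stack trace above suggests but does not prove.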

      I'm supplying a supplementary config file when launching like the following and viewing the running job I can see it is all part of the jobs configuration.

      <configuration>
      <property><name>dfs.nameservices</name><value>ingestiondev,fakedev</value></property>
      <property><name>dfs.client.failover.proxy.provider.fakedev</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
      <property><name>dfs.namenode.servicerpc-address.fakedev.namenode831</name><value>host2.net:8022</value></property>
      <property><name>dfs.namenode.servicerpc-address.fakedev.namenode864</name><value>host1.net:8022</value></property>
      <property><name>dfs.namenode.rpc-address.fakedev.namenode864</name><value>host1.net:8020</value></property>
      <property><name>dfs.namenode.rpc-address.fakedev.namenode831</name><value>host2.net:8020</value></property>
      <property><name>dfs.ha.namenodes.fakedev</name><value>namenode864,namenode831</value></property>
      </configuration>
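
To rule out a problem in the supplementary file itself, a small check like the following could confirm that a nameservice has the client-side HA keys used above (failover proxy provider, namenode list, and one rpc-address per namenode). It is a minimal sketch using only the JDK XML parser; the class and method names (HaConfigCheck, parse, missingKeys) are hypothetical and are not part of Hadoop or Kite.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HaConfigCheck {

    /** Parses <property><name/><value/></property> entries into an ordered map. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    /** Returns the client-side HA keys that are absent for the given nameservice. */
    static List<String> missingKeys(Map<String, String> props, String ns) {
        List<String> missing = new ArrayList<>();
        String provider = "dfs.client.failover.proxy.provider." + ns;
        if (!props.containsKey(provider)) {
            missing.add(provider);
        }
        String namenodesKey = "dfs.ha.namenodes." + ns;
        String namenodes = props.get(namenodesKey);
        if (namenodes == null) {
            missing.add(namenodesKey);
            return missing;
        }
        for (String nn : namenodes.split(",")) {
            String rpc = "dfs.namenode.rpc-address." + ns + "." + nn.trim();
            if (!props.containsKey(rpc)) {
                missing.add(rpc);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A subset of the properties from the report, inlined for the example.
        String xml = "<configuration>"
                + "<property><name>dfs.nameservices</name><value>ingestiondev,fakedev</value></property>"
                + "<property><name>dfs.client.failover.proxy.provider.fakedev</name>"
                + "<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>"
                + "<property><name>dfs.ha.namenodes.fakedev</name><value>namenode864,namenode831</value></property>"
                + "<property><name>dfs.namenode.rpc-address.fakedev.namenode864</name><value>host1.net:8020</value></property>"
                + "<property><name>dfs.namenode.rpc-address.fakedev.namenode831</name><value>host2.net:8020</value></property>"
                + "</configuration>";
        Map<String, String> props = parse(xml);
        for (String ns : props.get("dfs.nameservices").split(",")) {
            System.out.println(ns.trim() + " missing: " + missingKeys(props, ns.trim()));
        }
    }
}
```

Run against this file alone, the check reports ingestiondev as incomplete, because the file only carries the fakedev entries; presumably the ingestiondev keys come from the cluster's own configuration. For fakedev nothing is reported missing, which is consistent with the problem lying in which configuration the input format consults rather than in the file.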
      

      [1] - https://github.com/kite-sdk/kite/pull/368/files


              People

              • Assignee:
                mkwhitacre Micah Whitacre
                Reporter:
                mkwhitacre Micah Whitacre