Kite SDK (READ-ONLY) / KITE-762

Multiple URIs in hive.metastore.uris configuration may be problematic for Crunch+Kite

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: None
    • Component/s: Data Module
    • Labels: None

      Description

      We have a Crunch job, run periodically on YARN through Oozie, that calculates some stats for a Kite dataset that is set up as a Hive external table.

      In one environment, everything works correctly. The job config as recorded by the JobHistory server looks like this:

      hive.metastore.uris=thrift://server1.abc.net:9083
      kite.inputPartitionDir=hdfs://ingestiondev/wolfe/storage
      kite.inputUri=dataset:hive://server1.abc.net:9083/wolfe/default/storage?hdfs:host=ingestiondev
      

      In another similar environment the job is failing with this map task exception:

      2014-11-04 18:36:09,402 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI
      	at org.kitesdk.data.spi.hive.MetaStoreUtil.<init>(MetaStoreUtil.java:78)
      	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.getMetaStoreUtil(HiveAbstractMetadataProvider.java:56)
      	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:237)
      	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:222)
      	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:95)
      	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:191)
      	at org.kitesdk.data.Datasets.load(Datasets.java:69)
      	at org.kitesdk.data.Datasets.load(Datasets.java:113)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.load(DatasetKeyInputFormat.java:226)
      	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat.setConf(DatasetKeyInputFormat.java:172)
      	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
      	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
      

      The config for the failing job looks like this:

      hive.metastore.uris=thrift://server1.xyz.net:9083,thrift://server2.xyz.net:9083
      kite.inputPartitionDir=hdfs://wario/wolfe/default/storage
      kite.inputUri=dataset:hive:/wolfe/default/storage?hdfs:host=wario
      

      I haven't tracked down how the kite.inputUri property is constructed, but it seems odd that it contains the metastore host:port only for the successful job. I think the key difference is likely the multiple URIs in the hive.metastore.uris property for the unsuccessful job. A quick search found some Kite code that doesn't appear to handle multiple URIs correctly [1] (not sure if this is ultimately the culprit for the issue we're seeing, but it does look like a bug).

      We're using CDH 5.1.0.1 and Kite 0.17.0.

      [1] https://github.com/kite-sdk/kite/blob/release-0.17.0/kite-data/kite-data-hive/src/main/java/org/kitesdk/data/spi/hive/HiveAbstractDatasetRepository.java#L88-94
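
To illustrate the multiple-URI case described above, here is a minimal sketch of splitting a comma-separated hive.metastore.uris value into individual URIs before reading the host and port. This is not the Kite implementation or the fix that was applied; the class name MetastoreUris and the method parse are hypothetical, and only java.net.URI from the standard library is used.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kite: splits hive.metastore.uris
// into one URI per metastore instead of treating it as a single URI.
public class MetastoreUris {

  /** Returns one URI per comma-separated entry; empty list if value is null or blank. */
  public static List<URI> parse(String value) {
    List<URI> uris = new ArrayList<>();
    if (value == null) {
      return uris;
    }
    for (String part : value.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        uris.add(URI.create(trimmed));
      }
    }
    return uris;
  }

  public static void main(String[] args) {
    // Value taken from the failing job's configuration above.
    String configured =
        "thrift://server1.xyz.net:9083,thrift://server2.xyz.net:9083";
    for (URI uri : parse(configured)) {
      System.out.println(uri.getHost() + ":" + uri.getPort());
    }
  }
}
```

Run against the failing job's value, this prints server1.xyz.net:9083 and server2.xyz.net:9083 on separate lines. Which entry a client should then use (first only, or each in turn) is a design choice this sketch leaves open.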

              People

              • Assignee: Szabolcs Vasas (vasas)
              • Reporter: Andrew Olson (noslowerdna)