Kite SDK (READ-ONLY) / KITE-786

Long Parquet schemas cause Metastore exception during Hive import

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.17.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I'm using Kite's Parquet support from Sqoop and I'm running into an issue importing a somewhat wide table (142 columns): Hive throws a metastore exception because the serialized schema is longer than 4000 characters, which I assume is the limit Postgres sets for table properties, including schema literals:

      Caused by: org.postgresql.util.PSQLException: ERROR: value too long for type character varying(4000)
      at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
      at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
      at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
      at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
      at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
      at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
      at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
      at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
      at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
      at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1069)
      ... 40 more
      )
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:24255)
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:24223)
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:24149)
      at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:893)
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:879)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:569)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:558)
      at org.kitesdk.data.spi.hive.MetaStoreUtil$4.call(MetaStoreUtil.java:179)
      at org.kitesdk.data.spi.hive.MetaStoreUtil$4.call(MetaStoreUtil.java:176)
      at org.kitesdk.data.spi.hive.MetaStoreUtil.doWithRetry(MetaStoreUtil.java:66)
      at org.kitesdk.data.spi.hive.MetaStoreUtil.createTable(MetaStoreUtil.java:191)
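
      To give a sense of scale, a wide record schema blows past 4000 characters very quickly. The following standalone sketch (not part of the original report; it uses Avro's SchemaBuilder with made-up column names) builds a 142-field schema and prints a serialized length well above the limit:

      import org.apache.avro.Schema;
      import org.apache.avro.SchemaBuilder;

      public class WideSchemaLength {
        public static void main(String[] args) {
          // Build a 142-column record with placeholder names; each optional
          // string field contributes roughly 50-60 characters of JSON.
          SchemaBuilder.FieldAssembler<Schema> fields =
              SchemaBuilder.record("wide_table").namespace("example").fields();
          for (int i = 0; i < 142; i++) {
            fields = fields.optionalString("column_" + i);
          }
          Schema schema = fields.endRecord();
          // Prints a length well over the metastore's varchar(4000) limit.
          System.out.println(schema.toString().length());
        }
      }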

      I would think the right solution here would be an option to write large schemas to a file in HDFS that could be referenced from the Hive metastore via a URL.
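
      A minimal sketch of what such an option might look like, assuming the Hadoop FileSystem API; the class and method names here (SchemaFileOption, writeSchemaToHdfs, schemaDir) are illustrative, not existing Kite APIs:

      import java.io.OutputStream;
      import java.nio.charset.StandardCharsets;

      import org.apache.avro.Schema;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class SchemaFileOption {
        // Writes the dataset's Avro schema to a file in HDFS and returns a URI
        // that could be stored in the Hive table's properties in place of the
        // full inline schema text.
        public static String writeSchemaToHdfs(Configuration conf, Schema schema,
            Path schemaDir) throws Exception {
          Path schemaPath = new Path(schemaDir, schema.getName() + ".avsc");
          FileSystem fs = schemaPath.getFileSystem(conf);
          try (OutputStream out = fs.create(schemaPath, true /* overwrite */)) {
            out.write(schema.toString(true).getBytes(StandardCharsets.UTF_8));
          }
          return schemaPath.toUri().toString();
        }
      }

      Hive's Avro SerDe already accepts an avro.schema.url table property as an alternative to an inline avro.schema.literal, so the same pattern of keeping only a short URL in the metastore should apply to whatever property Kite uses to record the dataset schema.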

    People

    • Assignee: Unassigned
    • Reporter: Josh Wills (jwills)
    • Votes: 0
    • Watchers: 2
