CDH (READ-ONLY) / DISTRO-807

Hive fails to write parquet column type array<struct<>>

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Hive
    • Labels: None
    • Environment: CDH 5.7.0 @ CentOS 6

      Description

      We upgraded from CDH 5.4.4 to 5.7.0 and queries such as this one now fail:

      create table hivebug.simple ( `mycol` array<struct<clickid:int,issourceloss:boolean,transitiontype:string>> ) STORED AS PARQUET;
      insert into hivebug.simple select mycol from data.dump limit 1;
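
      For anyone without access to our data.dump table, a self-contained variant along these lines should exercise the same code path (the hivebug.source table and the values in it are made up for illustration; the LIMIT is kept because the stack traces below point at the reduce stage):

      create table hivebug.source ( `mycol` array<struct<clickid:int,issourceloss:boolean,transitiontype:string>> );
      insert into table hivebug.source select array(named_struct('clickid', 1, 'issourceloss', false, 'transitiontype', 'click'));
      insert into hivebug.simple select mycol from hivebug.source limit 1;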
      

      We can select the data just fine, but writing it fails. The reducer throws an exception like this:

      java.lang.NegativeArraySizeException
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray.adjustArraySize(LazyBinaryArray.java:108)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray.parse(LazyBinaryArray.java:136)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray.getListLength(LazyBinaryArray.java:210)
      	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryListObjectInspector.getListLength(LazyBinaryListObjectInspector.java:63)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:259)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
      	at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
      	at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
      	at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
      	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
      	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
      	at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
      

      Alternatively, if you run

      create table test_xy stored as parquet as SELECT * FROM data.dump WHERE year = "2016" and month = "03" and day in ("12", "13", "14") and mycol in("abc", "def", "ghi") order by timestamp asc
      

      you get a slightly different exception:

      java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
      	at java.util.Arrays.rangeCheck(Arrays.java:794)
      	at java.util.Arrays.fill(Arrays.java:2084)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray.parse(LazyBinaryArray.java:163)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray.getListLength(LazyBinaryArray.java:210)
      	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryListObjectInspector.getListLength(LazyBinaryListObjectInspector.java:63)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:259)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
      	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
      	at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
      	at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
      	at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
      	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
      	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
      

      By the way, just before the crash there is a suspicious INFO message in the logs (note that it claims to have expected 8 fields but "only" got 8):

      2016-05-27 12:37:54,197 INFO [main] org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct: Missing fields! Expected 8 fields but only got 8! Ignoring similar problems.
      

      I diffed Hive 1.1.0-cdh5.4.4 against 1.1.0-cdh5.7.0 to see what had changed and noticed extensive Parquet-related changes under ql/src/java/org/apache/hadoop/hive/ql/io/parquet, so I suspect that is where the bug was introduced.
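
      A possible workaround, assuming the bug is limited to the reduce-side Parquet write path that both stack traces point at, is to stage the reduced result in a non-Parquet table and then copy it into the Parquet table map-only (the hivebug.staging name is illustrative):

      create table hivebug.staging ( `mycol` array<struct<clickid:int,issourceloss:boolean,transitiontype:string>> ) STORED AS SEQUENCEFILE;
      -- the reduce stage introduced by LIMIT now writes a SequenceFile, not Parquet
      insert into hivebug.staging select mycol from data.dump limit 1;
      -- no ORDER BY/LIMIT here, so this copy should run map-only and keep LazyBinary rows away from the Parquet writer
      insert into hivebug.simple select mycol from hivebug.staging;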


            People

            • Assignee: Unassigned
            • Reporter: David Watzke (dwatzke)
            • Votes: 0
            • Watchers: 3
