Description
If you try to insert a column containing a NULL into an ORC table via SELECT, it fails with:
java.lang.IllegalArgumentException: Bad primitive category VOID
This is similar to Apache HIVE-8470 (https://issues.apache.org/jira/browse/HIVE-8470), which was closed without a code change, but the failure still occurs with a slightly more complex query than the one given there. To reproduce:
hive> create table has_null (c string) stored as textfile;
hive> insert overwrite table has_null select null from dual;
hive> create table target (c string) stored as orc;
hive> insert overwrite table target select * from has_null where c is null;
(note: "dual" is a single-record table containing a non-NULL value, thus has_null contains a single record with a NULL value).
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: java.lang.IllegalArgumentException: Bad primitive category VOID
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createTreeWriter(WriterImpl.java:1843)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.access$1500(WriterImpl.java:97)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.<init>(WriterImpl.java:1593)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createTreeWriter(WriterImpl.java:1847)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:194)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:435)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:84)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:695)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more
The trigger seems to be that the column containing the NULL is explicitly tested in the WHERE clause: without the WHERE clause the same insert succeeds. The storage format of the source table makes no difference.
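That pattern suggests the optimizer's constant propagation is rewriting the filtered column reference into an untyped NULL literal before it reaches the ORC writer. If so, disabling constant folding for the session should avoid the error; this assumes the standard hive.optimize.constant.propagation setting and is offered as a possible workaround, not a confirmed fix:
hive> set hive.optimize.constant.propagation=false;
hive> insert overwrite table target select * from has_null where c is null;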