Details
-
Type: Bug
-
Status: Open
-
Priority: Major
-
Resolution: Unresolved
-
Affects Version/s: CDH 5.0.2
-
Fix Version/s: None
-
Component/s: Hive
-
Labels:None
Description
The issue can be recreated with following steps
1) In hbase
create 'TABLE_EMP','default'
2) On hive
sudo -u hive hive
CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');
3) On hbase insert the following data
put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
4) On hive execute the following query
hive
SELECT *
FROM (
SELECT CDS_PK
FROM TABLE_EMP
WHERE
CDS_PK >= '0'
AND CDS_PK <= '9'
AND CDS_UPDATED_DATE IS NOT NULL
UNION ALL SELECT CDS_PK
FROM TABLE_EMP
WHERE
CDS_PK >= 'a'
AND CDS_PK <= 'z'
AND CDS_UPDATED_DATE IS NOT NULL
)t ;
5) Output of the query
1
1
2
2
6) Output of just
SELECT CDS_PK
FROM TABLE_EMP
WHERE
CDS_PK >= '0'
AND CDS_PK <= '9'
AND CDS_UPDATED_DATE IS NOT NULL
is
1
2
7) Output of just
SELECT CDS_PK
FROM TABLE_EMP
WHERE
CDS_PK >= 'a'
AND CDS_PK <= 'z'
AND CDS_UPDATED_DATE IS NOT NULL
Empty
8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated
Accordingly above query should return output
1
2
instead it is giving wrong output
1
1
2
2
Current work-around for the problem is to use DISTINCT in one of the sub queries.
Have created an ASF upstream hive jira HIVE-7248 for the same.