Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: CDH4.4.0
- Fix Version/s: None
- Component/s: MapReduce
- Labels: None
- Environment: CentOS 6
Description
We recently upgraded from CDH4.2.0 to CDH4.4.0, and everything appeared fine until we ran a Hive query against a location containing large, Snappy-compressed files. Instead of launching one mapper per block, the job launches a single mapper (see the operator forwarding log below). As a result, a query that used to take 2 minutes had still not completed after more than 45 minutes. We did not change any configuration that we are aware of. Note that when I ran the Hive included with CDH4.4.0 but set HADOOP_HOME to the older CDH4.2.0 binaries, the expected behavior of one mapper per block returned. Any insight would be GREATLY appreciated!
2013-11-07 13:58:02,102 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias organic_events for file hdfs://nameservice1/events/organic/2013/11/05
2013-11-07 13:58:02,102 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
2013-11-07 13:58:02,102 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2013-11-07 13:58:02,102 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2013-11-07 13:58:02,103 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 178407888
2013-11-07 13:58:02,104 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 10 rows
2013-11-07 13:58:02,104 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 10 rows
2013-11-07 13:58:02,104 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 10 rows
2013-11-07 13:58:02,104 INFO ExecMapper: ExecMapper: processing 10 rows: used memory = 178407888
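
To put rough numbers on the gap between "one mapper per block" and "one mapper", the sketch below uses a simplified split rule, max(min_size, min(max_size, block_size)), and treats a file that the input format cannot split as a single split. It is an illustration under stated assumptions, not a diagnosis of this issue: the function names, the 1 GiB file size, and the 128 MiB block size are hypothetical (the report does not give sizes), and the model ignores split combining and the slack real input formats allow on the last split. A raw Snappy stream is generally not splittable on its own, whereas container formats that compress per block, such as SequenceFile, usually are.

```python
MIB = 1024 * 1024


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Simplified split-size rule: clamp the block size between min and max."""
    return max(min_size, min(max_size, block_size))


def expected_mappers(file_size, block_size, splittable,
                     min_size=1, max_size=2**63 - 1):
    """Approximate number of map tasks for one file (hypothetical helper).

    A non-splittable file is read by a single mapper regardless of size.
    """
    if file_size <= 0:
        return 0
    if not splittable:
        return 1
    size = split_size(block_size, min_size, max_size)
    # Integer ceiling division: a partial trailing split still needs a mapper.
    return (file_size + size - 1) // size


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    file_size = 1024 * MIB
    block_size = 128 * MIB
    print("splittable:    ", expected_mappers(file_size, block_size, True))
    print("non-splittable:", expected_mappers(file_size, block_size, False))
```

Under these assumed sizes the splittable case yields 8 mappers and the non-splittable case yields 1, which is the same shape of slowdown described above: the whole file is funneled through a single task.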