[DISTRO-541] Reading Impala-created Parquet files with MapReduce produces invalid data with unpadded literal groups - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: CDH4.5.0
Fix Version/s: None
Component/s: Parquet
Labels: None

Description

The following stack trace is logged when reading certain Impala-created Parquet files with MapReduce:

parquet.io.ParquetDecodingException: Can not read value at 0 in block -1
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:113)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)

The MapReduce job still completes and outputs results, but the results are not correct.

This was supposed to have been fixed in https://github.com/Parquet/parquet-mr/pull/197, which is included in parquet-mr 1.2.5, but I still see the issue with 1.2.5.
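
For context on the "unpadded literal groups" in the title: this presumably refers to bit-packed (literal) runs in Parquet's RLE/bit-packing hybrid encoding. The Parquet format specification says a bit-packed run holds groups of 8 values, so a run's payload is always `groups * bit_width` bytes long, padded if necessary. The sketch below is illustrative only and is not parquet-mr's code; the function names and the `tolerate_unpadded` flag are hypothetical. It shows how a strict reader fails when the last group is short, and how a tolerant reader could zero-pad the missing bytes instead.

```python
def read_varint(buf, pos):
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_bit_packed_run(buf, pos, bit_width, tolerate_unpadded=False):
    """Decode one bit-packed run; return (values, next position).

    Hypothetical helper, not the parquet-mr implementation.
    """
    header, pos = read_varint(buf, pos)
    if header & 1 == 0:
        raise ValueError("header marks an RLE run, not a bit-packed run")

    num_groups = header >> 1
    # Each group of 8 values occupies exactly bit_width bytes.
    expected = num_groups * bit_width
    payload = bytes(buf[pos:pos + expected])
    end = pos + len(payload)

    if len(payload) < expected:
        if not tolerate_unpadded:
            raise ValueError(
                "bit-packed run needs %d bytes, only %d available"
                % (expected, len(payload))
            )
        # Assumption: treat the missing tail as zero padding.
        payload += b"\x00" * (expected - len(payload))

    # Values are packed starting from the least significant bit of each byte,
    # so a little-endian integer puts value i at bits [i*w, (i+1)*w).
    bits = int.from_bytes(payload, "little")
    mask = (1 << bit_width) - 1
    values = [(bits >> (i * bit_width)) & mask for i in range(num_groups * 8)]
    return values, end


# One full group of the values 0..7 at bit width 3 (example from the format spec).
padded = bytes([0x03, 0x88, 0xC6, 0xFA])
print(decode_bit_packed_run(padded, 0, 3)[0])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With a truncated payload such as `bytes([0x03, 0x88, 0xC6])`, the strict call raises `ValueError`, while `tolerate_unpadded=True` still recovers the values that were fully written. The caller would then need the real value count from the page header to discard the padded tail.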

People

Assignee: Unassigned
Reporter: Rob Weltman

Dates

Created: 07/Nov/13 2:32 AM
Updated: 15/Nov/13 11:24 PM
Resolved: 15/Nov/13 11:24 PM