[DISTRO-543] Pig fails to read Parquet file with a complex field if schema not specified explicitly - Cloudera Open Source

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: CDH4.5.0
Fix Version/s: None
Component/s: Parquet
Labels: None

Description

Table created with Hive:

CREATE TABLE IF NOT EXISTS complextable (
ind int,
map_col map<int,string>,
struct_col struct<key:int,value:string>,
array_col array<int>
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Reading the table with Pig, without specifying the schema explicitly, fails with:

parquet.pig.TupleConversionException: error while preparing converter for:
map_col: tuple({bag_0: {map: (key: int,value: bytearray)}})
optional group map_col {
}
at parquet.pig.convert.TupleConverter.newConverter(TupleConverter.java:128)
at parquet.pig.convert.TupleConverter.<init>(TupleConverter.java:77)
at parquet.pig.convert.TupleRecordMaterializer.<init>(TupleRecordMaterializer.java:30)
at parquet.pig.TupleReadSupport.prepareForRead(TupleReadSupport.java:166)
...
Caused by: java.lang.NullPointerException
at parquet.pig.TupleReadSupport.getPigSchemaFromFile(TupleReadSupport.java:80)
at parquet.pig.TupleReadSupport.prepareForRead(TupleReadSupport.java:107)
at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:137)

The parquet-mr GitHub page (https://github.com/parquet/parquet-mr) says the following about reading with Pig:

"If the data was stored using another method, you will need to provide the Pig schema equivalent to the data you stored (you can also write the schema to the file footer while writing it – but that's pretty advanced). We will provide a basic automatic schema conversion soon."
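
For reference, a minimal sketch of what "providing the Pig schema" could look like for the table above. The warehouse path is hypothetical, the parquet-pig jars are assumed to be registered already, and the Pig types chosen for the complex columns are guesses at an equivalent schema (Pig map keys are always chararray, so the Hive int keys have no direct counterpart). Whether this avoids the exception on CDH4.5.0 has not been verified.

```pig
-- Hypothetical location: adjust to wherever Hive stored the table's files.
raw = LOAD '/user/hive/warehouse/complextable'
      USING parquet.pig.ParquetLoader()
      AS (
          ind: int,
          -- Assumed mapping for map<int,string>; Pig map keys are chararray.
          map_col: map[chararray],
          -- Assumed mapping for struct<key:int,value:string>.
          struct_col: tuple(key: int, value: chararray),
          -- Assumed mapping for array<int>; the field name "elem" is made up.
          array_col: bag{t: tuple(elem: int)}
      );

-- Print the schema Pig ends up with, to compare against the Hive definition.
DESCRIBE raw;
```

If the declared schema does not match how the Hive SerDe actually laid out the nested groups in the file, the load may still fail, so the types above should be treated as a starting point only.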

People

Assignee: Unassigned
Reporter: Rob Weltman

Dates

Created: 07/Nov/13 2:45 AM
Updated: 07/Nov/13 2:50 AM