Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: CDH4.5.0
- Fix Version/s: None
- Component/s: Parquet
- Labels: None
Description
Table created with Hive:
CREATE TABLE IF NOT EXISTS complextable (
ind int,
map_col map<int,string>,
struct_col struct<key:int,value:string>,
array_col array<int>
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
Reading the same table with Pig fails with:
parquet.pig.TupleConversionException: error while preparing converter for:
map_col: tuple({bag_0: {map: (key: int,value: bytearray)}})
optional group map_col {
}
at parquet.pig.convert.TupleConverter.newConverter(TupleConverter.java:128)
at parquet.pig.convert.TupleConverter.<init>(TupleConverter.java:77)
at parquet.pig.convert.TupleRecordMaterializer.<init>(TupleRecordMaterializer.java:30)
at parquet.pig.TupleReadSupport.prepareForRead(TupleReadSupport.java:166)
...
Caused by: java.lang.NullPointerException
at parquet.pig.TupleReadSupport.getPigSchemaFromFile(TupleReadSupport.java:80)
at parquet.pig.TupleReadSupport.prepareForRead(TupleReadSupport.java:107)
at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:137)
The parquet-mr github site (https://github.com/parquet/parquet-mr) says the following about reading with Pig:
"If the data was stored using another method, you will need to provide the Pig schema equivalent to the data you stored (you can also write the schema to the file footer while writing it – but that's pretty advanced). We will provide a basic automatic schema conversion soon."
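Following that note, one possible workaround is to hand the Pig schema to the loader explicitly instead of relying on a pig.schema entry in the file footer. This is an untested sketch: the warehouse path is an assumption (the default Hive warehouse location), and the exact Pig schema string accepted by parquet.pig.ParquetLoader for the map/struct/array columns may need adjusting for the Parquet version in CDH4.5.0:

```pig
-- Sketch of a workaround: supply the Pig schema by hand, since the
-- Hive-written file has no pig.schema metadata in its footer.
-- Path and schema string below are assumptions, not verified on CDH4.5.0.
data = LOAD '/user/hive/warehouse/complextable'
       USING parquet.pig.ParquetLoader(
         'ind:int, map_col:map[chararray], struct_col:(key:int, value:chararray), array_col:{t:(item:int)}');
DUMP data;
```

If the loader still NPEs with an explicit schema, that would suggest the failure is in converter preparation for the complex group types rather than only in the missing footer schema.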