CDH (READ-ONLY) / DISTRO-543

Pig fails to read Parquet file with a complex field if schema not specified explicitly

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: CDH4.5.0
    • Fix Version/s: None
    • Component/s: Parquet
    • Labels: None

    Description

      Table created with Hive:

      CREATE TABLE IF NOT EXISTS complextable (
        ind int,
        map_col map<int,string>,
        struct_col struct<key:int,value:string>,
        array_col array<int>
      )
      ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
        OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

      Reading the table's files with Pig, without specifying a schema, fails with:

      parquet.pig.TupleConversionException: error while preparing converter for:
      map_col: tuple({bag_0: {map: (key: int,value: bytearray)}})
      optional group map_col {
      }
          at parquet.pig.convert.TupleConverter.newConverter(TupleConverter.java:128)
          at parquet.pig.convert.TupleConverter.<init>(TupleConverter.java:77)
          at parquet.pig.convert.TupleRecordMaterializer.<init>(TupleRecordMaterializer.java:30)
          at parquet.pig.TupleReadSupport.prepareForRead(TupleReadSupport.java:166)
          ...
      Caused by: java.lang.NullPointerException
          at parquet.pig.TupleReadSupport.getPigSchemaFromFile(TupleReadSupport.java:80)
          at parquet.pig.TupleReadSupport.prepareForRead(TupleReadSupport.java:107)
          at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:137)

      The parquet-mr GitHub page (https://github.com/parquet/parquet-mr) says the following about reading with Pig:

      "If the data was stored using another method, you will need to provide the Pig schema equivalent to the data you stored (you can also write the schema to the file footer while writing it – but that's pretty advanced). We will provide a basic automatic schema conversion soon."
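      Following the README's advice, supplying a Pig schema explicitly might look like the minimal Pig Latin sketch below. It assumes, as in the parquet-mr README, that the string argument to parquet.pig.ParquetLoader is a Pig schema; the path '/path/to/complextable' is a hypothetical placeholder for the table's data directory. Only the primitive column ind is requested, on the assumption that fields left out of the requested schema get no converter. Whether an explicit schema also works for map_col, struct_col and array_col is not established by this report (Pig map keys are always chararray, while map_col has int keys).

```pig
-- Sketch only, untested against complextable.
-- '/path/to/complextable' is a hypothetical placeholder for the HDFS
-- directory that holds the table's Parquet files.
-- The constructor argument is the Pig schema to read with; only the
-- primitive column ind is requested (assumption: the complex columns
-- are then not converted at all).
A = LOAD '/path/to/complextable' USING parquet.pig.ParquetLoader('ind:int');
DUMP A;
```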


    People

    • Assignee: Unassigned
    • Reporter: Rob Weltman (robw)
    • Votes: 0
    • Watchers: 0
