Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.14.0
Fix Version/s: None
Component/s: Data Module, Documentation
Labels: None
Description
At least two schema constructs are known to be problematic:
1. Unions are difficult to map into a table structure. The general work-around is to widen the union into a record with nullable fields, exactly one of which is non-null, but enforcing that only one is set becomes a problem, and how to handle a record with multiple non-null union values is undefined. This work-around is implemented in parquet-avro, but not in HBase.
2. Recursive schemas. Parquet does not currently support recursive schemas, and adding support would require significant work. HBase can, in theory, support recursive schemas, but this will need to be built into the format support when we move from Avro.
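To make the union work-around in #1 concrete, here is a minimal Python sketch that operates on an Avro union written as plain JSON (a list of branch types). The function names and the `member0`, `member1`, ... field names are illustrative assumptions, not the API of parquet-avro or of this project. It shows the widening step and the "exactly one non-null" check that the table format itself cannot enforce.

```python
def widen_union(branches):
    """Widen an Avro union (a list of branch types) into nullable record fields.

    Hypothetical helper: one nullable field per non-null branch. The
    "memberN" naming is an illustrative choice, not a documented contract.
    """
    non_null = [b for b in branches if b != "null"]
    return [
        {"name": "member%d" % i, "type": ["null", branch], "default": None}
        for i, branch in enumerate(non_null)
    ]


def count_set_members(record, fields):
    """Count how many of the widened fields hold a non-null value."""
    return sum(1 for f in fields if record.get(f["name"]) is not None)


def is_valid_widened(record, fields):
    """True only when exactly one widened field is non-null.

    A record with two or more non-null members is the undefined case
    described above; this sketch simply reports it as invalid.
    """
    return count_set_members(record, fields) == 1


fields = widen_union(["int", "string"])
ok = {"member0": 7, "member1": None}
ambiguous = {"member0": 7, "member1": "seven"}
```

With these definitions, `is_valid_widened(ok, fields)` holds while `is_valid_widened(ambiguous, fields)` does not. A union that itself contains `"null"` would also need to accept the all-null record, which this sketch does not handle.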
Any others? What about integration consequences?
We should decide how to handle these cases and get that decision implemented. I'm fine with a warning for #1 and allowing #2 only in Avro.
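One way the proposed policy could look, as a hedged Python sketch over Avro schema JSON: it warns on unions with more than one non-null branch and rejects recursive schemas for any format other than Avro. The function names and the format strings (`"avro"`, `"parquet"`, `"hbase"`) are hypothetical. It also assumes that only unions with two or more non-null branches count as problematic, and it matches named types by simple name, ignoring namespaces.

```python
def find_problems(schema, enclosing=()):
    """Return (has_multi_branch_union, is_recursive) for an Avro schema in JSON form.

    `enclosing` holds the names of the records currently being defined, so a
    string reference to one of them is a recursive reference.
    """
    if isinstance(schema, str):
        # A primitive name or a reference to a named type.
        return (False, schema in enclosing)
    if isinstance(schema, list):
        # A union: ["null", X] is treated as a plain optional field.
        non_null = [b for b in schema if b != "null"]
        has_union = len(non_null) > 1
        recursive = False
        for branch in non_null:
            u, r = find_problems(branch, enclosing)
            has_union, recursive = has_union or u, recursive or r
        return (has_union, recursive)
    kind = schema["type"]
    if kind == "record":
        children = [f["type"] for f in schema["fields"]]
        enclosing = enclosing + (schema["name"],)
    elif kind == "array":
        children = [schema["items"]]
    elif kind == "map":
        children = [schema["values"]]
    else:
        children = []  # enum, fixed, or a primitive written as {"type": ...}
    has_union = recursive = False
    for child in children:
        u, r = find_problems(child, enclosing)
        has_union, recursive = has_union or u, recursive or r
    return (has_union, recursive)


def check_schema(schema, fmt):
    """Apply the proposed policy: warn on #1, allow #2 only for Avro."""
    has_union, recursive = find_problems(schema)
    if recursive and fmt != "avro":
        raise ValueError("recursive schemas are only supported in Avro, not " + fmt)
    warnings = []
    if has_union:
        warnings.append("schema contains a union with multiple non-null branches")
    return warnings


linked_list = {"type": "record", "name": "Node", "fields": [
    {"name": "value", "type": "int"},
    {"name": "next", "type": ["null", "Node"]},
]}
event = {"type": "record", "name": "Event", "fields": [
    {"name": "payload", "type": ["int", "string"]},
]}
```

Here `check_schema(linked_list, "avro")` returns no warnings, `check_schema(linked_list, "parquet")` raises `ValueError`, and `check_schema(event, "hbase")` returns a single warning.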