Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.1.0
Component/s: None
Labels: None
Description
As discussed on the mailing list [1], add a PartitionView to support operations that use partitions in other Hadoop-based tools, such as HCatalog/Hive.
The semantics of a PartitionView should be refined as part of this issue, but here's a proposed starting point:
- A PartitionView is simply a Kite View with certain properties. These properties include:
  - A PartitionView is uniquely identified by a set of keys (e.g. {year = 2015, month = 3, day = 26}).
  - A PartitionView can be deleted (i.e., view.deleteAll() is guaranteed to work).
  - A PartitionView can be efficiently moved or archived for data management needs, but functions to do so may be out of the scope of this issue.
- There should be a way to enumerate PartitionViews for a Dataset or View.
  - Enumerating all PartitionViews should match the behavior of enumerating partitions in Hive (e.g. "show partitions") if possible. This seems like the least surprising behavior, and it stays consistent for those who also use other systems.
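To make the proposed semantics concrete, here is a minimal in-memory sketch of the three ideas above: a partition identified by a set of keys, a delete that always succeeds for an existing partition, and an enumeration that renders each partition in the key=value/key=value form that Hive's "show partitions" prints. Every name in it (PartitionViewSketch, PartitionKey, InMemoryDataset, listPartitions, toHivePath) is hypothetical and chosen for illustration. None of it is Kite API, and the actual design is still to be settled in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch only: these types are NOT part of the Kite API.
public class PartitionViewSketch {

  /** A partition identified by an ordered set of key/value pairs. */
  static final class PartitionKey {
    // Insertion order stands in for the order of the partition strategy.
    private final Map<String, Object> values;

    PartitionKey(Map<String, Object> values) {
      this.values = new LinkedHashMap<>(values);
    }

    /** Renders the key in Hive's partition-name style, e.g. year=2015/month=3/day=26. */
    String toHivePath() {
      StringJoiner joiner = new StringJoiner("/");
      for (Map.Entry<String, Object> entry : values.entrySet()) {
        joiner.add(entry.getKey() + "=" + entry.getValue());
      }
      return joiner.toString();
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof PartitionKey
          && values.equals(((PartitionKey) other).values);
    }

    @Override
    public int hashCode() {
      return values.hashCode();
    }
  }

  /** Toy dataset that groups records by partition key (assumption: in memory). */
  static final class InMemoryDataset {
    private final Map<PartitionKey, List<String>> partitions = new LinkedHashMap<>();

    void write(PartitionKey key, String record) {
      partitions.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
    }

    /** Enumerates every partition that currently holds data. */
    List<PartitionKey> listPartitions() {
      return new ArrayList<>(partitions.keySet());
    }

    /** Drops a whole partition; returns true if the partition existed. */
    boolean deleteAll(PartitionKey key) {
      return partitions.remove(key) != null;
    }
  }

  /** Convenience factory for a year/month/day key. */
  static PartitionKey key(int year, int month, int day) {
    Map<String, Object> values = new LinkedHashMap<>();
    values.put("year", year);
    values.put("month", month);
    values.put("day", day);
    return new PartitionKey(values);
  }

  public static void main(String[] args) {
    InMemoryDataset dataset = new InMemoryDataset();
    dataset.write(key(2015, 3, 26), "event-a");
    dataset.write(key(2015, 3, 26), "event-b");
    dataset.write(key(2015, 3, 27), "event-c");

    // Enumerate partitions in Hive's naming style.
    for (PartitionKey partition : dataset.listPartitions()) {
      System.out.println(partition.toHivePath());
    }

    // Deleting by key removes the whole partition.
    dataset.deleteAll(key(2015, 3, 26));
    System.out.println(dataset.listPartitions().size());
  }
}
```

Keying the map on the full set of partition values is what makes deletion a single whole-partition operation here. A real implementation would presumably map each key to a storage location instead, which is also what would make the move and archive operations cheap.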
[1] https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/RAJIdJSadT0