Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.8.0
- Fix Version/s: 0.18.0
- Component/s: Data Module
- Labels: None
Description
Users with existing datasets probably don't want to copy them to use CDK libraries. We need to add compatibility so that users can point CDK at their data and configure it for their existing data layout. This includes:
- Custom partition strategy to FS path conversion
- Wrapping custom InputFormat classes
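To make the first item concrete, here is a minimal sketch of a pluggable partition-to-path conversion. None of this is existing CDK API: `PartitionPathSketch`, `SegmentFormat`, `HIVE_STYLE`, `VALUE_ONLY`, and `toPath` are hypothetical names used only for illustration. The assumption is that each partition field maps to one directory level and that only the directory naming differs between layouts.

```java
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch, not part of CDK: shows how a custom layout could be
// plugged into partition-to-path conversion.
public class PartitionPathSketch {

    /** Hypothetical hook: renders one partition field as one directory name. */
    interface SegmentFormat extends BiFunction<String, Object, String> {}

    /** Hive-style layout: one "name=value" directory per partition field. */
    static final SegmentFormat HIVE_STYLE = (name, value) -> name + "=" + value;

    /** Bare layout: the directory name is just the value. */
    static final SegmentFormat VALUE_ONLY = (name, value) -> String.valueOf(value);

    /**
     * Appends one path segment per partition field, in the map's iteration
     * order, so callers should pass an ordered map such as LinkedHashMap.
     */
    static String toPath(String root, Map<String, Object> partition, SegmentFormat format) {
        StringBuilder path = new StringBuilder(root);
        for (Map.Entry<String, Object> field : partition.entrySet()) {
            path.append('/').append(format.apply(field.getKey(), field.getValue()));
        }
        return path.toString();
    }
}
```

With an ordered map of `year=2013` and `month=5` under the root `hdfs:/data/format`, `HIVE_STYLE` yields `hdfs:/data/format/year=2013/month=5` and `VALUE_ONLY` yields `hdfs:/data/format/2013/5`. A user with an existing layout would supply their own `SegmentFormat` rather than move data.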
Here's an idea of what the configuration might look like:
DatasetDescriptor desc = new DatasetDescriptor.Builder()
    .schema(SomeObject.class) // basic description of SomeObject in avro
    // used to wrap an InputFormat with the Dataset API
    .property("cdk.reader.input-format", Class<? extends InputFormat>)
    .location("hdfs:/data/format")
    .get();