Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.18.0
    • Component/s: Data Module
    • Labels: None

      Description

      Users with existing datasets probably don't want to copy them just to use the CDK libraries. We need to add compatibility so that users can point CDK at their existing data and configure it to match that data's layout. This includes:

      • Custom partition strategy to FS path conversion
      • Wrapping custom InputFormat classes
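
The first item above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PathLayout`, `KeyValueLayout`, and `ValueOnlyLayout` are hypothetical names, not CDK classes, and the partition values are modeled as a plain ordered map rather than a real partition key type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Small demo: the same partition values rendered with two different layouts.
final class PathLayoutDemo {
    public static void main(String[] args) {
        Map<String, Object> key = new LinkedHashMap<>();
        key.put("year", 2013);
        key.put("month", 10);
        System.out.println(new KeyValueLayout().toPath(key));  // year=2013/month=10
        System.out.println(new ValueOnlyLayout().toPath(key)); // 2013/10
    }
}

// Hypothetical interface: converts ordered partition values to a relative path.
interface PathLayout {
    String toPath(Map<String, Object> partitionValues);
}

// name=value directories (the style Hive uses), e.g. year=2013/month=10
final class KeyValueLayout implements PathLayout {
    @Override
    public String toPath(Map<String, Object> partitionValues) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, Object> e : partitionValues.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }
}

// Values-only directories, as an existing dataset might use, e.g. 2013/10
final class ValueOnlyLayout implements PathLayout {
    @Override
    public String toPath(Map<String, Object> partitionValues) {
        StringJoiner path = new StringJoiner("/");
        for (Object value : partitionValues.values()) {
            path.add(String.valueOf(value));
        }
        return path.toString();
    }
}
```

Keeping the conversion behind a small interface would let a user with an existing directory tree plug in their own layout without moving any files.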

      Here's an idea of what the configuration might look like:

        DatasetDescriptor desc = new DatasetDescriptor.Builder()
            .schema(SomeObject.class) // basic description of SomeObject in Avro
            // used to wrap an InputFormat with the Dataset API;
            // SomeInputFormat is a placeholder for any Class<? extends InputFormat>
            .property("cdk.reader.input-format", SomeInputFormat.class.getName())
            .location("hdfs:/data/format")
            .get();
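
To make the wrapping idea more concrete, here is a minimal sketch of how a pull-style record reader could be exposed through an iterator. It is not the CDK implementation. `PullReader` is a hypothetical stand-in that loosely mirrors the shape of Hadoop's `RecordReader` (advance with `nextKeyValue()`, then read with `getCurrentValue()`) so the example runs without Hadoop on the classpath, and it assumes the Dataset API reads records iterator-style.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Small demo: iterate over a toy reader through the adapter.
final class ReaderAdapterDemo {
    public static void main(String[] args) {
        Iterator<String> it =
            new PullReaderIterator<>(new ListPullReader<>(Arrays.asList("a", "b")));
        while (it.hasNext()) {
            System.out.println(it.next()); // prints "a", then "b"
        }
    }
}

// Hypothetical stand-in for a pull-style record reader.
interface PullReader<V> {
    boolean nextKeyValue(); // advance; returns false once input is exhausted
    V getCurrentValue();    // value at the current position
}

// Adapts a PullReader to java.util.Iterator by looking ahead one record.
final class PullReaderIterator<V> implements Iterator<V> {
    private final PullReader<V> reader;
    private boolean advanced = false;   // true once we have looked ahead
    private boolean hasCurrent = false; // result of the last look-ahead

    PullReaderIterator(PullReader<V> reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (!advanced) {
            hasCurrent = reader.nextKeyValue();
            advanced = true;
        }
        return hasCurrent;
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false; // consume the looked-ahead record
        return reader.getCurrentValue();
    }
}

// Toy in-memory reader used only to exercise the adapter.
final class ListPullReader<V> implements PullReader<V> {
    private final List<V> values;
    private int pos = -1;

    ListPullReader(List<V> values) {
        this.values = values;
    }

    @Override
    public boolean nextKeyValue() {
        if (pos < values.size()) {
            pos++;
        }
        return pos < values.size();
    }

    @Override
    public V getCurrentValue() {
        return values.get(pos);
    }
}
```

The one-record look-ahead is what lets `hasNext()` be called repeatedly without skipping records, since the underlying reader only exposes "advance, then read".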
      

            People

            • Assignee: Ryan Blue
            • Reporter: Ryan Blue
            • Votes: 0
            • Watchers: 4