  1. Kite SDK (READ-ONLY)
  2. KITE-1075

Partition data when one of the fields is an ENUM

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: None

      Description

      I am trying to partition data using Kite on a field of type enum. This is the code used:

      PartitionStrategy partition = new PartitionStrategy.Builder()
        .identity("transactionType", "transactiontype_partition").identity("dataSource", "database_partition")
        .year("timestamp", "year_partition").month("timestamp", "month_partition")
        .day("timestamp", "day_partition").build();
      
      DatasetDescriptor descriptor = new DatasetDescriptor.Builder().partitionStrategy(partition)
        .schema(Event.class).build();
      

      Here Event.class is an Avro-generated class, and one of its fields is of type enum. The code fails to partition the data with:

      Field type ENUM does not match partitioner IdentityFieldPartitioner{sourceName=transactionType, name=transactiontype_partition, type=class java.lang.Object, cardinality=-1}
      

      The partition strategy format doc says that identity works only with strings or numbers, while hash works with any object. So I tried:

      PartitionStrategy partition = new PartitionStrategy.Builder()
        .hash("transactionType", "transactiontype_partition", 10).hash("dataSource", "database_partition", 10)
        .year("timestamp", "year_partition").month("timestamp", "month_partition")
        .day("timestamp", "day_partition").build();
      
      DatasetDescriptor descriptor = new DatasetDescriptor.Builder().partitionStrategy(partition)
        .schema(Event.class).build();
      

      I get the error below when I use hash. Is there a way to partition the data when the field being partitioned on is of type enum?

      Field type ENUM does not match partitioner HashFieldPartitioner{sourceName=transactionType, name=transactiontype_partition, cardinality=10}
      

      Event.avdl:

      record Event {
        string transactionId;
        string submitterId;
        SourceType dataSourceType;
        string eventId;
        // This field is of type enum, and I want to partition on this field.
        TransactionType transactionType;
      }
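
      One possible workaround, offered as an assumption and not as a confirmed Kite feature, is to carry a string copy of the enum's symbol in the record and partition on that string field. The sketch below is plain Java with no Kite dependency. It only shows how an identity-style or hash-style partition value could be derived from an enum. The class EnumPartitionKeys, its methods, the constants PURCHASE and REFUND, and the name=value path layout are all hypothetical and used for illustration only.

```java
import java.util.Locale;

final class EnumPartitionKeys {

    // Hypothetical symbols: the issue does not list the values of TransactionType.
    enum TransactionType { PURCHASE, REFUND }

    // Identity-style value: the enum's symbol name as a plain string,
    // which could be stored in an extra string field of the record.
    static String identityKey(Enum<?> value) {
        return value.name();
    }

    // Hash-style value: a bucket in [0, buckets) computed from the symbol name.
    // The name is hashed instead of the enum object because Enum.hashCode()
    // is identity-based and not guaranteed to be stable across JVM runs.
    static int hashBucket(Enum<?> value, int buckets) {
        return (value.name().hashCode() & Integer.MAX_VALUE) % buckets;
    }

    // Illustrative name=value path; not claimed to be Kite's on-disk layout.
    static String partitionPath(Enum<?> type, int year, int month, int day) {
        return String.format(Locale.ROOT,
            "transactiontype_partition=%s/year_partition=%d/month_partition=%d/day_partition=%d",
            identityKey(type), year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(identityKey(TransactionType.REFUND));
        System.out.println(hashBucket(TransactionType.REFUND, 10));
        System.out.println(partitionPath(TransactionType.PURCHASE, 2015, 9, 3));
    }
}
```

      With a string field populated this way, the identity partitioner from the first snippet would see a string and not an enum. Whether Kite accepts that field depends on the schema, which this sketch does not cover.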
      

      Opened this issue after discussing here: https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/AUByPO8G1u4

            People

            • Assignee:
              Unassigned
              Reporter:
              hitesh_g_b@yahoo.co.in hitesh gollahalli bachanna