Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: None
Labels:
Description
I am trying to partition data using Kite on a field of type enum. This is the code used:
PartitionStrategy partition = new PartitionStrategy.Builder()
    .identity("transactionType", "transactiontype_partition")
    .identity("dataSource", "database_partition")
    .year("timestamp", "year_partition")
    .month("timestamp", "month_partition")
    .day("timestamp", "day_partition")
    .build();
DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
    .partitionStrategy(partition)
    .schema(Event.class)
    .build();
Here Event.class is an Avro-generated class, and one of its fields is of type enum. The code fails to partition the data with:
Field type ENUM does not match partitioner IdentityFieldPartitioner{sourceName=transactionType, name=transactiontype_partition, type=class java.lang.Object, cardinality=-1}
The partition strategy format documentation says that identity works only with strings or numbers, while hash works with any object. So I tried hash instead:
PartitionStrategy partition = new PartitionStrategy.Builder()
    .hash("transactionType", "transactiontype_partition", 10)
    .hash("dataSource", "database_partition", 10)
    .year("timestamp", "year_partition")
    .month("timestamp", "month_partition")
    .day("timestamp", "day_partition")
    .build();
DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
    .partitionStrategy(partition)
    .schema(Event.class)
    .build();
I get this error when I use hash:
Field type ENUM does not match partitioner HashFieldPartitioner{sourceName=transactionType, name=transactiontype_partition, cardinality=10}
Is there a way to partition the data when the source field of the partition is of type enum?
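Until enum fields are accepted, one possible workaround (a sketch only, not something Kite is confirmed to do) might be to carry the enum's symbol name in a string field and partition on that string. The plain-Java sketch below shows the idea without any Kite dependency. The class name, method names, and enum symbols are hypothetical stand-ins, and the bucket computation is an assumption for illustration, not Kite's actual hash function.

```java
final class EnumPartitionSketch {

    // Hypothetical stand-in for the Avro-generated enum; the symbols are invented.
    enum TransactionType { PURCHASE, REFUND }

    // Identity-style partition value: use the enum's symbol name, a plain string.
    static String identityValue(Enum<?> value) {
        return value.name();
    }

    // Hash-style partition value: bucket the symbol name, not the enum object.
    // Enum.hashCode() is identity-based and can differ between JVM runs,
    // whereas String.hashCode() is specified and therefore stable.
    static int hashBucket(Enum<?> value, int buckets) {
        return Math.floorMod(value.name().hashCode(), buckets);
    }

    public static void main(String[] args) {
        TransactionType t = TransactionType.REFUND;
        System.out.println("transactiontype_partition=" + identityValue(t));
        System.out.println("transactiontype_partition=" + hashBucket(t, 10));
    }
}
```

Hashing the symbol name rather than the enum instance keeps records with the same enum value in the same bucket across separate writer processes. The cost of this approach is a schema change (an extra or replaced string field), which may not be acceptable in every case.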
Event.avdl:
record Event {
  string transactionId;
  string submitterId;
  SourceType dataSourceType;
  string eventId;
  // This field is of type enum, and I want to partition on this field.
  TransactionType transactionType;
}
Opened this issue after discussing here: https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/AUByPO8G1u4