Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Command-line Interface, HBase Module
Labels: None
Description
When scanning data imported with kite-dataset, the values of union fields contain extra characters in front of the original value.
Reproduction steps:
1. Create the Kite HBase dataset (files provided in attachments):
kite-dataset -v create dataset:hbase:localhost/sandwiches --schema sandwich-hbase.avsc --mapping sandwich-mapping.json --partition-by sandwich-partition.json
2. Import the CSV file:
kite-dataset -v csv-import sandwiches-hbase.csv dataset:hbase:localhost/sandwiches
3. hbase shell
4. scan 'sandwiches'
Result:
ROW COLUMN+CELL
1\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=\x02zPastrami and sauerkraut on toasted rye with Russian dressing.
2\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=\x02ZPeanut butter and grape jelly on white bread.
3\x00\x00 column=Sandwich:description, timestamp=1467986717147, value=\x02zPastrami and sauerkraut on toasted rye with Russian dressing.
4\x00\x00 column=Sandwich:description, timestamp=1467986717147, value=\x02TNut butter and grape jelly on white bread.
id10\x00\x00 column=Sandwich:description, timestamp=1467988391112, value=\x02\x0488
5. scan 'sandwiches', {COLUMNS => ['Sandwich:description:toString']}
ROW COLUMN+CELL
1\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=zPastrami and sauerkraut on toasted rye with Russian dressing.
2\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=ZPeanut butter and grape jelly on white bread.
The characters added in front of the field depend on the LENGTH of the field.
Here is the dependency table:
String length (chars)   Prefix added (hex)   Prefix added (dec)
1                       x02                  2
2                       x04                  4
3                       x06                  6
4                       x08                  8
5                       x0A                  10
6                       x0C                  12
7                       x0E                  14
8                       x10                  16
9                       x12                  18
10                      x14                  20
11                      x16                  22
12                      x18                  24
13                      x1A                  26
14                      x1C                  28
15                      x1E                  30
So:
len=1 => x02 added
len=2 => x04
len=3 => x06
...
len=9 => x12 (see 'sandwich1' => value=\x02\x12sandwich1 from my previous comment)
len=10 => x14
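The rows of the dependency table all fit one simple pattern: the added byte equals twice the string length. The following is a minimal sketch that checks this guess against the table. The name `guessed_prefix` is hypothetical and the rule is only inferred from the observed values, not taken from Kite or HBase code.

```python
# Rows copied from the dependency table in this report: (string length, prefix byte).
OBSERVED = [
    (1, 0x02), (2, 0x04), (3, 0x06), (4, 0x08), (5, 0x0A),
    (6, 0x0C), (7, 0x0E), (8, 0x10), (9, 0x12), (10, 0x14),
    (11, 0x16), (12, 0x18), (13, 0x1A), (14, 0x1C), (15, 0x1E),
]


def guessed_prefix(length: int) -> int:
    """Hypothetical rule inferred from the table: prefix byte = 2 * length."""
    return 2 * length


# Collect every table row that the guessed rule fails to reproduce.
mismatches = [(n, b) for n, b in OBSERVED if guessed_prefix(n) != b]
print("mismatches:", mismatches)
```

An empty mismatch list means every row of the table is reproduced by the doubling rule.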
For longer fields the rule is unclear. See these examples from my attachments:
'value=\x02RNut butter and grape jelly on white bread' => here an "R" is added, or
'value=\x02zPastrami and sauerkraut on toasted rye with Russian dressing.' => here a "z" is added...
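One possible reading, not confirmed anywhere in this report: the letters are the same kind of length byte, shown by the HBase shell as printable ASCII. "R" is 0x52 = 82 = 2 * 41 and "z" is 0x7A = 122 = 2 * 61, which are the lengths of the two strings above. This matches the zig-zag variable-length integers that Avro's binary encoding uses for string lengths and union branch indexes, so the leading \x02 could be the index (1) of the string branch in the union. The sketch below checks that hypothesis against the values quoted in this issue. The helper name `zigzag_varint` is hypothetical, and the assumption that the field is a two-branch union with the string in second position is a guess, since the schema is only in the attachments.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode n as a zig-zag variable-length integer (the scheme Avro uses for int/long)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag step: non-negative n simply becomes 2 * n
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


# (prefix letter shown in the scan output, string value) pairs quoted in this issue.
SAMPLES = [
    ("R", "Nut butter and grape jelly on white bread"),
    ("T", "Nut butter and grape jelly on white bread."),
    ("Z", "Peanut butter and grape jelly on white bread."),
    ("z", "Pastrami and sauerkraut on toasted rye with Russian dressing."),
]

for letter, text in SAMPLES:
    length = len(text.encode("utf-8"))
    # The observed letter should be exactly the encoded length of the string.
    assert zigzag_varint(length) == letter.encode("ascii"), (letter, length)

# Assumption: the leading \x02 is union branch index 1 encoded the same way.
assert zigzag_varint(1) == b"\x02"
```

If this reading is right, the prefix stays a single byte only for strings shorter than 64 bytes; under the same scheme a 64-byte string would get the two-byte length `\x80\x01`.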