Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      When scanning data imported with kite-dataset, the values of union fields carry extra bytes in front of the original content.

      Reproduction steps:
      1. create the kite hbase dataset (files provided in attachments):
      kite-dataset -v create dataset:hbase:localhost/sandwiches --schema sandwich-hbase.avsc --mapping sandwich-mapping.json --partition-by sandwich-partition.json
      2. import the csv file:
      kite-dataset -v csv-import sandwiches-hbase.csv dataset:hbase:localhost/sandwiches
      3. hbase shell
      4. scan 'sandwiches', {COLUMNS => ['Sandwich:description']}

      Result:
      ROW COLUMN+CELL
      1\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=\x02zPastrami and sauerkraut on toasted rye with Russian dressing.
      2\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=\x02ZPeanut butter and grape jelly on white bread.
      3\x00\x00 column=Sandwich:description, timestamp=1467986717147, value=\x02zPastrami and sauerkraut on toasted rye with Russian dressing.
      4\x00\x00 column=Sandwich:description, timestamp=1467986717147, value=\x02TNut butter and grape jelly on white bread.
      id10\x00\x00 column=Sandwich:description, timestamp=1467988391112, value=\x02\x0488

      5. scan 'sandwiches', {COLUMNS => ['Sandwich:description:toString']}

      ROW COLUMN+CELL
      1\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=zPastrami and sauerkraut on toasted rye with Russian dressing.
      2\x00\x00 column=Sandwich:description, timestamp=1467986100151, value=ZPeanut butter and grape jelly on white bread.

      Each value starts with \x02, followed by a character that depends on the LENGTH of the field.
      Here is the dependency table:
      Chars in string   Added value (hex)   Added value (dec)
      1                 x02                 2
      2                 x04                 4
      3                 x06                 6
      4                 x08                 8
      5                 x0A                 10
      6                 x0C                 12
      7                 x0E                 14
      8                 x10                 16
      9                 x12                 18
      10                x14                 20
      11                x16                 22
      12                x18                 24
      13                x1A                 26
      14                x1C                 28
      15                x1E                 30
      So for short fields the added value is twice the length:
      len=1 => x02
      len=2 => x04
      len=3 => x06
      ...
      len=9 => x12 (see 'sandwich1' => value=\x02\x12sandwich1 from my previous comment)
      len=10 => x14
      For longer fields the rule is unclear to me. See these examples from my attachments:
      'value=\x02RNut butter and grape jelly on white bread' => here an "R" is added
      'value=\x02zPastrami and sauerkraut on toasted rye with Russian dressing.' => here a "z" is added
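
      The observed bytes look consistent with Avro's binary encoding, in which a union value is written as a zigzag-varint branch index followed by the branch's value, and a string is written as a zigzag-varint byte length followed by its UTF-8 bytes. Under that reading, \x02 would be branch index 1 and the next byte would be twice the string length. This would also account for the longer fields: "z" is 0x7A = 122 = 2 x 61, and "R" is 0x52 = 82 = 2 x 41, which match the lengths of the two strings. The Python sketch below decodes a cell on that assumption. The function names are hypothetical, and it assumes that branch 1 of the union is a string (the schema is only in the attachment and is not shown here).

```python
def zigzag_varint_decode(data: bytes, pos: int = 0):
    """Decode one zigzag-encoded varint starting at pos.

    Returns (value, next_pos). This mirrors how Avro's binary
    encoding stores int/long values.
    """
    shift = 0
    raw = 0
    while True:
        byte = data[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift   # low 7 bits carry payload
        if not byte & 0x80:             # high bit clear: last byte
            break
        shift += 7
    # Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (raw >> 1) ^ -(raw & 1), pos


def decode_union_string(cell: bytes):
    """Split a cell into (branch_index, length, text).

    Assumption: the cell holds a union whose selected branch is a string.
    """
    branch, pos = zigzag_varint_decode(cell)
    length, pos = zigzag_varint_decode(cell, pos)
    text = cell[pos:pos + length].decode("utf-8")
    return branch, length, text


# Values taken from the scan output in this report.
print(decode_union_string(b"\x02\x12sandwich1"))  # (1, 9, 'sandwich1')
print(decode_union_string(b"\x02\x0488"))         # (1, 2, '88')
```

      If this reading is right, strings of 64 bytes or more would show a two-byte length prefix, because the zigzag value 128 no longer fits in a single varint byte.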

        Attachments

        1. sandwiches-hbase.csv
          0.2 kB
        2. sandwich-hbase.avsc
          0.4 kB
        3. sandwich-mapping.json
          0.3 kB
        4. sandwich-partition.json
          0.1 kB

              People

              • Assignee:
                Unassigned
                Reporter:
                gnemeth Gabor Nemeth