SQOOP-48

Import bug when splitting over an unsigned bigint column (MySQL)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: import
    • Labels:
      None
    • Environment:
      MySQL; bug confirmed on 5.1.41 (Ubuntu) and 5.1.34 (RHEL 5)

      Description

      (Demonstrated this bug to Aaron K. at Cloudera Hackathon on 7/27).

      I discovered that when importing a table that uses an unsigned bigint as the primary key, the auto-generated split intervals are incorrect. To reproduce:

      mysql> create table TestInfo (
      userid bigint(20) unsigned NOT NULL DEFAULT '0',
      name varchar(100) COLLATE utf8_unicode_ci DEFAULT '',
      primary key(userid)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

      mysql> INSERT INTO TestInfo VALUES (14, 'foo'), (7863696997872966707, 'bar');

      $ sqoop import --connect jdbc:mysql://localhost/sqoop --username root -P --warehouse-dir /tmp --table TestInfo --split-by userid --where 'userid>0'

      I'll add the MySQL query log as an attachment. In short, Sqoop generates a number of split intervals, some of them with negative bounds, and the imported dataset contains duplicate rows:

      $ hadoop fs -getmerge /tmp/TestInfo .
      $ cat TestInfo
      14,foo
      14,foo
      7863696997872966707,bar
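
The report does not identify the cause. One possible explanation, offered here only as an assumption, is that the split boundaries are computed with signed 64-bit arithmetic, so that stepping from the minimum key toward the maximum overflows and wraps around to negative values. The Python sketch below simulates that wraparound for the two keys in this table. The function names, the split count of 4, and the stepping loop are all hypothetical illustrations and are not taken from Sqoop's source.

```python
# Hypothetical illustration only: this is NOT Sqoop's splitter code.
# It simulates signed 64-bit ("long") arithmetic to show how stepping
# through a key range can wrap around and yield negative boundaries.

MIN_ID = 14                      # smallest userid in TestInfo
MAX_ID = 7863696997872966707     # largest userid in TestInfo
NUM_SPLITS = 4                   # assumed number of splits


def wrap_int64(value):
    """Reduce an integer to the signed 64-bit range, as a wrapping long would."""
    return (value + 2**63) % 2**64 - 2**63


def naive_split_points(lo, hi, num_splits, limit=64):
    """Step from lo toward hi, wrapping on overflow after each addition."""
    step = (hi - lo) // num_splits
    points = []
    cur = lo
    # 'limit' is a safety cap so the sketch always terminates.
    while cur <= hi and len(points) < limit:
        points.append(cur)
        cur = wrap_int64(cur + step)
    return points


def safe_split_points(lo, hi, num_splits):
    """Compute boundaries with arbitrary-precision integers (no wraparound)."""
    step = (hi - lo) // num_splits
    points = [lo + i * step for i in range(num_splits)]
    points.append(hi)
    return points


if __name__ == "__main__":
    for name, pts in (("naive", naive_split_points(MIN_ID, MAX_ID, NUM_SPLITS)),
                      ("safe", safe_split_points(MIN_ID, MAX_ID, NUM_SPLITS))):
        print(name, len(pts), "boundaries, negative:",
              [p for p in pts if p < 0])
```

Under these assumptions the naive loop emits more boundaries than requested, some of them negative, and its later intervals cover userid 14 a second time, which would be consistent with the duplicated row above. Computing the boundaries without wraparound keeps every bound between the minimum and maximum key.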

      Little help? Would be happy to provide additional info as requested.

      Thanks.

            People

            • Assignee:
              Jonathan Hsieh (jon)
            • Reporter:
              James Warren (jwarren)