[SQOOP-48] import bug when splitting over unsigned bigint column (mysql) - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: import
Labels: None
Environment: MySQL, bug confirmed on 5.1.41 (Ubuntu) and 5.1.34 (RHE5)

Description

(Demonstrated this bug to Aaron K. at the Cloudera Hackathon on 7/27.)

When importing a table that uses an unsigned bigint as the primary key, the auto-generated split intervals are wrong. To reproduce:

mysql> create table TestInfo (
userid bigint(20) unsigned NOT NULL DEFAULT '0',
name varchar(100) COLLATE utf8_unicode_ci DEFAULT '',
primary key(userid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

mysql> INSERT INTO TestInfo VALUES (14, 'foo'), (7863696997872966707, 'bar');

$ sqoop import --connect jdbc:mysql://localhost/sqoop --username root -P --warehouse-dir /tmp --table TestInfo --split-by userid --where 'userid>0'

I'll add the MySQL query log as an attachment. In short, Sqoop generates a number of split intervals that include negative values, and the imported dataset contains duplicate rows (note that the row with userid 14 appears twice):

$ hadoop fs -getmerge /tmp/TestInfo .
$ cat TestInfo
14,foo
14,foo
7863696997872966707,bar

Any help would be appreciated. I'm happy to provide additional information on request.

Thanks.
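
The negative intervals described above are consistent with signed 64-bit overflow in the split computation, although this report does not confirm that as the cause. The following is a minimal sketch under that assumption. The class and method names (`SplitOverflowDemo`, `naiveSplits`, `safeSplits`) and the split count of 4 are hypothetical and are not taken from Sqoop's source or from the attached patch. It contrasts a naive loop that walks split points with `long` arithmetic against one that uses `BigInteger`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Sqoop's splitter code.
class SplitOverflowDemo {

    // Naive splitter: walks from min to max in fixed steps using signed longs.
    // When cur + step exceeds Long.MAX_VALUE the value wraps to a negative
    // number, which still satisfies cur <= max, so the loop keeps emitting
    // bogus split points. The size cap only guarantees that the demo terminates.
    static List<Long> naiveSplits(long min, long max, int numSplits) {
        long step = Math.max(1L, (max - min) / numSplits);
        List<Long> points = new ArrayList<>();
        long cur = min;
        while (cur <= max && points.size() < 2 * numSplits + 2) {
            points.add(cur);
            cur += step; // may silently wrap past Long.MAX_VALUE
        }
        return points;
    }

    // Overflow-free variant: the same walk with arbitrary-precision integers,
    // closing the final interval at max.
    static List<BigInteger> safeSplits(BigInteger min, BigInteger max, int numSplits) {
        BigInteger step = max.subtract(min)
                .divide(BigInteger.valueOf(numSplits))
                .max(BigInteger.ONE);
        List<BigInteger> points = new ArrayList<>();
        for (BigInteger cur = min; cur.compareTo(max) < 0; cur = cur.add(step)) {
            points.add(cur);
        }
        points.add(max);
        return points;
    }

    public static void main(String[] args) {
        // Minimum and maximum userid values from the reproduction above.
        long min = 14L;
        long max = 7863696997872966707L;
        System.out.println("naive: " + naiveSplits(min, max, 4));
        System.out.println("safe:  " + safeSplits(
                BigInteger.valueOf(min), BigInteger.valueOf(max), 4));
    }
}
```

With the two userid values from the reproduction, the naive loop's fifth step lands just below the maximum, so the next addition wraps and the list continues with negative split points. The `BigInteger` variant stays within the range from 14 to the maximum.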

Attachments

0001-SQOOP-48-Import-bug-when-splitting-over-unsigned-big.patch (5 kB, 08/Jun/11 12:08 AM)
0001-SQOOP-48-Import-bug-when-splitting-over-unsigned-big.patch (5 kB, 03/Jun/11 3:36 AM)
sqoop-48-mysql.log (9 kB, 02/Aug/10 10:54 PM)

People

Assignee: Jonathan Hsieh
Reporter: James Warren

Dates

Created: 02/Aug/10 10:50 PM
Updated: 08/Jun/11 12:08 AM
Resolved: 08/Jun/11 12:06 AM