Sqoop (READ-ONLY) / SQOOP-140

Sqoop throws OutOfMemory exceptions / Java heap space errors when importing large tables from PostgreSQL

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: vendors/postgres
    • Labels:
      None
    • Environment:
      CDH3b3 on Ubuntu 10.04.1 (64bit)
      Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
      sqoop 1.1.0+8 with additional lib postgresql-9.0-801.jdbc4.jar
      connecting to PostgreSQL 9.0.2

      Description

      Sqoop throws OutOfMemory exceptions / Java heap space errors when importing large tables from PostgreSQL.
      See also https://groups.google.com/a/cloudera.org/group/sqoop-user/browse_thread/thread/a03e55e8374cc246#

      :~$ sqoop import --connect 'jdbc:postgresql://db.example.com:5432/
      example' --username user -P --as-sequencefile --table largetable
      Enter password:
      11/01/20 13:40:32 INFO tool.CodeGenTool: Beginning code generation
      11/01/20 13:40:33 INFO manager.PostgresqlManager: Executing SQL statement: SELECT t.* FROM largetable AS t LIMIT 1
      [...]
      11/01/20 13:40:35 INFO mapred.JobClient: Running job: job_201101191544_0013
      11/01/20 13:40:36 INFO mapred.JobClient:  map 0% reduce 0%
      11/01/20 13:41:10 INFO mapred.JobClient: Task Id :
      attempt_201101191544_0013_m_000002_0, Status : FAILED
      Error: Java heap space
      [...]
      11/01/20 13:43:40 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 184.8609 seconds (0 bytes/sec)
      11/01/20 13:43:40 INFO mapreduce.ImportJobBase: Retrieved 0 records.
      11/01/20 13:43:40 ERROR tool.ImportTool: Error during import: Import job failed!
      

      The stacktrace for one of the failed task attempts looks like this:

      2011-01-20 13:41:11,387 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
      2011-01-20 13:41:11,843 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
      2011-01-20 13:41:42,880 INFO com.cloudera.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
      2011-01-20 13:41:42,894 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
      	at java.lang.Class.getDeclaredFields0(Native Method)
      	at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
      	at java.lang.Class.getDeclaredField(Class.java:1880)
      	at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:181)
      	at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:65)
      	at java.sql.SQLException.<clinit>(SQLException.java:353)
      	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1817)
      	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
      	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
      	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
      	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
      	at com.cloudera.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:100)
      	at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:225)
      	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
      	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
      	at com.cloudera.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:187)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
      	at org.apache.hadoop.mapred.Child.main(Child.java:211)
      

      In com.cloudera.sqoop.mapreduce.db.DBRecordReader.executeQuery() there is no call to setFetchSize:

      DBRecordReader.java:97-101
        protected ResultSet executeQuery(String query) throws SQLException {
          this.statement = connection.prepareStatement(query,
              ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
          return statement.executeQuery();
        }
      

The most likely cause of this error is the JDBC driver loading the entire result set into memory because setFetchSize is never called. An example of setFetchSize usage can be found in com.cloudera.sqoop.manager.PostgresqlManager.execute (PostgresqlManager.java:116-119).
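      A possible fix would be to set a fetch size on the prepared statement in executeQuery(). The sketch below is illustrative, not the actual SQOOP-140 patch; the fetch-size value (1000) and the explicit autocommit handling are assumptions. Note that the PostgreSQL JDBC driver only streams rows via a cursor when autocommit is disabled, the result set is TYPE_FORWARD_ONLY, and a positive fetch size is set:

      ```java
        protected ResultSet executeQuery(String query) throws SQLException {
          // The PostgreSQL driver falls back to buffering the whole result
          // set unless autocommit is off, the result set is forward-only,
          // and a positive fetch size is set.
          connection.setAutoCommit(false);
          this.statement = connection.prepareStatement(query,
              ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
          // Fetch rows in batches instead of materializing the entire table.
          this.statement.setFetchSize(1000);
          return statement.executeQuery();
        }
      ```

      With batched fetching, each map task holds at most one batch of rows in memory at a time, so heap usage no longer grows with the size of the table split.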


            People

            • Assignee:
              Michael Häusler
            • Reporter:
              Michael Häusler
            • Votes:
              0
            • Watchers:
              2
