Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.1.0
- Fix Version/s: 1.3.0
- Component/s: vendors/postgres
- Labels: None
- Environment: CDH3b3 on Ubuntu 10.04.1 (64bit)
  Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
  sqoop 1.1.0+8 with additional lib postgresql-9.0-801.jdbc4.jar
  connecting to PostgreSQL 9.0.2
Description
Sqoop throws OutOfMemory exceptions / Java heap space errors when importing large tables from PostgreSQL.
See also https://groups.google.com/a/cloudera.org/group/sqoop-user/browse_thread/thread/a03e55e8374cc246#
:~$ sqoop import --connect 'jdbc:postgresql://db.example.com:5432/example' --username user -P --as-sequencefile --table largetable
Enter password:
11/01/20 13:40:32 INFO tool.CodeGenTool: Beginning code generation
11/01/20 13:40:33 INFO manager.PostgresqlManager: Executing SQL statement: SELECT t.* FROM largetable AS t LIMIT 1
[...]
11/01/20 13:40:35 INFO mapred.JobClient: Running job: job_201101191544_0013
11/01/20 13:40:36 INFO mapred.JobClient: map 0% reduce 0%
11/01/20 13:41:10 INFO mapred.JobClient: Task Id : attempt_201101191544_0013_m_000002_0, Status : FAILED
Error: Java heap space
[...]
11/01/20 13:43:40 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 184.8609 seconds (0 bytes/sec)
11/01/20 13:43:40 INFO mapreduce.ImportJobBase: Retrieved 0 records.
11/01/20 13:43:40 ERROR tool.ImportTool: Error during import: Import job failed!
The stacktrace for one of the failed task attempts looks like this:
2011-01-20 13:41:11,387 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2011-01-20 13:41:11,843 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-01-20 13:41:42,880 INFO com.cloudera.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2011-01-20 13:41:42,894 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
    at java.lang.Class.getDeclaredField(Class.java:1880)
    at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:181)
    at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:65)
    at java.sql.SQLException.<clinit>(SQLException.java:353)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1817)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
    at com.cloudera.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:100)
    at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:225)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at com.cloudera.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:187)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
In com.cloudera.sqoop.mapreduce.db.DBRecordReader.executeQuery() there is no call to setFetchSize:
DBRecordReader.java:97-101
protected ResultSet executeQuery(String query) throws SQLException {
  this.statement = connection.prepareStatement(query,
      ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
  return statement.executeQuery();
}
The most likely cause of this error is that the JDBC driver loads the whole result set into memory because setFetchSize is never called. An example of how setFetchSize is used can be found in com.cloudera.sqoop.manager.PostgresqlManager.execute (PostgresqlManager.java:116-119).
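A minimal sketch of the kind of change that would address this in executeQuery (the fetch size value is illustrative, not a committed default): keep the forward-only, read-only statement, but disable auto-commit and set a bounded fetch size. The PostgreSQL JDBC driver only switches to cursor-based, batched fetching when auto-commit is off, the result set is TYPE_FORWARD_ONLY, and a positive fetch size is set; otherwise it buffers the entire result set in the client JVM.

protected ResultSet executeQuery(String query) throws SQLException {
  this.statement = connection.prepareStatement(query,
      ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);

  // Sketch only: make the PostgreSQL driver stream rows with a cursor
  // instead of buffering the whole result set in memory. The driver
  // honors the fetch size only when auto-commit is disabled.
  connection.setAutoCommit(false);
  statement.setFetchSize(10000); // example value, not the actual fix's default

  return statement.executeQuery();
}

With a change along these lines, each map task would hold at most one fetch batch of rows in memory at a time, which should keep large table imports within the default child JVM heap.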