Sqoop (READ-ONLY) / SQOOP-140

Sqoop throws OutOfMemory exceptions / Java heap space errors when importing large tables from PostgreSQL

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: vendors/postgres
    • Labels:
      None
    • Environment:
      CDH3b3 on Ubuntu 10.04.1 (64bit)
      Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
      sqoop 1.1.0+8 with additional lib postgresql-9.0-801.jdbc4.jar
      connecting to PostgreSQL 9.0.2

      Description

      Sqoop throws OutOfMemory exceptions / Java heap space errors when importing large tables from PostgreSQL.
      See also https://groups.google.com/a/cloudera.org/group/sqoop-user/browse_thread/thread/a03e55e8374cc246#

      :~$ sqoop import --connect 'jdbc:postgresql://db.example.com:5432/
      example' --username user -P --as-sequencefile --table largetable
      Enter password:
      11/01/20 13:40:32 INFO tool.CodeGenTool: Beginning code generation
      11/01/20 13:40:33 INFO manager.PostgresqlManager: Executing SQL statement: SELECT t.* FROM largetable AS t LIMIT 1
      [...]
      11/01/20 13:40:35 INFO mapred.JobClient: Running job: job_201101191544_0013
      11/01/20 13:40:36 INFO mapred.JobClient:  map 0% reduce 0%
      11/01/20 13:41:10 INFO mapred.JobClient: Task Id :
      attempt_201101191544_0013_m_000002_0, Status : FAILED
      Error: Java heap space
      [...]
      11/01/20 13:43:40 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 184.8609 seconds (0 bytes/sec)
      11/01/20 13:43:40 INFO mapreduce.ImportJobBase: Retrieved 0 records.
      11/01/20 13:43:40 ERROR tool.ImportTool: Error during import: Import job failed!
      

      The stacktrace for one of the failed task attempts looks like this:

      2011-01-20 13:41:11,387 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
      2011-01-20 13:41:11,843 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
      2011-01-20 13:41:42,880 INFO com.cloudera.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
      2011-01-20 13:41:42,894 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
      	at java.lang.Class.getDeclaredFields0(Native Method)
      	at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
      	at java.lang.Class.getDeclaredField(Class.java:1880)
      	at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:181)
      	at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:65)
      	at java.sql.SQLException.<clinit>(SQLException.java:353)
      	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1817)
      	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
      	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
      	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
      	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
      	at com.cloudera.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:100)
      	at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:225)
      	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
      	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
      	at com.cloudera.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:187)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
      	at org.apache.hadoop.mapred.Child.main(Child.java:211)
      

      In com.cloudera.sqoop.mapreduce.db.DBRecordReader.executeQuery() there is no call to setFetchSize:

      DBRecordReader.java:97-101
        protected ResultSet executeQuery(String query) throws SQLException {
          this.statement = connection.prepareStatement(query,
              ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
          return statement.executeQuery();
        }
      

The most likely cause of this error is the JDBC driver loading the entire result set into memory because setFetchSize is never called. An example of setFetchSize usage can be found in com.cloudera.sqoop.manager.PostgresqlManager.execute (PostgresqlManager.java:116-119).
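      A possible fix would be to set a fetch size on the prepared statement in executeQuery(). The sketch below is illustrative, not the actual SQOOP-140 patch; the fetch-size value (1000) and the explicit autocommit handling are assumptions. Note that the PostgreSQL JDBC driver only streams rows via a cursor when autocommit is disabled, the result set is TYPE_FORWARD_ONLY, and a positive fetch size is set:

      ```java
        protected ResultSet executeQuery(String query) throws SQLException {
          // The PostgreSQL driver falls back to buffering the whole result
          // set unless autocommit is off, the result set is forward-only,
          // and a positive fetch size is set.
          connection.setAutoCommit(false);
          this.statement = connection.prepareStatement(query,
              ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
          // Fetch rows in batches instead of materializing the entire table.
          this.statement.setFetchSize(1000);
          return statement.executeQuery();
        }
      ```

      With batched fetching, each map task holds at most one batch of rows in memory at a time, so heap usage no longer grows with the size of the table split.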


            People

            • Assignee:
              Michael Häusler
            • Reporter:
              Michael Häusler
            • Votes:
              0
            • Watchers:
              2
