  1. Sqoop (READ-ONLY)
  2. SQOOP-17

Imports to Hive should use a nonce tmpdir for the HDFS import step.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hive, import
    • Labels: None

      Description

      Sqoop currently imports to Hive in two steps. First it writes the results to a directory in HDFS. Then it creates a Hive table and runs a LOAD DATA command to move the data into the Hive warehouse subdirectory.

      For a table-based import, this has the unfortunate side effect of leaving behind an empty directory named after the table in your HDFS home directory. For query-based imports, Sqoop can't infer a destination directory name from a table, so you have to specify one with --target-dir. If you import into Hive, that directory is then left empty as well.

      Sqoop should recognize that when you've added --hive-import, the HDFS directory is only a temporary staging area, so it should pick a nonce directory as the target and remove it once the Hive import succeeds.
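
      The proposed flow could look roughly like the sketch below. It is only an illustration, not Sqoop's code: it uses the local filesystem (java.nio.file) as a stand-in for HDFS, and the names NonceImportSketch, HiveLoader, nonceDir, importToHive and the "_sqoop_staging_" prefix are all hypothetical. The loader callback stands in for the Hive LOAD DATA step, which is assumed to move the files out of the staging directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: the local filesystem stands in for HDFS.
final class NonceImportSketch {

    /** Hypothetical stand-in for the Hive "LOAD DATA" step. */
    interface HiveLoader {
        void load(Path stagingDir) throws IOException;
    }

    /** Builds a staging path with a random, collision-resistant name. */
    static Path nonceDir(Path parent) {
        String nonce = UUID.randomUUID().toString().replace("-", "");
        return parent.resolve("_sqoop_staging_" + nonce);
    }

    /** Stages data in a nonce directory, loads it, then removes the directory. */
    static void importToHive(Path parent, HiveLoader loader) throws IOException {
        Path staging = nonceDir(parent);
        Files.createDirectories(staging);

        // Step 1: the import job writes its output into the staging directory
        // (simulated here with one small file).
        Files.write(staging.resolve("part-m-00000"),
                "1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8));

        // Step 2: the Hive load step picks the data up from the staging directory.
        loader.load(staging);

        // Step 3: reached only if the load did not throw, so a failed import
        // leaves the staged data in place for inspection or retry.
        deleteRecursively(staging);
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> walk = Files.walk(dir)) {
            deepestFirst = walk.sorted(Comparator.reverseOrder())
                               .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.deleteIfExists(p);
        }
    }
}
```

      Cleaning up only after the load succeeds matches the wording of the request above; whether a failed import should also clean up is a separate design question that this sketch does not settle.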


            People

            • Assignee: Ahmed Radwan (ahmed)
            • Reporter: Aaron Kimball (aaron)
            • Votes: 7
            • Watchers: 5