Description
Sqoop currently uses a two-tier model for importing into Hive. First it writes the results to an intermediary directory in HDFS. Then it creates a Hive table and runs a LOAD DATA command to pull the data into the Hive warehouse subdirectory. This works fine for the first import, where the "part-m-xxxx.gz" files are loaded by Hive (LOAD DATA merely moves them from the intermediary directory to the warehouse subdirectory).
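For reference, the LOAD DATA step is effectively the following HiveQL (the path and table name here are hypothetical); in HDFS it amounts to a move rather than a copy:

{code:sql}
-- Moves the part files out of the intermediary directory
-- and into the table's subdirectory under the Hive warehouse.
LOAD DATA INPATH '/user/someuser/sometable' INTO TABLE sometable;
{code}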
With the next call, however, the AppendUtils class checks the intermediary directory for the last part file. It never finds one - the Hive LOAD DATA moved all of them - so it names the new files starting at "part-m-0000.gz". When Hive then attempts to load the data (which, again, is just a move from that directory into its warehouse subdirectory), it fails, because a file named "part-m-0000.gz" already exists in that subdirectory.
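A walk-through of the failure, with hypothetical paths:

{noformat}
# After import #1, before LOAD DATA:
/user/someuser/sometable/part-m-0000.gz

# After LOAD DATA (the intermediary directory is now empty):
/user/hive/warehouse/sometable/part-m-0000.gz

# Import #2: AppendUtils finds no part files in the intermediary
# directory, so it starts numbering from zero again:
/user/someuser/sometable/part-m-0000.gz

# LOAD DATA #2 fails: part-m-0000.gz already exists in the warehouse subdir.
{noformat}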
The solution is to have AppendUtils check the Hive warehouse directory (when --warehouse-dir and --hive-import are both specified) to determine what the next part name should be. Sqoop can then be executed multiple times to append to the Hive table. A sketch of the idea follows.
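A minimal sketch of that check, assuming Hadoop's FileSystem API; the class and method names here are hypothetical, not the actual patch, and the real logic would live in AppendUtils:

{code:java}
// Hypothetical helper: scan a directory for existing "part-m-NNNN" files
// and return the next free index. AppendUtils would call this against the
// table's Hive warehouse subdirectory instead of the intermediary directory.
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WarehousePartScan {
  // Matches names like part-m-0003 or part-m-0003.gz.
  private static final Pattern PART_PATTERN =
      Pattern.compile("^part-m-(\\d+)(\\..*)?$");

  /** Returns the next unused part index under dir, or 0 if none exist. */
  public static int nextPartIndex(Configuration conf, Path dir)
      throws IOException {
    FileSystem fs = dir.getFileSystem(conf);
    if (!fs.exists(dir)) {
      return 0;
    }
    int max = -1;
    for (FileStatus stat : fs.listStatus(dir)) {
      Matcher m = PART_PATTERN.matcher(stat.getPath().getName());
      if (m.matches()) {
        max = Math.max(max, Integer.parseInt(m.group(1)));
      }
    }
    return max + 1;
  }
}
{code}

With this, the second import would name its files starting at the first index not already present in the warehouse subdirectory, and the subsequent LOAD DATA would no longer collide.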