Sqoop / SQOOP-129

Newlines in RDBMS Fields Break Hive

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: hive, import
    • Labels: None

      Description

      At the moment, Hive does not support record delimiters other than newlines. Additionally, Hive treats both newlines and carriage returns as record delimiters. As a result, any newline or carriage return inside a field of an RDBMS table will, after import with Sqoop, cause Hive to misread the table.
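
The failure mode can be reproduced without Hive or Sqoop. The following is a minimal sketch, not Hive's actual parser: it assumes Ctrl-A (`\x01`) as the field delimiter and a newline as the record delimiter, and the sample rows and the helper names `serialize` and `parse_like_hive` are hypothetical. It shows two source rows turning into three parsed records when one field contains a newline.

```python
# Assumed delimiters for illustration: Ctrl-A between fields, newline between records.
FIELD_DELIM = "\x01"
RECORD_DELIM = "\n"


def serialize(rows):
    """Write rows with plain delimiters and no escaping or enclosing."""
    return RECORD_DELIM.join(FIELD_DELIM.join(fields) for fields in rows)


def parse_like_hive(text):
    """Split records on newlines and carriage returns, then split fields.

    This mimics the behaviour described in the issue; it is not Hive code.
    """
    lines = text.replace("\r", "\n").split("\n")
    return [line.split(FIELD_DELIM) for line in lines if line]


# Hypothetical RDBMS rows: the second field of the first row holds a newline.
rows = [["1", "first line\nsecond line"], ["2", "clean value"]]
parsed = parse_like_hive(serialize(rows))

print(len(rows), len(parsed))  # 2 source rows, 3 parsed records
print(parsed[1])               # the stray fragment: ['second line']
```

The extra record has the wrong number of columns, so every query over the table sees a malformed row.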

      The current Sqoop docs do note:

      Hive does not support enclosing and escaping characters. You must choose unambiguous field and record-terminating delimiters without the help of escaping and enclosing characters when working with Hive; this is a limitation of Hive's input parsing abilities.

      The problem is that users cannot currently choose their own record-terminating delimiters. Rather than requiring users to preprocess fields and strip all newlines and carriage returns before running a Sqoop job, it would be immensely useful to add an option that lets users specify replacement characters for record- and field-terminating delimiters that appear inside field values (replacements because, as the Sqoop docs quoted above note, there is no escaping).
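
The requested behaviour can be sketched as a per-field substitution applied before rows are written out. This is only an illustration of the idea, not the change that was made to Sqoop: the function names `sanitize_field` and `sanitize_row`, the default replacement of a single space, and the choice of `\x01` as the field delimiter are all assumptions.

```python
# Characters assumed to be unsafe inside a field. "\r\n" comes first so that a
# Windows line ending is replaced once rather than twice.
DEFAULT_DELIMS = ("\r\n", "\n", "\r", "\x01")


def sanitize_field(value, replacement=" ", delims=DEFAULT_DELIMS):
    """Replace every delimiter occurrence in one field value."""
    for delim in delims:
        value = value.replace(delim, replacement)
    return value


def sanitize_row(fields, replacement=" "):
    """Apply sanitize_field to each string field of a row."""
    return [sanitize_field(field, replacement) for field in fields]


# Hypothetical row with an embedded line break and an embedded field delimiter.
row = ["1", "first line\r\nsecond line", "a\x01b"]
clean = sanitize_row(row)
print(clean)  # ['1', 'first line second line', 'a b']
```

Passing an empty string as `replacement` drops the characters instead of substituting them, which covers the "strip all newlines" case without a separate preprocessing step.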

              People

              • Assignee: Jonathan Hsieh (jon)
              • Reporter: Brian Muller (bmuller)
              • Votes: 0
              • Watchers: 6
