Description
At the moment, Hive does not support record delimiters other than newlines. Additionally, Hive treats both newlines and carriage returns as record delimiters. As a result, any newline or carriage return inside a field in an RDBMS will, after import with Sqoop, cause Hive to misread the table.
The current Sqoop docs do note:
Hive does not support enclosing and escaping characters. You must choose unambiguous field and record-terminating delimiters without the help of escaping and enclosing characters when working with Hive; this is a limitation of Hive's input parsing abilities.
The problem is that users cannot currently choose their own record-terminating delimiters. Rather than requiring users to preprocess fields and strip all newlines and carriage returns before running a Sqoop job, it would be immensely useful to add an option that lets users specify replacement characters for record- and field-terminating delimiters found inside field values (replacements because, as the Sqoop docs quoted above note, there is no escaping).
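To illustrate the request, here is a minimal sketch of the replacement behavior in Python. It is not Sqoop code: `replace_delims` and `to_hive_line` are hypothetical helpers, the single-space replacement is an arbitrary choice, and `\x01` is assumed as the field delimiter. `str.splitlines` is used only as a rough stand-in for how Hive splits records.

```python
import re

# Hypothetical helper, not part of Sqoop: replaces every character that Hive
# would interpret as a delimiter inside a single field value.
def replace_delims(value, replacement=" ", field_delim="\x01"):
    # Character class matching newline, carriage return, or the field delimiter.
    pattern = "[\n\r" + re.escape(field_delim) + "]"
    return re.sub(pattern, replacement, value)

# Hypothetical helper: cleans each field, then joins them into one text record.
def to_hive_line(fields, replacement=" ", field_delim="\x01"):
    cleaned = [replace_delims(f, replacement, field_delim) for f in fields]
    return field_delim.join(cleaned) + "\n"

# One database row whose second column contains embedded line breaks.
row = ["42", "first line\nsecond line\r\n", "ok"]

# Without replacement, the embedded line breaks split the row into several records.
raw = "\x01".join(row) + "\n"
raw_records = raw.splitlines()

# With replacement, the row stays a single record with three fields.
clean = to_hive_line(row)
clean_records = clean.splitlines()
clean_fields = clean_records[0].split("\x01")
```

With an option like this, users would pick a replacement that cannot be confused with real data in their tables, rather than stripping the characters in a separate preprocessing step.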
Issue Links
- relates to: SQOOP-190 Sqoop shouldn't use generated SqoopRecord.toString in text output cases. (Open)