Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 3.5.0
- Fix Version/s: None
- Component/s: None
- Labels: None
- Environment: CDH5, Hue 3.5 on CentOS 6.5
- Target Version:
Description
When downloading the results of a query, Hue uses a lot of CPU (100% or more) for minutes at a time before delivering the results. One data point: 6 MB of result data took about 2 minutes to prepare.
We've only tried this from within the Hue query editor, but I'd expect the same in Impala and Pig. This is a major regression from Hue 2.x.
This is a quote from Abe from the mailing list:
There were a few changes made in CDH5 with regards to downloading result sets. Hue uses "tablib", which does things in bulk rather than in streams. The slowness is likely due to all 65536 rows being processed before the download starts. The error message you are seeing is likely the limit imposed on XLS downloads by the underlying library. If you download as CSV and convert to XLS, you won't have this problem. To download directly to XLS format, limit your query to less than 65536 rows. Adding streaming back to the download process would make a lot of sense if we can find a library like tablib that supports streaming.
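To illustrate the difference between bulk and streamed export described in the quote, here is a minimal sketch of a streaming CSV writer using only the Python standard library. It is not Hue's actual implementation or the fix that was eventually applied; the function name `stream_csv` and its parameters are hypothetical. The idea is that each row is serialized and handed to the caller as soon as it is read, so the download can begin before the full result set has been processed.

```python
import csv
import io
from itertools import chain


def stream_csv(headers, rows):
    """Yield CSV text one row at a time (hypothetical helper, not Hue code).

    `headers` is a sequence of column names and `rows` is any iterable of
    row sequences, for example a cursor that fetches results lazily.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in chain([headers], rows):
        writer.writerow(row)
        # Hand the serialized row to the caller right away.
        yield buffer.getvalue()
        # Reset the buffer so memory use stays bounded by one row.
        buffer.seek(0)
        buffer.truncate(0)


if __name__ == "__main__":
    sample_rows = ((i, "name-%d" % i) for i in range(3))
    for chunk in stream_csv(["id", "name"], sample_rows):
        print(chunk, end="")
```

In a web application, a generator like this could be passed to a response object that accepts an iterable body, so rows reach the client as they are produced. Whether that fits Hue's download path is an assumption, not something this report confirms.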
Issue Links
- relates to: HUE-2142 [core] Automated scalable download of query results (Closed)