  1. Hue
  2. HUE-2088

High CPU usage and slowness on downloads

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      CDH5, Hue 3.5 on CentOS 6.5

      Description

      When downloading the results of a query, Hue uses a lot of CPU (100% or more) for minutes at a time before delivering the results. (One data point: 6 MB of result data took about 2 minutes to prepare.)

      We've only tried this from within the Hue query editor, but I'd expect the same in Impala & Pig. This is a major regression from Hue 2.x.

      This is a quote from Abe from the mailing list:

      There were a few changes made in CDH5 with regards to downloading result sets. Hue uses "tablib", which does things in bulk rather than in streams. The slowness is likely due to all 65536 rows being processed before the download starts. The error message you are seeing is likely the limit imposed on XLS downloads by the underlying library. If you download as CSV and convert to XLS, you won't have this problem. To download directly to XLS format, limit your query to less than 65536 rows. Adding streaming back to the download process would make a lot of sense if we can find a library like tablib that supports streaming.
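
      To illustrate the difference between bulk and streamed downloads described in the quote, here is a minimal sketch of row-by-row CSV generation using only the Python standard library. It is not Hue's code and not a tablib API: `stream_csv` and the sample rows are hypothetical names chosen for illustration, and the sketch assumes Python 3 and that the query results are available as an iterable of rows.

```python
import csv
import io
import itertools


def stream_csv(rows, header=None):
    """Yield CSV text one row at a time (hypothetical helper, not part of Hue).

    Only a single row is held in the buffer at any moment, so the first
    bytes can be sent before the full result set has been processed.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    if header is not None:
        rows = itertools.chain([header], rows)
    for row in rows:
        writer.writerow(row)
        yield buffer.getvalue()
        # Reset the buffer so memory use stays constant per row.
        buffer.seek(0)
        buffer.truncate(0)


if __name__ == "__main__":
    # Assumed sample data standing in for a query result cursor.
    sample_rows = ((i, "value-%d" % i) for i in range(3))
    for chunk in stream_csv(sample_rows, header=["id", "name"]):
        print(chunk, end="")
```

      In a Django application, a generator like this could be handed to `StreamingHttpResponse` (available in Django 1.5 and later); whether the Django version bundled with Hue 3.5 supports that is not stated in this issue.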

              People

              • Assignee:
                abe Abraham Elmahrek
              • Reporter:
                lars_francke Lars Francke
