Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 3.11.0, 3.12.0, 4.0.0, 4.1.0
    • Fix Version/s: None
    • Component/s: app.filebrowser
    • Labels:

      Description

      Taken from https://github.com/cloudera/hue/issues/587.

      The Hue file browser's Avro preview is very slow when using WebHDFS. In our testing, a 9 KB Avro file took almost 2 minutes to open with Hue 3.9.0. With Hue 3.11, performance was better after HUE-3718 but still slow because of repeated 1-byte reads. We traced these reads to the Python Avro library (for example, read_float: https://github.com/apache/avro/blob/master/lang/py/src/avro/io.py#L180). The library assumes the file is local, where repeated 1-byte reads are cheap. With WebHDFS in Hue that assumption does not hold: each 1-byte read becomes a separate HTTP request/response.
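To illustrate the access pattern, here is a minimal sketch that decodes a 4-byte little-endian float one byte at a time and counts the read calls. It is modeled on the pattern seen in the linked read_float, not copied from the Avro library; `CountingStream` and `read_float_bytewise` are hypothetical names used only for this example.

```python
import io
import struct


class CountingStream:
    """Wraps a byte stream and counts read() calls (hypothetical helper)."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.calls = 0

    def read(self, n):
        self.calls += 1  # over WebHDFS, each call would be one HTTP round trip
        return self._buf.read(n)


def read_float_bytewise(stream):
    """Decode a little-endian 32-bit float using four separate 1-byte reads."""
    bits = 0
    for shift in (0, 8, 16, 24):
        bits |= stream.read(1)[0] << shift
    return struct.unpack("<f", struct.pack("<I", bits))[0]


stream = CountingStream(struct.pack("<f", 1.5))
value = read_float_bytewise(stream)
print(value, stream.calls)
```

A single float costs four reads here. On a local file that is negligible, but when every read is a network round trip the cost grows with every field in every record.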

      The File class in webhdfs.py has a read method whose length argument defaults to DEFAULT_READ_SIZE, but the Avro library calls it as read(1) instead. The WebHDFS library in Hue should be smarter and request at least DEFAULT_READ_SIZE bytes at a time.
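One way the suggestion could be approached is a read-ahead buffer in front of the remote file. The sketch below is not Hue's code: `FakeRemoteFile` and `ReadAheadFile` are hypothetical classes, and the value of `DEFAULT_READ_SIZE` is an assumption chosen for the example.

```python
DEFAULT_READ_SIZE = 1024 * 1024  # assumed value; Hue's actual constant may differ


class FakeRemoteFile:
    """Stand-in for a WebHDFS-backed file; counts simulated HTTP requests."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.requests = 0

    def read(self, length=DEFAULT_READ_SIZE):
        self.requests += 1  # each call represents one HTTP request/response
        chunk = self._data[self._pos:self._pos + length]
        self._pos += len(chunk)
        return chunk


class ReadAheadFile:
    """Serves small reads from a local buffer, refilling in large chunks."""

    def __init__(self, raw, min_read=DEFAULT_READ_SIZE):
        self._raw = raw
        self._min_read = min_read
        self._buffer = b""

    def read(self, length=DEFAULT_READ_SIZE):
        # Refill until the buffer can satisfy the request or the file ends.
        while len(self._buffer) < length:
            wanted = max(length - len(self._buffer), self._min_read)
            chunk = self._raw.read(wanted)
            if not chunk:
                break  # end of file
            self._buffer += chunk
        result, self._buffer = self._buffer[:length], self._buffer[length:]
        return result


remote = FakeRemoteFile(b"x" * 9000)  # roughly the 9 KB file from the report
reader = ReadAheadFile(remote)
data = b"".join(iter(lambda: reader.read(1), b""))  # read(1) until EOF
print(len(data), remote.requests)
```

Here 9000 one-byte reads cost two simulated requests (one that fetches the data and one that detects end of file) instead of thousands. This sketch ignores seek and tell; a real change would need to keep the file position consistent with the buffered data.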

      A bit more background is available here: https://risdenk.github.io/2017/12/20/hue-file-browser-avro-performance.html

            People

            • Assignee:
              Unassigned
              Reporter:
              risdenk Kevin Risden
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: