Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 3.11.0, 3.12.0, 4.0.0, 4.1.0
Fix Version/s: None
Component/s: app.filebrowser
Labels:
Description
Taken from https://github.com/cloudera/hue/issues/587.
The Hue file browser's Avro preview is very slow when using WebHDFS. During our testing, a 9KB Avro file took almost 2 minutes to open with Hue 3.9.0. We tested with Hue 3.11 and found that performance was better after HUE-3718 but still slow due to repeated 1-byte calls. We traced the 1-byte calls to the Python Avro library (for example, read_float: https://github.com/apache/avro/blob/master/lang/py/src/avro/io.py#L180). The Python Avro library assumes the filesystem is local and that repeated 1-byte reads are cheap. With WebHDFS in Hue, this is not the case: each 1-byte call is an HTTP request/response. The File class in webhdfs.py has a read method whose length defaults to DEFAULT_READ_SIZE, but the Avro library calls it with read(1) instead. The WebHDFS library in Hue should be smarter and request at least DEFAULT_READ_SIZE at a time.
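One way to picture the suggested behaviour is a read-ahead wrapper that always fetches at least DEFAULT_READ_SIZE from the remote file and serves small reads from memory. The following is only a minimal sketch, not Hue's actual code: ReadAheadFile and FakeRemoteFile are hypothetical names, the 1 MB value for DEFAULT_READ_SIZE is an assumption, and a real fix would also need to handle seek() and tell().

```python
DEFAULT_READ_SIZE = 1024 * 1024  # assumed value; check webhdfs.py for the real constant


class FakeRemoteFile:
    """Hypothetical stand-in for a WebHDFS file: every read() models one HTTP round trip."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.requests = 0  # number of simulated HTTP requests

    def read(self, length):
        self.requests += 1
        chunk = self._data[self._pos:self._pos + length]
        self._pos += len(chunk)
        return chunk


class ReadAheadFile:
    """Hypothetical wrapper that never asks the remote file for less than read_size bytes."""

    def __init__(self, raw, read_size=DEFAULT_READ_SIZE):
        self._raw = raw
        self._read_size = read_size
        self._buffer = b""
        self._offset = 0  # position of the next unread byte inside _buffer
        self._eof = False

    def read(self, length):
        available = len(self._buffer) - self._offset
        while available < length and not self._eof:
            # Fetch at least read_size bytes, even if the caller asked for 1.
            chunk = self._raw.read(max(length - available, self._read_size))
            if not chunk:
                self._eof = True
                break
            # Drop bytes already consumed and append the new data.
            self._buffer = self._buffer[self._offset:] + chunk
            self._offset = 0
            available = len(self._buffer)
        data = self._buffer[self._offset:self._offset + length]
        self._offset += len(data)
        return data


if __name__ == "__main__":
    payload = bytes(range(256)) * 36  # 9216 bytes, roughly the 9KB file from the report
    remote = FakeRemoteFile(payload)
    reader = ReadAheadFile(remote)
    for _ in range(len(payload)):
        reader.read(1)  # mimics the Avro library's byte-at-a-time reads
    print("simulated HTTP requests:", remote.requests)
```

With a wrapper like this between the Avro library and the WebHDFS file object, the byte-at-a-time reads are served from the in-memory buffer, so the number of HTTP round trips depends on the file size divided by the read size rather than on the number of read(1) calls.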
A bit more background is available here: https://risdenk.github.io/2017/12/20/hue-file-browser-avro-performance.html