[HUE-2142] [core] Automated scalable download of query results - Cloudera Open Source

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 3.6.0, 4.3.0
Fix Version/s: None
Component/s: core.api
Labels: None
Target Version: 4.9.0

Description

A web server:

cannot serve or convert large amounts of data
cannot continuously poll query statuses and download/cache the first part of the results before releasing the query resources
takes minutes to convert a result set, which gives the user a "hanging" feeling

We probably need a task server to offload this work from the web server and scale downloads.
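
As a rough illustration of the idea, here is a minimal sketch that moves the export off the request thread and writes the result set to CSV one batch at a time. It is not Hue's implementation: `fetch_batches`, `export_to_csv`, and `submit_export` are hypothetical names, the rows are synthetic, and a standard-library `ThreadPoolExecutor` stands in for a real task server.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fetch_batches(total_rows, batch_size):
    """Hypothetical stand-in for paging through a query result set."""
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        yield [(i, "value-%d" % i) for i in range(start, end)]


def export_to_csv(path, total_rows, batch_size, progress):
    """Write the result set to `path` one batch at a time.

    Only one batch is held in memory at once, and `progress` is updated
    after each batch so a client can poll it instead of waiting on a
    blocked HTTP request.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        for batch in fetch_batches(total_rows, batch_size):
            writer.writerows(batch)
            progress["rows"] += len(batch)
    progress["done"] = True
    return path


def submit_export(executor, total_rows, batch_size=1000):
    """Queue the export and return immediately with handles to poll."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)  # the worker reopens the file by path
    progress = {"rows": 0, "done": False}
    future = executor.submit(
        export_to_csv, path, total_rows, batch_size, progress
    )
    return future, progress


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        future, progress = submit_export(pool, total_rows=5000)
        print("exported to", future.result())
```

In this sketch the web request would only call `submit_export` and return; the client would then poll `progress` (or an equivalent status endpoint) and fetch the file once `done` is set. A production setup would presumably use a dedicated task queue and shared storage rather than an in-process thread pool and a local temporary file.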

Note: supporting XLS output too would be helpful

FYI: Beeline has these options: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions

Priority is on SQL (Hive, Impala).

Optionally, also support:

Any editor
Search
HDFS

Attachments

cpu_slowness.png
178 kB
11/Jun/14 1:00 AM

Issue Links

depends on

HUE-8259 [core] Task server libs

Resolved

is parent of

HUE-8747 [editor] Download query result as task

Resolved

relates to

HUE-2527 [beeswax] XLS downloads with large column/row counts

Resolved

HUE-2256 [beeswax] Speed up deserialization API

Resolved

HUE-4181 [hive] API to provide result set row count and data size

Resolved

HUE-2088 High CPU usage and slowness on downloads

Resolved

HUE-4201 [editor] Add max limit of rows before truncation in the export / download query result

Resolved

HUE-2523 [core] Add support for xlsx export

Resolved

HUE-3871 [editor] Save large file as excel, didn’t even start

Resolved

HUE-6766 [search] Support scalable download of millions of results

Closed

HUE-3096 [core] Warn the user when query results exported to CSV/XLS are truncated

Resolved

HUE-2244 Add an option to select the number of rows in an csv file have to be skipped before rows are considered to be data rows.

Closed

Sub-Tasks

1.	[editor] API to provide result set row count and data size	Resolved	Romain Rigaux
2.	[hive] API to provide result set row count and data size	Resolved	Jenny Kim
3.	[editor] Disable download and non batch export when result is too large	Closed	Unassigned

People

Assignee: Jean Francois Desjeans Gauthier
Reporter: Romain Rigaux
Votes: 2
Watchers: 8

Dates

Created: 06/Jun/14 5:14 PM
Updated: 26/Feb/21 10:55 PM
Resolved: 26/Feb/21 10:55 PM