[HUE-4181] [hive] API to provide result set row count and data size

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.10.0
Fix Version/s: 3.12.0
Component/s: con.hive
Labels:
None

Target Version:

3.12.0

Description

Similar to HUE-3238, but based on the counters of the last MapReduce (MR) or Spark job.

For MR jobs, it might be possible to get the number of rows by taking the last task of the last job and looking for the FileSinkOperator output in its log:

2016-07-18 13:31:18,165 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 106
2016-07-18 13:31:18,493 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: RECORDS_OUT_0:106
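A minimal sketch of how the row count could be pulled out of such a task log, assuming the log text has already been fetched and that the `RECORDS_OUT_0` line keeps the format shown above. The helper name `rows_from_mr_task_log` is hypothetical and not part of Hue.

```python
import re

# Matches the FileSinkOperator counter line shown in the task log above.
# Assumption: the counter is always printed as "RECORDS_OUT_0:<n>".
_RECORDS_OUT_0 = re.compile(r"FileSinkOperator: RECORDS_OUT_0:(\d+)")


def rows_from_mr_task_log(log_text):
    """Return the last RECORDS_OUT_0 value found in the log, or None."""
    count = None
    for line in log_text.splitlines():
        match = _RECORDS_OUT_0.search(line)
        if match:
            # Keep the last occurrence, in case the counter is logged more than once.
            count = int(match.group(1))
    return count


sample_log = """\
2016-07-18 13:31:18,165 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 106
2016-07-18 13:31:18,493 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: RECORDS_OUT_0:106
"""

print(rows_from_mr_task_log(sample_log))  # 106
```

Returning None when the counter is missing lets the caller distinguish "unknown" from a genuine count of 0.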

For Spark, this is easier because we can get the row count and result size directly from the job log:

INFO : Status: Finished successfully in 15.16 seconds
INFO : =====Spark Job[549c452d-fb29-47e6-8115-f0ecc5fb7486] statistics=====
INFO : HIVE
INFO : CREATED_FILES: 1
INFO : RECORDS_OUT_0: 106
INFO : RECORDS_IN: 53
INFO : RECORDS_OUT_INTERMEDIATE: 424
INFO : DESERIALIZE_ERRORS: 0
INFO : Spark Job[549c452d-fb29-47e6-8115-f0ecc5fb7486] Metrics
INFO : ExecutorDeserializeTime: 3652
INFO : ExecutorRunTime: 5500
INFO : ResultSize: 3561
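A rough sketch of reading both values from a Spark job log like the one above. It assumes that `RECORDS_OUT_0` is the row count and `ResultSize` is the size this issue refers to (the log does not state a unit), and that counters appear as `INFO : NAME: <integer>` lines. The function name `spark_job_stats` is hypothetical.

```python
import re

# One counter per line, e.g. "INFO : RECORDS_OUT_0: 106".
# Lines such as "INFO : HIVE" or the status line do not match and are skipped.
_COUNTER = re.compile(r"^INFO\s*:\s*(\w+):\s*(\d+)\s*$")


def spark_job_stats(log_text):
    """Collect integer counters from the log and return rows and size."""
    counters = {}
    for line in log_text.splitlines():
        match = _COUNTER.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return {
        "rows": counters.get("RECORDS_OUT_0"),  # assumed to be the row count
        "size": counters.get("ResultSize"),     # assumed to be the result size
    }


sample_log = """\
INFO : Status: Finished successfully in 15.16 seconds
INFO : HIVE
INFO : CREATED_FILES: 1
INFO : RECORDS_OUT_0: 106
INFO : RECORDS_IN: 53
INFO : ExecutorRunTime: 5500
INFO : ResultSize: 3561
"""

print(spark_job_stats(sample_log))  # {'rows': 106, 'size': 3561}
```

Either value comes back as None if its counter is absent from the log.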

Issue Links

relates to

HUE-3238 [editor] Provide Impala query profile and summary information

Resolved

HUE-2142 [core] Automated scalable download of query results

Closed

People

Assignee:

Jenny Kim

Reporter:

Romain Rigaux

Votes:

0

Watchers:

2

Dates

Created:

22/Jun/16 6:53 AM

Updated:

14/Oct/16 12:58 AM

Resolved:

08/Sep/16 12:08 AM