Details
Description
Similar to HUE-3238, but using the counters of the last MR or Spark job.
For MR jobs, it might be possible to get the number of rows by fetching the log of the last task of the last job and looking for the output from FileSinkOperator:
2016-07-18 13:31:18,165 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 106
2016-07-18 13:31:18,493 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: RECORDS_OUT_0:106
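A minimal sketch of how the row count could be pulled out of such a task log, assuming the log text has already been fetched. The helper name `rows_from_mr_task_log` is hypothetical, and the two patterns are derived only from the sample lines above, so other Hive versions may format these lines differently:

```python
import re

# Patterns inferred from the two sample FileSinkOperator lines (assumption:
# the format is stable across the Hive versions we care about).
_RECORDS_OUT = re.compile(r"FileSinkOperator: RECORDS_OUT_\d+:(\d+)")
_RECORDS_WRITTEN = re.compile(r"FileSinkOperator: FS\[\d+\]: records written - (\d+)")


def rows_from_mr_task_log(log_text):
    """Return the last row count reported by FileSinkOperator, or None."""
    count = None
    for line in log_text.splitlines():
        match = _RECORDS_OUT.search(line) or _RECORDS_WRITTEN.search(line)
        if match:
            # Keep the latest value seen; later lines override earlier ones.
            count = int(match.group(1))
    return count
```

Returning `None` when nothing matches lets the caller fall back to not showing a row count instead of showing a wrong one.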
For Spark, this is easier because we can get the row count and size directly from the job log:
INFO : Status: Finished successfully in 15.16 seconds
INFO : =====Spark Job[549c452d-fb29-47e6-8115-f0ecc5fb7486] statistics=====
INFO : HIVE
INFO : CREATED_FILES: 1
INFO : RECORDS_OUT_0: 106
INFO : RECORDS_IN: 53
INFO : RECORDS_OUT_INTERMEDIATE: 424
INFO : DESERIALIZE_ERRORS: 0
INFO : Spark Job[549c452d-fb29-47e6-8115-f0ecc5fb7486] Metrics
INFO : ExecutorDeserializeTime: 3652
INFO : ExecutorRunTime: 5500
INFO : ResultSize: 3561
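A rough sketch of parsing the Spark statistics block above into a dictionary, assuming every counter appears on its own `INFO : NAME: <integer>` line as in the sample. The function name `parse_spark_job_stats` is hypothetical, and treating `RECORDS_OUT_0` as the row count (and possibly `ResultSize` as the size) is an assumption, not something the log itself states:

```python
import re

# Matches lines such as "INFO : RECORDS_OUT_0: 106". Lines without a
# numeric value (status, section headers) are skipped.
_STAT_LINE = re.compile(r"^INFO\s*:\s*([A-Za-z0-9_]+):\s*(\d+)$")


def parse_spark_job_stats(log_text):
    """Collect integer counters and metrics from a Hive-on-Spark job log."""
    stats = {}
    for line in log_text.splitlines():
        match = _STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats
```

A caller could then read `stats.get("RECORDS_OUT_0")` for the row count and decide separately which metric, if any, to present as the size.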