Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.10.0
    • Fix Version/s: 3.12.0
    • Component/s: app.hive
    • Labels:
      None

      Description

      Similar to HUE-3238, but using the counters of the last MapReduce (MR) or Spark job.

      For MR jobs, it might be possible to get the number of rows by fetching the log of the last task of the last job and looking for the FileSinkOperator output:

      2016-07-18 13:31:18,165 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 106
      2016-07-18 13:31:18,493 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: RECORDS_OUT_0:106
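
A minimal sketch of how the row count could be pulled out of such a task log, assuming the log text has already been fetched as a string. The function name `parse_mr_records_out` and the regular expression are illustrative assumptions based only on the two sample lines above, not existing Hue code.

```python
import re

# Assumed pattern, derived from the sample line
# "...FileSinkOperator: RECORDS_OUT_0:106"
RECORDS_OUT_RE = re.compile(r"FileSinkOperator: RECORDS_OUT_(\d+):\s*(\d+)")


def parse_mr_records_out(log_text):
    """Return a dict mapping each RECORDS_OUT_<n> index to its row count.

    If the same counter appears more than once, the last occurrence wins.
    """
    counts = {}
    for line in log_text.splitlines():
        match = RECORDS_OUT_RE.search(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


sample_mr_log = (
    "2016-07-18 13:31:18,165 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 106\n"
    "2016-07-18 13:31:18,493 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: RECORDS_OUT_0:106\n"
)
print(parse_mr_records_out(sample_mr_log))
```

An empty dict means no FileSinkOperator counter was found, in which case the caller would have no row count to report.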

      For Spark this is easier, because the row count and size can be read directly from the job log:

      INFO : Status: Finished successfully in 15.16 seconds
      INFO : =====Spark Job[549c452d-fb29-47e6-8115-f0ecc5fb7486] statistics=====
      INFO : HIVE
      INFO : CREATED_FILES: 1
      INFO : RECORDS_OUT_0: 106
      INFO : RECORDS_IN: 53
      INFO : RECORDS_OUT_INTERMEDIATE: 424
      INFO : DESERIALIZE_ERRORS: 0
      INFO : Spark Job[549c452d-fb29-47e6-8115-f0ecc5fb7486] Metrics
      INFO : ExecutorDeserializeTime: 3652
      INFO : ExecutorRunTime: 5500
      INFO : ResultSize: 3561
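
The Spark log above could be turned into a counter dictionary along these lines. This is a sketch under assumptions: `parse_spark_counters` is a hypothetical helper, the line format is inferred only from the sample, and treating `RECORDS_OUT_0` as the row count and `ResultSize` as the size is a reading of the sample rather than a confirmed contract.

```python
import re

# Assumed format: "INFO : <NAME>: <integer>", as in the sample log above.
SPARK_COUNTER_RE = re.compile(r"^INFO\s*:\s*(\w+):\s*(-?\d+)$")


def parse_spark_counters(log_text):
    """Collect every 'INFO : NAME: <int>' line into a dict of name -> int."""
    counters = {}
    for line in log_text.splitlines():
        match = SPARK_COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


sample_spark_log = """
INFO : Status: Finished successfully in 15.16 seconds
INFO : HIVE
INFO : CREATED_FILES: 1
INFO : RECORDS_OUT_0: 106
INFO : RECORDS_IN: 53
INFO : ResultSize: 3561
"""
stats = parse_spark_counters(sample_spark_log)
# Assumption: RECORDS_OUT_0 holds the row count, ResultSize the size.
print(stats.get("RECORDS_OUT_0"), stats.get("ResultSize"))
```

Lines whose value is not an integer, such as the status line and the section headers, are skipped by the pattern.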

              People

              • Assignee:
                jennykim Jenny Kim
                Reporter:
                romain Romain Rigaux
              • Votes:
                0
                Watchers:
                2