Livy (READ-ONLY) / LIVY-336

Livy should not spawn one thread per job to track the job on Yarn

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.4
    • Fix Version/s: None
    • Component/s: Server
    • Labels:
      None
    • Environment:

      only on Yarn clusters

      Description

      SparkYarnApp spawns a new thread for each application. That thread performs the following tasks:

      1. It waits for the exit code of spark-submit
      2. It polls Yarn to get the ApplicationId from the ApplicationTag.
      3. It periodically polls Yarn and notifies SessionManager of changes to the application's state on Yarn.
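The per-application thread described above can be sketched as follows, assuming the three steps run in the order listed. `YarnOps`, `ThreadPerAppTracker`, and the state strings are hypothetical stand-ins, not Livy's actual `SparkYarnApp` code. The sketch only illustrates the cost: tracking N applications keeps N threads alive, and most of them sleep between polls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface standing in for spark-submit and the Yarn client;
// these are not Livy's real types.
interface YarnOps {
    int waitForSparkSubmit(String tag) throws InterruptedException; // step 1
    String findAppIdByTag(String tag);   // step 2: returns null until the app is visible
    String fetchState(String appId);     // step 3
}

final class ThreadPerAppTracker {
    // Latest known state per tag; plays the role of the SessionManager updates.
    final Map<String, String> states = new ConcurrentHashMap<>();
    // Every thread ever started; grows by one per tracked application.
    final List<Thread> threads = new ArrayList<>();

    // Starts one dedicated thread for this application (the pattern this issue wants to avoid).
    Thread track(String tag, YarnOps yarn, long pollMillis) {
        Thread t = new Thread(() -> {
            try {
                // 1. Wait for spark-submit to exit.
                if (yarn.waitForSparkSubmit(tag) != 0) {
                    states.put(tag, "FAILED");
                    return;
                }
                // 2. Poll until the ApplicationId for this tag shows up.
                String appId;
                while ((appId = yarn.findAppIdByTag(tag)) == null) {
                    Thread.sleep(pollMillis);
                }
                // 3. Poll the application state until it reaches a final state.
                while (true) {
                    String state = yarn.fetchState(appId);
                    states.put(tag, state);
                    if (isFinal(state)) break;
                    Thread.sleep(pollMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "yarn-app-" + tag);
        threads.add(t);
        t.start();
        return t;
    }

    static boolean isFinal(String state) {
        return state.equals("FINISHED") || state.equals("FAILED") || state.equals("KILLED");
    }
}
```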

      Yarn does not provide an API for step 2, so Livy fetches the list of all Spark applications on Yarn and traverses them to find the application with the desired ApplicationTag.
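A minimal sketch of the step 2 lookup, under stated assumptions. In a real server the list could come from Hadoop's `YarnClient.getApplications(Set<String> applicationTypes)`, whose `ApplicationReport` objects expose `getApplicationTags()`. Here a hypothetical `AppReport` record (Java 16+) stands in so the sketch runs without a cluster. Each call is a linear scan over every application, and with one thread per job that scan is repeated once per job.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified view of a Yarn application report.
record AppReport(String appId, String appType, Set<String> tags) {}

final class TagLookup {
    // Linear scan over all applications; cost grows with cluster size.
    static Optional<String> findAppIdByTag(List<AppReport> reports, String tag) {
        for (AppReport r : reports) {
            // Only Spark applications are candidates, matching the description above.
            if (r.appType().equals("SPARK") && r.tags().contains(tag)) {
                return Optional.of(r.appId());
            }
        }
        return Optional.empty(); // not visible on Yarn yet; caller polls again later
    }
}
```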

      This process can be improved in a few ways to use fewer resources, particularly threads, and to reduce latency. Some of the improvements are straightforward; others need more discussion:

      1. Spawn only one thread that polls Yarn on behalf of all applications to get their ApplicationIds.
      2. Use a thread pool to check the status of running applications.
      3. Avoid launching spark-submit as a separate process; instead, call org.apache.spark.SparkSubmit directly.
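Items 1 and 2 could look roughly like the sketch below. It is a hypothetical design, not code from Livy: `SharedYarnTracker`, `AppReport`, and the `listApps` supplier (standing in for a Yarn client call) are assumptions. A single scheduled thread walks the application list once per tick and completes a `CompletableFuture` for every pending tag it finds, and status checks run on a fixed-size pool, so the thread count no longer grows with the number of jobs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, simplified view of a Yarn application report.
record AppReport(String appId, Set<String> tags) {}

final class SharedYarnTracker {
    // Tags still waiting for an ApplicationId, each with a future for its caller.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final Supplier<List<AppReport>> listApps; // hypothetical Yarn listing call
    private final ScheduledExecutorService idPoller = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService statusPool;

    SharedYarnTracker(Supplier<List<AppReport>> listApps, int statusThreads) {
        this.listApps = listApps;
        this.statusPool = Executors.newFixedThreadPool(statusThreads);
    }

    // Item 1: register interest in a tag; the single poller thread resolves it.
    CompletableFuture<String> awaitAppId(String tag) {
        return pending.computeIfAbsent(tag, t -> new CompletableFuture<>());
    }

    // One traversal of the application list serves every pending tag at once.
    void resolvePendingOnce() {
        if (pending.isEmpty()) return;
        for (AppReport r : listApps.get()) {
            for (String tag : r.tags()) {
                CompletableFuture<String> f = pending.remove(tag);
                if (f != null) f.complete(r.appId());
            }
        }
    }

    void startIdPolling(long periodMillis) {
        idPoller.scheduleWithFixedDelay(() -> {
            try {
                resolvePendingOnce();
            } catch (RuntimeException e) {
                // Swallowed so later ticks still run; a real server would log this.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Item 2: status checks share a bounded pool instead of dedicated threads.
    void checkStatuses(List<String> appIds, Function<String, String> fetchState,
                       BiConsumer<String, String> onUpdate) {
        for (String id : appIds) {
            statusPool.submit(() -> onUpdate.accept(id, fetchState.apply(id)));
        }
    }

    // Stops polling and waits (up to 10 s) for in-flight status checks to finish.
    void shutdown() throws InterruptedException {
        idPoller.shutdownNow();
        statusPool.shutdown();
        statusPool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Callers can chain on `awaitAppId(tag)` with `thenAccept` instead of parking a thread per job, and the pool size caps how many status checks run concurrently.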

      Items 1 and 2 are easy to do, but item 3 needs more discussion, because we must decide how to redirect stdout and stderr.


            People

            • Assignee:
              Unassigned
              Reporter:
              meisam Meisam
            • Votes:
              0
              Watchers:
              5
