Updating from Jobs API 2.1 to 2.2

12/16/2024

This article details updates and enhancements to the functionality in version 2.2 of the Jobs API and includes information to help you update your existing API clients to work with this new version. To learn about the changes between versions 2.0 and 2.1 of the API, see Updating from Jobs API 2.0 to 2.1.

Because Jobs API 2.2 enhances the existing support for paginating large result sets, Databricks recommends using it for your API scripts and clients, particularly when responses might include a large number of tasks.

In addition to the changes included in version 2.1 of the Databricks Jobs API, version 2.2 has the following enhancements:

Jobs are queued by default

Job queueing is an optional feature that prevents job runs from being skipped when resources are unavailable for the run. Job queueing is supported in the 2.0, 2.1, and 2.2 versions of the Jobs API, with the following differences in the default handling of queueing:

For jobs created with the Jobs API 2.2, queueing is enabled by default. You can turn off queueing by setting the queue field to false in request bodies when you create or update a job.
For jobs created with the 2.0 and 2.1 versions of the Jobs API, queueing is not enabled by default. With these versions, you must enable queueing by setting the queue field to true in request bodies when you create or update a job.

You can enable or disable queueing when you create a job, partially update a job, or update all job settings.

See Job queueing.
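As a rough illustration of overriding the version default, the sketch below adds an explicit queue setting to a job definition before it is sent in a create or update request. The text above describes the setting simply as true or false; the `{"enabled": ...}` object shape used here is an assumption taken from the Jobs API reference, and the helper name `with_queueing` and the job name are hypothetical.

```python
import copy


def with_queueing(job_settings: dict, enabled: bool) -> dict:
    """Return a copy of job_settings with queueing set explicitly."""
    updated = copy.deepcopy(job_settings)
    # Assumed shape: the queue setting is modeled as {"enabled": <bool>}.
    updated["queue"] = {"enabled": enabled}
    return updated


# Hypothetical job definition, used only for illustration.
create_body = with_queueing({"name": "nightly-etl", "tasks": []}, enabled=False)
```

Setting the value explicitly keeps a client's behavior the same regardless of which API version created the job.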

Support for paging long task and task run lists

To support jobs with a large number of tasks or task runs, Jobs API 2.2 changes how large result sets are returned for the following requests:

List jobs: See Changes to the List jobs and List job runs requests.
List job runs: See Changes to the List jobs and List job runs requests.
Get a single job: See Get a single job.
Get a single job run: See Get a single run.

The Jobs API 2.2 changes pagination for these requests as follows:

Fields representing lists of elements such as tasks, parameters, job_clusters, or environments are limited to 100 elements per response. If more than 100 values are available, the response body includes a next_page_token field containing a token to retrieve the next page of results.
Pagination is added for the responses to the Get a single job and Get a single job run requests. Pagination for the responses to the List jobs and List job runs requests was added with Jobs API 2.1.

The following is an example response body from a Get a single job request for a job with more than 100 tasks. To demonstrate the token-based paging functionality, this example omits most fields included in the response body:

{
  "job_id": 11223344,
  "settings": {
    "tasks": [
      {
        "task_key": "task-1"
      },
      {
        "task_key": "task-2"
      },
      {
        "task_key": "task-..."
      },
      {
        "task_key": "task-100"
      }
    ]
  },
  "next_page_token": "Z29...E="
}

To retrieve the next set of results, set the page_token query parameter in the next request to the value returned in the next_page_token field. For example, /api/2.2/jobs/get?job_id=11223344&page_token=Z29...E=.

If no more results are available, the next_page_token field is not included in the response.
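The token handling described above can be sketched as a small generator that keeps requesting pages until a response arrives without next_page_token. This is a minimal sketch, not a definitive client: `iter_pages`, `fake_fetch`, and the token value `tok-2` are hypothetical, and the fake fetcher stands in for an HTTP call that would pass page_token as a query parameter.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield each response page, following next_page_token until it is absent."""
    page_token: Optional[str] = None
    while True:
        page = fetch_page(page_token)
        yield page
        page_token = page.get("next_page_token")
        if not page_token:
            break


# Stand-in for requests to /api/2.2/jobs/get; a real client would send
# page_token as a query parameter whenever it is not None.
_FAKE_PAGES = {
    None: {
        "job_id": 11223344,
        "settings": {"tasks": [{"task_key": "task-1"}]},
        "next_page_token": "tok-2",
    },
    "tok-2": {
        "job_id": 11223344,
        "settings": {"tasks": [{"task_key": "task-101"}]},
    },
}


def fake_fetch(page_token: Optional[str]) -> dict:
    return _FAKE_PAGES[page_token]


pages = list(iter_pages(fake_fetch))
```

Passing the fetch function in as an argument keeps the paging logic independent of any particular HTTP library or authentication scheme.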

The following sections provide more detail on the updates to each of the list and get requests.

Changes to the List jobs and List job runs requests

For the List jobs and List job runs requests, the has_more field at the root level of the response object is removed. Instead, use the existence of the next_page_token field to determine if more results are available. Otherwise, the functionality to paginate results remains unchanged.

To prevent large response bodies, the top-level tasks and job_clusters arrays for each job are omitted from responses by default. To include these arrays for each job included in the response body for these requests, add the expand_tasks=true parameter to the request. When expand_tasks is enabled, a maximum of 100 elements are returned in the tasks and job_clusters arrays. If either of these arrays has more than 100 elements, a has_more field (not to be confused with the root-level has_more field that is removed) inside the job object is set to true. However, only the first 100 elements are accessible. You cannot retrieve additional tasks or clusters after the first 100 with the List jobs request. To fetch more elements, use the requests that return a single job or a single job run: Get a single job and Get a single run.
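The two has_more cases above can be told apart in client code roughly as follows. This sketch assumes the List jobs response carries its jobs in a top-level jobs array with a job_id on each entry, as in the Jobs API reference; the helper names and the sample response are hypothetical.

```python
def more_pages_available(list_response: dict) -> bool:
    """Replaces the removed root-level has_more check."""
    return "next_page_token" in list_response


def truncated_job_ids(list_response: dict) -> list:
    """Return IDs of jobs whose tasks or job_clusters were cut off at 100 elements."""
    return [
        job["job_id"]
        for job in list_response.get("jobs", [])
        if job.get("has_more", False)
    ]


# Hypothetical response to a List jobs request sent with expand_tasks=true.
sample_response = {
    "jobs": [
        {"job_id": 1, "settings": {"tasks": [{"task_key": "a"}]}},
        {"job_id": 2, "settings": {"tasks": [{"task_key": "b"}]}, "has_more": True},
    ],
    "next_page_token": "abc",
}

needs_single_get = truncated_job_ids(sample_response)
```

Each ID returned by `truncated_job_ids` is a candidate for a follow-up Get a single job request, since the list response cannot return the remaining elements.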

Get a single job

In Jobs API 2.2, the Get a single job request to retrieve details about a single job now supports pagination of the tasks and job_clusters fields when the size of either field exceeds 100 elements. Use the next_page_token field at the object root to determine if more results are available. The value of this field is then used as the value for the page_token query parameter in subsequent requests. Array fields with fewer than 100 elements in one page will be empty on subsequent pages.

Get a single run

In Jobs API 2.2, the Get a single run request to retrieve details about a single run now supports pagination of the tasks and job_clusters fields when the size of either field exceeds 100 elements. Use the next_page_token field at the object root to determine if more results are available. The value of this field is then used as the value for the page_token query parameter in subsequent requests. Array fields with fewer than 100 elements in one page will be empty on subsequent pages.

Jobs API 2.2 also adds the only_latest query parameter to this endpoint to enable showing only the latest run attempts in the tasks array. When the only_latest parameter is true, any runs superseded by a retry or a repair are omitted from the response.

When the run_id refers to a ForEach task run, a field named iterations is present in the response. The iterations field is an array containing details for all runs of the ForEach task’s nested task and has the following properties:

The schema of each object in the iterations array is the same as that of the objects in the tasks array.
If the only_latest query parameter is set to true, only the latest run attempts are included in the iterations array.
Pagination is applied to the iterations array instead of the tasks array.
The tasks array is still included in the response and includes the ForEach task run.

To learn more about the ForEach task, see the ForEach task documentation.

For example, see the following response for a ForEach task with some fields omitted:

{
  "job_id": 53,
  "run_id": 759600,
  "number_in_job": 7,
  "original_attempt_run_id": 759600,
  "state": {
    "life_cycle_state": "TERMINATED",
    "result_state": "SUCCESS",
    "state_message": ""
  },
  "cluster_spec": {},
  "start_time": 1595943854860,
  "setup_duration": 0,
  "execution_duration": 0,
  "cleanup_duration": 0,
  "trigger": "ONE_TIME",
  "creator_user_name": "user@databricks.com",
  "run_name": "process_all_numbers",
  "run_type": "JOB_RUN",
  "tasks": [
    {
      "run_id": 759600,
      "task_key": "process_all_numbers",
      "description": "Process all numbers",
      "for_each_task": {
        "inputs": "[ 1, 2, ..., 101 ]",
        "concurrency": 10,
        "task": {
          "task_key": "process_number_iteration",
          "notebook_task": {
            "notebook_path": "/Users/user@databricks.com/process_single_number",
            "base_parameters": {
              "number": "{{input}}"
            }
          }
        },
        "stats": {
          "task_run_stats": {
            "total_iterations": 101,
            "scheduled_iterations": 101,
            "active_iterations": 0,
            "failed_iterations": 0,
            "succeeded_iterations": 101,
            "completed_iterations": 101
          }
        }
      },
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": ""
      }
    }
  ],
  "iterations": [
    {
      "run_id": 759601,
      "task_key": "process_number_iteration",
      "notebook_task": {
        "notebook_path": "/Users/user@databricks.com/process_single_number",
        "base_parameters": {
          "number": "{{input}}"
        }
      },
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": ""
      }
    },
    {
      "run_id": 759602,
      "task_key": "process_number_iteration",
      "notebook_task": {
        "notebook_path": "/Users/user@databricks.com/process_single_number",
        "base_parameters": {
          "number": "{{input}}"
        }
      },
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": ""
      }
    }
  ],
  "format": "MULTI_TASK",
  "next_page_token": "eyJ..x9"
}
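For a ForEach task run, a client would follow next_page_token and accumulate the iterations array rather than the tasks array. The sketch below is a hedged illustration: the request path /api/2.2/jobs/runs/get is assumed from the Jobs API reference, and `run_get_path`, `collect_iterations`, and the fake pages are hypothetical names and data.

```python
from typing import Callable, Optional
from urllib.parse import urlencode


def run_get_path(run_id: int, only_latest: bool = True,
                 page_token: Optional[str] = None) -> str:
    """Build the request path and query string for one page of a run."""
    params = {"run_id": run_id, "only_latest": str(only_latest).lower()}
    if page_token:
        params["page_token"] = page_token
    # Assumed endpoint path for the Get a single run request.
    return "/api/2.2/jobs/runs/get?" + urlencode(params)


def collect_iterations(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Gather the iterations array across all pages of a ForEach task run."""
    iterations = []
    token = None
    while True:
        page = fetch_page(token)
        iterations.extend(page.get("iterations", []))
        token = page.get("next_page_token")
        if not token:
            return iterations


# Fake pages standing in for successive Get a single run responses.
_RUN_PAGES = {
    None: {
        "run_id": 759600,
        "iterations": [{"run_id": 759601}, {"run_id": 759602}],
        "next_page_token": "p2",
    },
    "p2": {"run_id": 759600, "iterations": [{"run_id": 759603}]},
}

all_iterations = collect_iterations(lambda token: _RUN_PAGES[token])
```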
