Starting a Job Run
Use Data Science job runs to run the jobs that you create.
Creating a job defines the infrastructure and the job artifact; a job run runs that artifact with the specified parameters. Job runs provision the specified infrastructure, run the job artifact, and then deprovision and destroy the resources when the job run ends.
- If you're starting a single node job run, follow the steps in Single Node.
- If you're starting a multi node job run, follow the steps in Multi Node.
Single Node
Use these steps to start a single node job run.
- On the Projects list page, select the project that contains the jobs that you want to work with. If you need help finding the list page or the project, see Listing Projects.
- On the project details page, select Jobs.
- Select the job.
- Select Job runs.
- Select Start a job run.
- On the Start a job run page, enter the following information.
- Compartment (Optional): Select a different compartment for the job run.
- Name (Optional): Enter a name for the job run (limit of 255 characters). If you don't provide a name, a name is automatically generated. Example: jobrun20210808222435
- Custom environment variable key* (Optional): Environment variables that control the job.
- Value* (Optional): Value of the custom environment variable key.
- Command line arguments* (Optional): The command line arguments that you want to use for running the job.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the job can run. The service cancels the job run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all job runs to prevent runaway job runs.
- Networking resources: Select the relevant option.
- Default Networking: Restricts traffic to Oracle services only. The system uses the existing service-managed network. The workload is attached by using a secondary VNIC to a preconfigured, service-managed VCN, and subnet. This provided subnet lets egress to the public internet through a NAT gateway, and access to other Oracle Cloud services through a service gateway.
If you need access only to the public internet and OCI services, we recommend using this option. It doesn't require you to create networking resources or write policies for networking permissions.
- Default networking with internet: Allows outbound internet access through the Data Science NAT gateway.
Note
You can't use Default networking with internet in disconnected realms and Oracle development tenancies. If your tenancy or compartment has a Data Science security zone policy that denies public network access (for example, deny model_deploy_public_network; see Data Science security zone policy), the service-managed public internet access option is disabled. If you try to use this option, you receive a 404 NotAuthorizedOrNotFound error.
- Custom Networking: Select the VCN and subnet (by compartment) that you want to use.
For egress access to the public internet, use a private subnet with a route to a NAT gateway.
Note
- A file storage mount requires custom networking.
- Switching from custom networking to managed networking isn't supported after creation.
- If you see the banner The specified subnet is not accessible. Select a different subnet., create a policy that allows Data Science to use custom networking. See Policies.
- Change shape* (Optional): Change the Compute shape by selecting Change shape. Then, follow these steps in the Select compute shape panel.
- Select an instance type.
- Select a shape series.
- Select one of the supported Compute shapes in the series. Select the shape that best suits how you want to use the resource.
- Expand the selected shape to configure OCPUs and memory.
- Number of OCPUs
- Amount of memory (GB): For each OCPU, select up to 64 GB of memory and a maximum total of 512 GB. The minimum amount of memory allowed is either 1 GB or a value matching the number of OCPUs, whichever is greater.
- Enable Burstable Shape: Select if using burstable VMs, and then for Baseline utilization per OCPU, select the percentage of OCPUs that you usually want to use. The supported values are 12.5% and 50%. (For model deployments, only the value of 50% is supported.)
- Select Select shape.
- Storage override* (Optional): Override the storage configuration. Enter the amount of block storage to use, between 50 GB and 10,240 GB (10 TB). You can change the value in 1 GB increments.
- Enable BYOC Override / Environment configuration override > Select* (Optional): Select to override the job's defined environment configuration:
- Compartment: Select the compartment that contains the repository.
- Repository: Select a repository from the list.
- Image: Select the image that you want to use.
- Entrypoint: Enter an entry point.
- CMD: Enter a command.
Note
Use CMD as arguments to the ENTRYPOINT, or as the only command to run in the absence of an ENTRYPOINT.
- Image digest: Enter an image digest.
- Signature id: If using signature verification, then enter the OCID of the image signature. Example:
ocid1.containerimagesignature.oc1.iad.aaaaaaaaab...
- Enable logging (Optional): Override the logging configuration.
- Log group compartment: Select the compartment that contains the log group.
- Log group: Select the log group.
- Enable automatic log creation: Select this option to automatically create a log when the job starts. The created log stores all stdout and stderr messages.
- Select a log: Select this option and select an existing log to store all stdout and stderr messages.
- Probes override* (Optional): Override the startup probe.
- Select Select.
- In the Probes panel, enter the following information.
- Command
- Initial delay (in seconds)
- Period
- Failure threshold
- Select Save.
- Tags (under Advanced options): Add tags to the job run. If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you're not sure whether to apply tags, skip this option or ask an administrator. You can apply tags later.
Note
An asterisk (*) for a field indicates different placement for multi-node jobs. If you're starting a job run for a multi-node job, then find the field by editing the node group: Under Node group configuration override, from the Actions menu (three dots) for the node group, select Edit. The field appears in the resulting panel.
- Select Start.
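The OCPU and memory rules described above can be sketched as a small validation helper. This is an illustrative sketch, not a service API: the function name is hypothetical, and the limits are the ones stated in the steps (at most 64 GB per OCPU, 512 GB total, and a minimum of 1 GB or the OCPU count, whichever is greater).

```python
def validate_flex_shape(ocpus: int, memory_gb: int) -> bool:
    """Check an OCPU/memory combination against the limits described above.

    Hypothetical helper: at most 64 GB per OCPU, at most 512 GB total,
    and a minimum of max(1, ocpus) GB of memory.
    """
    if memory_gb > 512:            # total memory cap
        return False
    if memory_gb > 64 * ocpus:     # per-OCPU cap
        return False
    if memory_gb < max(1, ocpus):  # minimum is 1 GB or the OCPU count
        return False
    return True

print(validate_flex_shape(4, 64))   # a valid combination
print(validate_flex_shape(1, 128))  # invalid: exceeds 64 GB per OCPU
```

The same pattern can be extended to check the block storage range (50 GB to 10,240 GB) before submitting a run.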
Multi Node
Use these steps to start a multi node job run.
- On the Projects list page, select the project that contains the jobs that you want to work with. If you need help finding the list page or the project, see Listing Projects.
- On the project details page, select Jobs.
- Select the job.
- Select Job runs.
- Select Start a job run.
- On the Start a job run page, enter the information described at Single Node.
Fields are identical to single-node job runs with the following differences.
- Asterisked (*) fields appear in the panel for editing a node group.
- The following fields are only available for multi-node jobs, in the panel for editing a node group.
- Replicas (number of replicas)
- Minimum success replicas (number of replicas that must succeed)
To open the panel for editing a node group: Under Node group configuration override, from the Actions menu (three dots) for the node group, select Edit.
- Select Start.
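The two multi-node fields above imply a simple consistency rule. As a sketch (the helper name is hypothetical, and it assumes the number of replicas that must succeed cannot exceed the total replica count):

```python
def validate_node_group(replicas: int, min_success_replicas: int) -> bool:
    """Hypothetical check: minimum success replicas must be at least 1
    and cannot exceed the total number of replicas in the node group."""
    return 1 <= min_success_replicas <= replicas

print(validate_node_group(4, 2))  # a sensible configuration
print(validate_node_group(2, 3))  # invalid: more successes required than replicas
```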
Use the Data Science CLI to start job runs as in this example:
- Start a job run with:
oci data-science job-run create \
  --display-name <job_run_name> \
  --compartment-id <compartment_ocid> \
  --project-id <project_ocid> \
  --job-id <job_ocid> \
  --configuration-override-details file://<optional_job_run_configuration_override_json_file> \
  --log-configuration-override-details file://<optional_job_run_logging_configuration_override_json_file>
- (Optional) Use this job run configuration override JSON file to override the configurations defined on the parent job:
{
  "jobEnvironmentConfigurationDetails": {
    "jobEnvironmentType": "OCIR_CONTAINER",
    "image": "iad.ocir.io/axoxdievda5j/odsc-byod-hello-wrld:0.1.3",
    "imageDigest": "sha256",
    "cmd": ["ls", "-h"],
    "entrypoint": ["-l"],
    "imageSignatureId": "ocid1.containerimagesignature.oc1.iad.0.ociodscdev.aaaaaaaaccutw5qdz6twjzkpgmbojdck3qotqqsbn7ph6xcumu4s32o6v5gq"
  },
  "jobConfigurationDetails": {
    "jobType": "DEFAULT",
    "environmentVariables": <envar-list-object>
  },
  ...
}
- (Optional) Use this job run logging configuration override JSON file to override the logging configuration defined on the parent job:
{
  "enableLogging": true,
  "enableAutoLogCreation": true,
  "logGroupId": "<log_group_ocid>"
}
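One way to produce the override files that the CLI reads with file:// is a short Python script. This is a sketch: the file names, image path, and environment variables below are illustrative placeholders, not values the service requires.

```python
import json

# Sketch: write the two override files passed to `oci data-science job-run create`
# via --configuration-override-details and --log-configuration-override-details.
config_override = {
    "jobEnvironmentConfigurationDetails": {
        "jobEnvironmentType": "OCIR_CONTAINER",
        "image": "<region>.ocir.io/<namespace>/<repository>:<tag>",  # placeholder
        "cmd": ["ls", "-h"],
        "entrypoint": ["-l"],
    },
    "jobConfigurationDetails": {
        "jobType": "DEFAULT",
        "environmentVariables": {"GREETING": "hello"},  # hypothetical variable
    },
}

logging_override = {
    "enableLogging": True,
    "enableAutoLogCreation": True,
    "logGroupId": "<log_group_ocid>",  # placeholder
}

with open("job_run_config_override.json", "w") as f:
    json.dump(config_override, f, indent=2)

with open("job_run_logging_override.json", "w") as f:
    json.dump(logging_override, f, indent=2)
```

Generating the files programmatically keeps the JSON valid (quoted keys, lowercase booleans), which hand-edited files sometimes get wrong.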
- Start a job run with the ADS SDK.
The ADS SDK is also a publicly available Python library that you can install with this command:
pip install oracle-ads
It provides a wrapper that makes it easy to start job runs from notebooks or on your client machine.