Creating a Job
Create and run a job in Data Science.
Before You Begin
- Ensure that you have created the necessary policies, authentication, and authorization for your jobs.
- Create a job artifact file or build a custom container.
- To store and manage job logs, learn about logging.
- To use storage mounts, you must have an Object Storage bucket or OCI File Storage Service (FSS) mount target and export path.
- For storage mounts, ensure that you have the authorization to use storage mounts.
- To use FSS, you must first create the file system and the mount point. Use the custom networking option and ensure that the mount target and the notebook are configured with the same subnet. Configure security list rules for the subnet with the specific ports and protocols. Ensure that service limits are allocated to file-system-count and mount-target-count.
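The security list rules mentioned above typically include the NFS ports that File Storage requires. The following is an illustrative sketch of those ingress rules as they might appear in a security list API payload; the subnet CIDR is a placeholder, and you should check the File Storage security-rules documentation for the authoritative rule set.

```python
# Illustrative ingress rules for an FSS mount target subnet.
# The source CIDR 10.0.0.0/16 is a placeholder for your own subnet.
nfs_ingress_rules = [
    {"protocol": "6", "source": "10.0.0.0/16",   # TCP 111 (rpcbind)
     "tcpOptions": {"destinationPortRange": {"min": 111, "max": 111}}},
    {"protocol": "6", "source": "10.0.0.0/16",   # TCP 2048-2050 (NFS)
     "tcpOptions": {"destinationPortRange": {"min": 2048, "max": 2050}}},
    {"protocol": "17", "source": "10.0.0.0/16",  # UDP 111 (rpcbind)
     "udpOptions": {"destinationPortRange": {"min": 111, "max": 111}}},
    {"protocol": "17", "source": "10.0.0.0/16",  # UDP 2048 (NFS)
     "udpOptions": {"destinationPortRange": {"min": 2048, "max": 2048}}},
]
```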
Basic information
Select single or multiple nodes and provide identifying information.
- Job type: Select the relevant option.
- Single Node: One machine for the job.
- Multi Node: Several nodes for a demanding job.
- Compartment (Optional): Select a different compartment for the job.
- Name (Optional): Enter a name for the job (limit of 255 characters). If you don't provide a name, a name is automatically generated. Example:
job20210808222435
Configuration
Enter the following information.
- Add node group (multi-node jobs only): Select to add a node group configuration. Add up to 5 node groups.
- Name: Enter a unique name for the node group.
- Replicas: Enter the number of replicas.
- Minimum success replicas: Enter the minimum number of replicas that must succeed.
- Asterisked fields (*): For multi-node jobs, these fields appear in the Add node group panel rather than at the top level (see the Note that follows this list).
- (Optional) If you have more than one node group, then specify the node group startup order, either in parallel or in sequence.
- Custom environment variable key* (Optional): Environment variables that control the job.
Note
If you uploaded a zip file or compressed tar file, add the JOB_RUN_ENTRYPOINT custom environment variable to point to the entry file within the artifact.
- Value* (Optional): Value of the custom environment variable key.
- Command line arguments* (Optional): The command line arguments that you want to use for running the job.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the job can run. The service cancels the job run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all job runs to prevent runaway job runs.
- Change shape* (Optional): Change the Compute shape by selecting Change shape. Then, follow these steps in the Select compute shape panel.
- Select an instance type.
- Select a shape series.
- Select one of the supported Compute shapes in the series. Select the shape that best suits how you want to use the resource.
- Expand the selected shape to configure OCPUs and memory.
- Number of OCPUs
- Amount of memory (GB): For each OCPU, select up to 64 GB of memory and a maximum total of 512 GB. The minimum amount of memory allowed is either 1 GB or a value matching the number of OCPUs, whichever is greater.
- Enable Burstable Shape: Select if using burstable VMs, and then for Baseline utilization per OCPU, select the percentage of OCPUs that you usually want to use. The supported values are 12.5% and 50%. (For model deployments, only the value of 50% is supported.)
- Select Select shape.
- Storage: Enter the amount of block storage to use, from 50 GB to 10,240 GB (10 TB). You can change the value in 1 GB increments.
- Networking resources: Select the relevant option.
- Default Networking: Restricts traffic to Oracle services only. The system uses the existing service-managed network. The workload is attached by using a secondary VNIC to a preconfigured, service-managed VCN, and subnet. This provided subnet lets egress to the public internet through a NAT gateway, and access to other Oracle Cloud services through a service gateway.
If you need access only to the public internet and OCI services, we recommend using this option. It doesn't require you to create networking resources or write policies for networking permissions.
- Default networking with internet: Allows outbound internet access through the Data Science NAT gateway.
Note
You can't use Default networking with internet in disconnected realms and Oracle development tenancies. If your tenancy or compartment has a Data Science security zone policy that denies public network access (for example, deny model_deploy_public_network; see Data Science security zone policy), the service-managed public internet access option is disabled. If you try to use this option, you receive a 404 NotAuthorizedOrNotFound error.
- Custom Networking: Select the VCN and subnet (by compartment) that you want to use.
For egress access to the public internet, use a private subnet with a route to a NAT gateway.
Note
- You must use custom networking to use a file storage mount.
- Switching from custom networking to managed networking isn't supported after creation.
- If you see the banner "The specified subnet is not accessible. Select a different subnet.", then create a policy that allows Data Science to use custom networking. See Policies.
- Upload job artifact (Optional if BYOC is configured): Upload the job artifact by dragging the required job artifact file into the box.
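For reference, the Configuration fields above correspond to properties of the Data Science CreateJob request payload. The following is a minimal sketch of that mapping; all OCIDs and values are placeholders.

```python
# Illustrative sketch of how the Configuration fields map to the
# Data Science CreateJob request payload. All OCIDs are placeholders.
create_job_payload = {
    "displayName": "job20210808222435",
    "compartmentId": "ocid1.compartment.oc1..<placeholder>",
    "projectId": "ocid1.datascienceproject.oc1..<placeholder>",
    "jobConfigurationDetails": {
        "jobType": "DEFAULT",
        # For a zip or compressed tar artifact, JOB_RUN_ENTRYPOINT
        # points to the entry file inside the archive.
        "environmentVariables": {"JOB_RUN_ENTRYPOINT": "main.py"},
        "commandLineArguments": "--epochs 10",
        "maximumRuntimeInMinutes": 60,  # maximum allowed: 43,200 (30 days)
    },
    "jobInfrastructureConfigurationDetails": {
        # ME_STANDALONE uses default (service-managed) networking;
        # STANDALONE with a subnetId uses custom networking.
        "jobInfrastructureType": "ME_STANDALONE",
        "shapeName": "VM.Standard.E4.Flex",
        "jobShapeConfigDetails": {"ocpus": 2, "memoryInGBs": 16},
        "blockStorageSizeInGBs": 50,  # 50 to 10,240 GB
    },
}
```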
Note
An asterisk (*) for a field indicates different placement for multi-node jobs. If you're creating a multi-node job, then find the field by adding a node group: under Configuration, select Add node group. The field appears in the Add node group panel.
Additional configuration
Enter the following information.
- Enable logging (Optional): Configure logging.
- Log group compartment: Select the compartment that contains the log group.
- Log group: Select the log group.
- Enable automatic log creation: Select this option to automatically create a log when the job starts. The created log stores all stdout and stderr messages.
- Select a log: Select this option (and select an existing log) to store all stdout and stderr messages.
- Enable BYOC / Environment configuration > Select* (Required for multi-node jobs): Set up an environment for Bring Your Own Container (BYOC).
- Compartment: Select the compartment that contains the repository.
- Repository: Select a repository from the list.
- Image: Select the image that you want to use.
- Entrypoint: Enter an entry point.
- CMD: Enter a command.
Note
Use CMD as arguments to the ENTRYPOINT, or as the only command to run in the absence of an ENTRYPOINT.
- Image digest: Enter an image digest.
- Signature id: If using signature verification, then enter the OCID of the image signature. Example:
ocid1.containerimagesignature.oc1.iad.aaaaaaaaab...
- File storage mounts (Optional): Select Add file storage mount and enter the following information.
- Compartment: Select the compartment that contains the target that you want to mount.
- Mount target: The mount target that you want to use.
- Export path: The export path that you want to use.
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
- Object storage mounts (Optional): Select Add object storage mount and enter the following information.
- Compartment: Select the compartment that contains the bucket that you want to mount.
- Bucket: Select the bucket that you want to use.
- Object name prefix (Optional): Object name prefix. The prefix must start with an alphanumeric character. The allowed characters are alphanumerics, slash ( / ), hyphen ( - ) and underscore ( _ ).
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
Note
If using custom networking:
- Create the service gateway in the VCN.
- For the route table configurations in the private subnet, add the service gateway.
- Change the egress rules of the security list for the required subnet to allow traffic to all services in the network.
- Probe* (Required for multi-node jobs): Configure the startup probe.
- Select Select.
- In the Probes panel, enter the following information.
- Command
- Initial delay (in seconds)
- Period
- Failure threshold
- Select Save.
- Tags (under Advanced options): Add tags to the job. If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you're not sure whether to apply tags, skip this option or ask an administrator. You can apply tags later.
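The Additional configuration fields above also correspond to properties of the CreateJob request payload. The following sketch shows one way they might be expressed; all OCIDs, the image path, and names are placeholders, and the field names should be checked against the Data Science API reference.

```python
# Illustrative sketch of the Additional configuration fields in the
# CreateJob request payload. OCIDs, image path, and names are placeholders.
additional_config = {
    "jobLogConfigurationDetails": {
        "enableLogging": True,
        "enableAutoLogCreation": True,  # or set "logId" to use an existing log
        "logGroupId": "ocid1.loggroup.oc1..<placeholder>",
    },
    # Bring Your Own Container (BYOC) environment.
    "jobEnvironmentConfigurationDetails": {
        "jobEnvironmentType": "OCIR_CONTAINER",
        "image": "iad.ocir.io/<tenancy>/<repository>/<image>:<tag>",
        "entrypoint": ["python"],
        "cmd": ["train.py"],  # arguments to the ENTRYPOINT
    },
    "jobStorageMountConfigurationDetailsList": [
        {   # File storage mount: requires custom networking.
            "storageType": "FILE_STORAGE",
            "mountTargetId": "ocid1.mounttarget.oc1..<placeholder>",
            "exportId": "ocid1.export.oc1..<placeholder>",
            "destinationDirectoryName": "fss-data",  # mounted under /mnt
        },
        {   # Object storage mount.
            "storageType": "OBJECT_STORAGE",
            "namespace": "<object-storage-namespace>",
            "bucket": "<bucket-name>",
            "prefix": "training/",
            "destinationDirectoryName": "bucket-data",
        },
    ],
}
```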
Note
An asterisk (*) for a field indicates different placement for multi-node jobs. If you're creating a multi-node job, then find the field by adding a node group: under Configuration, select Add node group. The field appears in the Add node group panel.
Review and create
Review configuration and then select Create.
After the job is in an active state, you can use job runs to repeatedly run the job.
You can also use the Data Science CLI to create a job (for example, with the oci data-science job create command).
The ADS SDK is a publicly available Python library that you can install with pip install oracle-ads. It provides a wrapper that makes creating and running jobs from notebooks or on a client machine easy.
Use the ADS SDK to create and run jobs.