Creating a Pipeline
Create a Data Science pipeline to run a task. You can create pipelines by using the ADS SDK, the OCI Console, or the OCI SDK. Using ADS to create pipelines makes developing the pipeline, its steps, and their dependencies easier. ADS supports reading and writing the pipeline to and from a YAML file, and can render a visual representation of the pipeline. We recommend that you use ADS to create and manage pipelines in code.
Ensure that you have created the necessary policies, authentication, and authorization for pipelines.
For proper operation of script steps, ensure that you have added the following rule to a dynamic group policy:
all {resource.type='datasciencepipelinerun', resource.compartment.id='<pipeline-run-compartment-ocid>'}

Before You Begin
- Create a step artifact file.
- Review the use of pipelines environment variables.
- To store and manage pipeline logs, learn about logging.
- Decide which conda environment you want to use. If you need a custom conda environment, create and publish one.
- On the Projects list page, select the project that contains the pipelines that you want to work with. If you need help finding the list page or the project, see Listing Projects.
- On the project details page, select Pipelines.
- Select Create pipeline.
- On the Create pipeline page, enter the following information.
- Compartment: Select the compartment to store the pipeline in.
- Name (Optional): Enter a name for the pipeline (limit of 255 characters). If you don't provide a name, a name is automatically generated. Example: pipeline2022808222435
- Description (Optional): Enter a description for the pipeline.
- Pipeline steps: For each pipeline step that you want to add to the pipeline, select Add pipeline steps to open the Add pipeline step panel and then follow the procedure for the type of pipeline step that you want.
Job: To create a pipeline step from a job, select From jobs and enter the following information.
Note
Optionally, create a default pipeline configuration that's used when the pipeline is run by entering environment variables, command line arguments, and maximum runtime options.
- Step name: Enter a unique name for the step. You can't repeat a step name in a pipeline.
- Step description (Optional): Enter a step description, which can help you find step dependencies.
- Step run name
- Depends on (Optional): If this step depends on another step, select one or more steps to run before this step.
- Select a job compartment: Select the compartment containing the job that you want to use as a pipeline step.
- Select a job: Select the job that you want to use as a pipeline step.
- Parameters (Optional):
Note
The step must ensure that the specified file (for example, /home/datascience/output.json) is populated with valid JSON that defines the specified variables. For example: { "message": "Hello John!", "ocpu": 2, "memory": 10 }
- Custom environment variable key (Optional): The environment variables for this pipeline step.
- Value (Optional): The key's value.
- Command line arguments (Optional): Enter the command line arguments that you want to use for running the pipeline step.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the pipeline step is allowed to run. The service cancels the pipeline run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all pipeline runs to prevent runaway pipeline runs.
- Output parameters (Optional):
- Output parameter type: Select JSON.
- Parameter name: Enter a parameter name.
- Output file name: Select the output file name in which the step stores the output parameters. For example: /home/datascience/output.json.
- Save: Select to save the step.
The Create pipeline page reopens with the step added.
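The Parameters note above requires the step to write its output parameters as a JSON file before it exits. The following is a minimal sketch of doing that from a step's code; the helper name is illustrative, and only the file path and JSON contract come from this topic.

```python
import json
import os

def write_output_params(params, path="/home/datascience/output.json"):
    """Write a step's output parameters as the JSON file the service reads.

    The pipeline service picks up this file after the step finishes and
    exposes its keys as output parameters to downstream steps.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(params, f)

# Matches the sample output parameters shown above:
# write_output_params({"message": "Hello John!", "ocpu": 2, "memory": 10})
```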
Script: To create a pipeline step from a script, select From script and enter the following information.
- Step name: Enter a unique name for the step. You can't repeat a step name in a pipeline.
- Step description (Optional): Enter a step description, which can help you find step dependencies.
- Depends on (Optional): If this step depends on another step, select one or more steps to run before this step.
- Upload job artifact: Drag a job step file into the box, or select the box to navigate to the file for selection.
- Entrypoint (Optional): Select a file to be the entry run point of the step. This is useful when you have many files.
- Parameters (Optional):
Note
The step must ensure that the specified file (for example, /home/datascience/output.json) is populated with valid JSON that defines the specified variables. For example: { "message": "Hello John!", "ocpu": 2, "memory": 10 }
- Custom environment variable key (Optional): The environment variables for this pipeline step.
- Value (Optional): The key's value.
- Command line arguments (Optional): Enter the command line arguments that you want to use for running the pipeline step.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the pipeline step is allowed to run. The service cancels the pipeline run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all pipeline runs to prevent runaway pipeline runs.
- Output parameters (Optional):
- Output parameter type: Select JSON.
- Parameter name: Enter a parameter name.
- Output file name: Select the output file name in which the step stores the output parameters. For example: /home/datascience/output.json.
- Change the Compute shape by selecting Change shape. Then, follow these steps in the Select compute shape panel.
Note
For the AMD shape, you can use the default or set the number of OCPUs and memory.
- Select an instance type.
- Select a shape series.
- Select one of the supported Compute shapes in the series. Select the shape that best suits how you want to use the resource.
- Expand the selected shape to configure OCPUs and memory.
- Number of OCPUs
- Amount of memory (GB): For each OCPU, select up to 64 GB of memory and a maximum total of 512 GB. The minimum amount of memory allowed is either 1 GB or a value matching the number of OCPUs, whichever is greater.
- Enable Burstable Shape: Select if using burstable VMs, and then for Baseline utilization per OCPU, select the percentage of OCPUs that you usually want to use. The supported values are 12.5% and 50%. (For model deployments, only the value of 50% is supported.)
- Select Select shape.
- Compute shape parameterized
- Shape parameterized
- Ocpus parameterized
- MemoryInGBs parameterized
- Block Storage: Enter the amount of storage to use, between 50 GB and 10,240 GB (10 TB). You can change the value in 1 GB increments. The default value is 100 GB.
- Networking resources: Select the relevant option.
- Default Networking: Restricts traffic to Oracle services only. The system uses the existing service-managed network. The workload is attached by using a secondary VNIC to a preconfigured, service-managed VCN, and subnet. This provided subnet lets egress to the public internet through a NAT gateway, and access to other Oracle Cloud services through a service gateway.
If you need access only to the public internet and OCI services, we recommend using this option. It doesn't require you to create networking resources or write policies for networking permissions.
- Default networking with internet: Allows outbound internet access through the Data Science NAT gateway.
Note
You can't use Default networking with internet in disconnected realms and Oracle development tenancies. If your tenancy or compartment has a Data Science security zone policy that denies public network access (for example, deny model_deploy_public_network; see Data Science security zone policy), the service-managed public internet access option is disabled. If you try to use this option, you receive a 404 NotAuthorizedOrNotFound error.
- Custom Networking: Select the VCN and subnet (by compartment) that you want to use.
For egress access to the public internet, use a private subnet with a route to a NAT gateway.
Note
- Custom networking must be used to use a file storage mount.
- Switching from custom networking to managed networking isn't supported after creation.
- If you see the banner The specified subnet is not accessible. Select a different subnet., create a policy that allows Data Science to use custom networking. See Policies.
- Storage mounts (Optional):
- File storage mounts (Optional): Select Add file storage mount and enter the following information.
- Compartment: Select the compartment that contains the target that you want to mount.
- Mount target: The mount target that you want to use.
- Export path: The export path that you want to use.
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
- Object storage mounts (Optional): Select Add object storage mount and enter the following information.
- Compartment: Select the compartment that contains the bucket that you want to mount.
- Bucket: Select the bucket that you want to use.
- Object name prefix (Optional): The prefix must start with an alphanumeric character. The allowed characters are alphanumerics, slash ( / ), hyphen ( - ), and underscore ( _ ).
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
Note
If using custom networking:
- Create the service gateway in the VCN.
- In the route table configuration of the private subnet, add a route to the service gateway.
- Change the egress rules of the security list of the subnet to allow traffic to all services in the network.
- Save: Select to save the step.
The Create pipeline page reopens with the step added.
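As a sketch of how a script step consumes the configuration above: the service exports the custom environment variables to the step's process and passes the configured command line arguments to the entrypoint. The variable name INPUT_PATH and both flags below are hypothetical, for illustration only.

```python
import argparse
import os

def parse_step_inputs(argv=None):
    # Custom environment variables configured on the step are visible via
    # os.environ; command line arguments arrive in sys.argv as usual.
    parser = argparse.ArgumentParser(description="Example pipeline step entrypoint")
    parser.add_argument(
        "--input-path",
        default=os.environ.get("INPUT_PATH", "/home/datascience"),  # hypothetical variable
    )
    parser.add_argument("--epochs", type=int, default=1)  # hypothetical argument
    return parser.parse_args(argv)
```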
Container: To create a pipeline step from a container, select From container and enter the following information.
Optionally, when defining pipeline steps, you can select to use Bring Your Own Container. For more information, see Bring Your Own Container (BYOC) for Pipelines.
- Step name: Enter a unique name for the step. You can't repeat a step name in a pipeline.
- Step description (Optional): Enter a step description, which can help you find step dependencies.
- Depends on (Optional): If this step depends on another step, select one or more steps to run before this step.
- Configure container environment: Select Configure to open the Configure container environment panel and then enter the following information.
- Repository compartment
- Repository
- Image
- Entrypoint
- CMD: Use CMD as arguments to the ENTRYPOINT or the only command to run in the absence of an ENTRYPOINT.
- Image digest (Optional)
- Signature ID (Optional): If using signature verification, enter the OCID of the image signature. Example:
ocid1.containerimagesignature.oc1.iad.aaaaaaaaab....
- Upload job artifact: Drag a step artifact into the box, or select the box to navigate to the file for selection.
This step is optional only if BYOC is configured.
- Parameters (Optional):
Note
The step must ensure that the specified file (for example, /home/datascience/output.json) is populated with valid JSON that defines the specified variables. For example: { "message": "Hello John!", "ocpu": 2, "memory": 10 }
- Custom environment variable key (Optional): The environment variables for this pipeline step.
- Value (Optional): The key's value.
- Command line arguments (Optional): Enter the command line arguments that you want to use for running the pipeline step.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the pipeline step is allowed to run. The service cancels the pipeline run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all pipeline runs to prevent runaway pipeline runs.
- Output parameters (Optional):
- Output parameter type: Select JSON.
- Parameter name: Enter a parameter name.
- Output file name: Select the output file name in which the step stores the output parameters. For example: /home/datascience/output.json.
- Change the Compute shape by selecting Change shape. Then, follow these steps in the Select compute shape panel.
Note
For the AMD shape, you can use the default or set the number of OCPUs and memory.
- Select an instance type.
- Select a shape series.
- Select one of the supported Compute shapes in the series. Select the shape that best suits how you want to use the resource.
- Expand the selected shape to configure OCPUs and memory.
- Number of OCPUs
- Amount of memory (GB): For each OCPU, select up to 64 GB of memory and a maximum total of 512 GB. The minimum amount of memory allowed is either 1 GB or a value matching the number of OCPUs, whichever is greater.
- Enable Burstable Shape: Select if using burstable VMs, and then for Baseline utilization per OCPU, select the percentage of OCPUs that you usually want to use. The supported values are 12.5% and 50%. (For model deployments, only the value of 50% is supported.)
- Select Select shape.
- Compute shape parameterized
- Shape parameterized
- Ocpus parameterized
- MemoryInGBs parameterized
- Block Storage: Enter the amount of storage to use, between 50 GB and 10,240 GB (10 TB). You can change the value in 1 GB increments. The default value is 100 GB.
- Networking resources: Select the relevant option.
- Default Networking: Restricts traffic to Oracle services only. The system uses the existing service-managed network. The workload is attached by using a secondary VNIC to a preconfigured, service-managed VCN, and subnet. This provided subnet lets egress to the public internet through a NAT gateway, and access to other Oracle Cloud services through a service gateway.
If you need access only to the public internet and OCI services, we recommend using this option. It doesn't require you to create networking resources or write policies for networking permissions.
- Default networking with internet: Allows outbound internet access through the Data Science NAT gateway.
Note
You can't use Default networking with internet in disconnected realms and Oracle development tenancies. If your tenancy or compartment has a Data Science security zone policy that denies public network access (for example, deny model_deploy_public_network; see Data Science security zone policy), the service-managed public internet access option is disabled. If you try to use this option, you receive a 404 NotAuthorizedOrNotFound error.
- Custom Networking: Select the VCN and subnet (by compartment) that you want to use.
For egress access to the public internet, use a private subnet with a route to a NAT gateway.
Note
- Custom networking must be used to use a file storage mount.
- Switching from custom networking to managed networking isn't supported after creation.
- If you see the banner The specified subnet is not accessible. Select a different subnet., create a policy that allows Data Science to use custom networking. See Policies.
- Storage mounts (Optional):
- File storage mounts (Optional): Select Add file storage mount and enter the following information.
- Compartment: Select the compartment that contains the target that you want to mount.
- Mount target: The mount target that you want to use.
- Export path: The export path that you want to use.
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
- Object storage mounts (Optional): Select Add object storage mount and enter the following information.
- Compartment: Select the compartment that contains the bucket that you want to mount.
- Bucket: Select the bucket that you want to use.
- Object name prefix (Optional): The prefix must start with an alphanumeric character. The allowed characters are alphanumerics, slash ( / ), hyphen ( - ), and underscore ( _ ).
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
Note
If using custom networking:
- Create the service gateway in the VCN.
- In the route table configuration of the private subnet, add a route to the service gateway.
- Change the egress rules of the security list of the subnet to allow traffic to all services in the network.
- Save: Select to save the step.
The Create pipeline page reopens with the step added.
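For reference, when creating the same step through the API (see the payload example later in this topic), a container step is expressed with stepType CONTAINER. The following is a hedged sketch of such an entry; field names follow the CreatePipeline REST API and should be verified against the current API reference, and all values are placeholders.

```python
# Sketch of a CONTAINER step entry for the stepDetails list of a
# CreatePipeline payload. Field names follow the Data Science REST API;
# verify them against the current API reference before use.
container_step = {
    "stepName": "train-in-container",
    "description": "BYOC step",
    "stepType": "CONTAINER",
    "stepContainerConfigurationDetails": {
        "containerType": "OCIR_CONTAINER",
        "image": "iad.ocir.io/<tenancy-namespace>/<repo>:<tag>",
        "entrypoint": ["python"],                # optional
        "cmd": ["train.py"],                     # optional, passed to the entrypoint
        "imageDigest": "<image-digest>",         # optional
        "imageSignatureId": "<signature-ocid>",  # optional
    },
    "dependsOn": [],
}
```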
Data Flow application: To create a pipeline step from a Data Flow application, select From Data Flow applications and enter the following information.
- Step name: Enter a unique name for the step. You can't repeat a step name in a pipeline.
- Step description (Optional): Enter a step description, which can help you find step dependencies.
- Depends on (Optional): If this step depends on another step, select one or more steps to run before this step.
- Select a Data Flow application compartment: Select the compartment containing the Data Flow application.
- Select a Data Flow application: Select the Data Flow application that you want to use as a pipeline step.
- Parameters (Optional):
Note
The step must ensure that the specified file (for example, /home/datascience/output.json) is populated with valid JSON that defines the specified variables. For example: { "message": "Hello John!", "ocpu": 2, "memory": 10 }
- Custom environment variable key (Optional): The environment variables for this pipeline step.
- Value (Optional): The key's value.
- Command line arguments (Optional): Enter the command line arguments that you want to use for running the pipeline step.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the pipeline step is allowed to run. The service cancels the pipeline run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all pipeline runs to prevent runaway pipeline runs.
- Data Flow configuration: Select Configure to open the Configure Data Flow configuration panel and then enter the following information.
- Driver shape
- Driver OCPUs
- Driver Memory (GB)
- Executor shape
- Executor OCPUs
- Executor Memory (GB)
- Number of executors
- Enter the bucket path manually
- Logs bucket URI
- Object storage bucket name compartment
- Object storage bucket name
- Key
- Value
- Warehouse bucket URI
- Configure: Select to save entered information and go back to the Add pipeline step page.
- Save: Select to save the step.
The Create pipeline page reopens with the step added.
- Parameters (Optional):
Note
The step must ensure that the specified file (for example, /home/datascience/output.json) is populated with valid JSON that defines the specified variables. For example: { "message": "Hello John!", "ocpu": 2, "memory": 10 }
- Custom environment variable key (Optional): The environment variables for this pipeline step.
- Value (Optional): The key's value.
- Command line arguments (Optional): Enter the command line arguments that you want to use for running the pipeline step.
- Maximum runtime (in minutes) (Optional): The maximum number of minutes that the pipeline step is allowed to run. The service cancels the pipeline run if its runtime exceeds the specified value. The maximum runtime is 30 days (43,200 minutes). We recommend that you configure a maximum runtime on all pipeline runs to prevent runaway pipeline runs.
- Custom parameter key
- Value
- Change the Compute shape by selecting Change shape. Then, follow these steps in the Select compute shape panel.
Note
For the AMD shape, you can use the default or set the number of OCPUs and memory.
- Select an instance type.
- Select a shape series.
- Select one of the supported Compute shapes in the series. Select the shape that best suits how you want to use the resource.
- Expand the selected shape to configure OCPUs and memory.
- Number of OCPUs
- Amount of memory (GB): For each OCPU, select up to 64 GB of memory and a maximum total of 512 GB. The minimum amount of memory allowed is either 1 GB or a value matching the number of OCPUs, whichever is greater.
- Enable Burstable Shape: Select if using burstable VMs, and then for Baseline utilization per OCPU, select the percentage of OCPUs that you usually want to use. The supported values are 12.5% and 50%. (For model deployments, only the value of 50% is supported.)
- Select Select shape.
- Compute shape parameterized
- Shape parameterized
- Ocpus parameterized
- MemoryInGBs parameterized
- Block Storage: Enter the amount of storage to use, between 50 GB and 10,240 GB (10 TB). You can change the value in 1 GB increments. The default value is 100 GB.
- Networking resources: Select the relevant option.
- Default Networking: Restricts traffic to Oracle services only. The system uses the existing service-managed network. The workload is attached by using a secondary VNIC to a preconfigured, service-managed VCN, and subnet. This provided subnet lets egress to the public internet through a NAT gateway, and access to other Oracle Cloud services through a service gateway.
If you need access only to the public internet and OCI services, we recommend using this option. It doesn't require you to create networking resources or write policies for networking permissions.
- Default networking with internet: Allows outbound internet access through the Data Science NAT gateway.
Note
You can't use Default networking with internet in disconnected realms and Oracle development tenancies. If your tenancy or compartment has a Data Science security zone policy that denies public network access (for example, deny model_deploy_public_network; see Data Science security zone policy), the service-managed public internet access option is disabled. If you try to use this option, you receive a 404 NotAuthorizedOrNotFound error.
- Custom Networking: Select the VCN and subnet (by compartment) that you want to use.
For egress access to the public internet, use a private subnet with a route to a NAT gateway.
Note
- Custom networking must be used to use a file storage mount.
- Switching from custom networking to managed networking isn't supported after creation.
- If you see the banner The specified subnet is not accessible. Select a different subnet., create a policy that allows Data Science to use custom networking. See Policies.
- Enable logging (Optional): Select to enable logging of pipeline run messages.
- Log group compartment: Select the compartment that contains the log group.
- Log group: Select the log group.
- Storage mounts (Optional):
- File storage mounts (Optional): Select Add file storage mount and enter the following information.
- Compartment: Select the compartment that contains the target that you want to mount.
- Mount target: The mount target that you want to use.
- Export path: The export path that you want to use.
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
- Object storage mounts (Optional): Select Add object storage mount and enter the following information.
- Compartment: Select the compartment that contains the bucket that you want to mount.
- Bucket: Select the bucket that you want to use.
- Object name prefix (Optional): The prefix must start with an alphanumeric character. The allowed characters are alphanumerics, slash ( / ), hyphen ( - ), and underscore ( _ ).
- Destination path and directory: Enter the path to use for mounting the storage.
The path must start with an alphanumeric character. The destination directory must be unique across the storage mounts provided. The allowed characters are alphanumerics, hyphen ( - ) and underscore ( _ ).
You can specify the full path, such as /opc/storage-directory. If only a directory is specified, such as /storage-directory, then it's mounted under the default /mnt directory. You can't specify OS-specific directories, such as /bin or /etc.
Note
If using custom networking:
- Create the service gateway in the VCN.
- In the route table configuration of the private subnet, add a route to the service gateway.
- Change the egress rules of the security list of the subnet to allow traffic to all services in the network.
- Tags (under Advanced options): Add tags to the pipeline. If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you're not sure whether to apply tags, skip this option or ask an administrator. You can apply tags later.
- Select Create.
After the pipeline is in an active state, you can use pipeline runs to repeatedly run the pipeline.
Pipeline runs are controlled by a set of service environment variables; review the pipelines environment variables documentation. You can use the OCI SDK for Python to create a pipeline, as in this example:
- Create a pipeline:
The following parameters are available to use in the payload:
Pipeline (top level):
- projectId (Required): The project OCID to create the pipeline in.
- compartmentId (Required): The compartment OCID to create the pipeline in.
- displayName (Optional): The name of the pipeline.
- infrastructureConfigurationDetails (Optional): Default infrastructure (Compute) configuration to use for all the pipeline steps; see infrastructureConfigurationDetails for details on the supported parameters. Can be overridden by the pipeline run configuration.
- logConfigurationDetails (Optional): Default log configuration to use for all the pipeline steps; see logConfigurationDetails for details on the supported parameters. Can be overridden by the pipeline run configuration.
- configurationDetails (Optional): Default configuration for the pipeline run; see configurationDetails for details on the supported parameters. Can be overridden by the pipeline run configuration.
- freeformTags (Optional): Tags to add to the pipeline resource.

stepDetails:
- stepName (Required): Name of the step. Must be unique in the pipeline.
- description (Optional): Free text description for the step.
- stepType (Required): CUSTOM_SCRIPT or ML_JOB.
- jobId (Required*): For ML_JOB steps, the job OCID to use for the step run.
- stepInfrastructureConfigurationDetails (Optional*): Default infrastructure (Compute) configuration to use for this step; see infrastructureConfigurationDetails for details on the supported parameters. Can be overridden by the pipeline run configuration.
- stepConfigurationDetails (Optional*): Default configuration for the step run; see configurationDetails for details on the supported parameters. Can be overridden by the pipeline run configuration.
- dependsOn (Optional): List of steps that must be completed before this step begins. This creates the pipeline workflow dependency graph.

*Must be defined on at least one level (precedence based on priority, 1 being highest): 1. pipeline run, 2. step, 3. pipeline.

infrastructureConfigurationDetails:
- shapeName (Required): Name of the Compute shape to use. For example, VM.Standard2.4.
- blockStorageSizeInGBs (Required): Number of GBs to use as the attached storage for the VM.

logConfigurationDetails:
- enableLogging (Required): Define to use logging.
- logGroupId (Required): Log group OCID to use for the logs. The log group must be created and available when the pipeline runs.
- logId (Optional*): Log OCID to use for the logs when not using the enableAutoLogCreation parameter.
- enableAutoLogCreation (Optional): If set to True, a log for each pipeline run is created.

configurationDetails:
- type (Required): Only DEFAULT is supported.
- maximumRuntimeInMinutes (Optional): Time limit in minutes for the pipeline to run.
- environmentVariables (Optional): Environment variables to provide for the pipeline step runs. For example:
"environmentVariables": { "CONDA_ENV_TYPE": "service" }
Review the list of service-supported environment variables.
pipeline_payload = {
    "projectId": "<project_id>",
    "compartmentId": "<compartment_id>",
    "displayName": "<pipeline_name>",
    "pipelineInfrastructureConfigurationDetails": {
        "shapeName": "VM.Standard2.1",
        "blockStorageSizeInGBs": "50",
    },
    "pipelineLogConfigurationDetails": {
        "enableLogging": True,
        "logGroupId": "<log_group_id>",
        "logId": "<log_id>",
    },
    "pipelineDefaultConfigurationDetails": {
        "type": "DEFAULT",
        "maximumRuntimeInMinutes": 30,
        "environmentVariables": {
            "CONDA_ENV_TYPE": "service",
            "CONDA_ENV_SLUG": "classic_cpu",
        },
    },
    "stepDetails": [
        {
            "stepName": "preprocess",
            "description": "Preprocess step",
            "stepType": "CUSTOM_SCRIPT",
            "stepInfrastructureConfigurationDetails": {
                "shapeName": "VM.Standard2.4",
                "blockStorageSizeInGBs": "100",
            },
            "stepConfigurationDetails": {
                "type": "DEFAULT",
                "maximumRuntimeInMinutes": 90,
                "environmentVariables": {
                    "STEP_RUN_ENTRYPOINT": "preprocess.py",
                    "CONDA_ENV_TYPE": "service",
                    "CONDA_ENV_SLUG": "onnx110_p37_cpu_v1",
                },
            },
        },
        {
            "stepName": "postprocess",
            "description": "Postprocess step",
            "stepType": "CUSTOM_SCRIPT",
            "stepInfrastructureConfigurationDetails": {
                "shapeName": "VM.Standard2.1",
                "blockStorageSizeInGBs": "80",
            },
            "stepConfigurationDetails": {
                "type": "DEFAULT",
                "maximumRuntimeInMinutes": 60,
            },
            "dependsOn": ["preprocess"],
        },
    ],
    "freeformTags": {"freeTags": "cost center"},
}
pipeline_res = dsc.create_pipeline(pipeline_payload)
pipeline_id = pipeline_res.data.id
Until all pipeline step artifacts are uploaded, the pipeline remains in the CREATING state.
- Upload a step artifact:
After an artifact is uploaded, it can't be changed.
fstream = open(<file_name>, "rb")
dsc.create_step_artifact(
    pipeline_id,
    step_name,
    fstream,
    content_disposition=f"attachment; filename={<file_name>}",
)
- Update a pipeline:
You can only update a pipeline when it's in the ACTIVE state.
update_pipeline_details = {
    "displayName": "pipeline-updated"
}
dsc.update_pipeline(<pipeline_id>, update_pipeline_details)
- Start a pipeline run:
pipeline_run_payload = {
    "projectId": project_id,
    "displayName": "pipeline-run",
    "pipelineId": <pipeline_id>,
    "compartmentId": <compartment_id>,
}
dsc.create_pipeline_run(pipeline_run_payload)
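After starting a run, one way to track it is to poll get_pipeline_run until the run reaches a terminal lifecycle state. The following is a minimal sketch, assuming dsc is the same DataScienceClient used above; the set of terminal states shown is an assumption to verify against the API reference.

```python
import time

# Assumed terminal lifecycle states; verify against the API reference.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "DELETED"}

def wait_for_run(dsc, pipeline_run_id, timeout_s=3600, poll_s=30):
    # Poll get_pipeline_run until the run reaches a terminal state and
    # return the final run model (step_runs holds per-step states).
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        run = dsc.get_pipeline_run(pipeline_run_id).data
        if run.lifecycle_state in TERMINAL_STATES:
            return run
        time.sleep(poll_s)
    raise TimeoutError(f"run {pipeline_run_id} did not reach a terminal state")
```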
- Create a pipeline:
The ADS SDK is a publicly available Python library that you can install with this command:
pip install oracle-ads
You can use the ADS SDK to create and run pipelines.
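As a hedged sketch of what the ADS route looks like: class and builder names below follow the ads.pipeline module documented for recent ADS 2.x releases; verify them against your installed version, and all names and values are placeholders. The import is guarded so the sketch degrades gracefully where oracle-ads isn't installed.

```python
# Guarded import; names follow the documented ads.pipeline module and may
# differ between ADS versions.
try:
    from ads.pipeline import CustomScriptStep, Pipeline, PipelineStep
    HAVE_ADS = True
except ImportError:
    HAVE_ADS = False  # oracle-ads isn't installed in this environment

def build_example_pipeline(project_id, compartment_id):
    # Assemble (but don't create) a one-step pipeline; shape, storage,
    # and names are illustrative placeholders.
    infrastructure = (
        CustomScriptStep()
        .with_shape_name("VM.Standard2.1")
        .with_block_storage_size(50)
    )
    step = PipelineStep("preprocess").with_infrastructure(infrastructure)
    return (
        Pipeline("example-pipeline")
        .with_project_id(project_id)
        .with_compartment_id(compartment_id)
        .with_step_details([step])
    )

# With ADS installed and OCI authentication configured, you would then:
# pipeline = build_example_pipeline("<project_id>", "<compartment_id>").create()
# pipeline.run()
```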
Creating Pipelines with Custom Networking Using APIs
You can select custom networking when creating a pipeline. Using a custom network that you've already created gives the pipeline extra flexibility over its networking.
Provide subnet-id in the infrastructure-configuration-details to use a custom subnet at the pipeline level. For example:
"infrastructure-configuration-details": {
"block-storage-size-in-gbs": 50,
"shape-config-details": {
"memory-in-gbs": 16.0,
"ocpus": 1.0
},
"shape-name": "VM.Standard.E4.Flex",
"subnet-id": "ocid1.subnet.oc1.iad.aaaaaaaa5lzzq3fyypo6x5t5egplbfyxf2are6k6boop3vky5t4h7g35xkoa"
}
Or provide subnet-id in the step-infrastructure-configuration-details to use a custom subnet for a particular step. For example:
"step-infrastructure-configuration-details": {
"block-storage-size-in-gbs": 50,
"shape-config-details": {
"memory-in-gbs": 16.0,
"ocpus": 1.0
},
"shape-name": "VM.Standard.E4.Flex",
"subnet-id": "ocid1.subnet.oc1.iad.aaaaaaaa5lzzq3fyypo6x5t5egplbfyxf2are6k6boop3vky5t4h7g35xkoa"
},
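Expressed in the raw CreatePipeline payload style used earlier in this topic, the same pipeline-level setting corresponds to a subnetId field in the infrastructure configuration. This is a hedged sketch; the camel-case field names mirror the kebab-case CLI keys above and should be verified against the API reference, and the subnet OCID is a placeholder.

```python
# Sketch of the pipeline-level infrastructure configuration with a custom
# subnet, as it would appear in a CreatePipeline payload. Field names are
# assumed from the CLI keys shown above; verify against the API reference.
infrastructure_configuration_details = {
    "shapeName": "VM.Standard.E4.Flex",
    "blockStorageSizeInGBs": 50,
    "shapeConfigDetails": {"ocpus": 1.0, "memoryInGBs": 16.0},
    "subnetId": "<subnet_ocid>",
}
```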