Model Groups
Use Multimodel Serving (MMS) to deploy and manage a group of machine learning models through a construct called a model group. A model group is a resource representing a collection of models in the Model Store. You can deploy and manage up to 500 models (limited by shape) in a single model deployment. Model groups simplify operations by reducing the overhead of managing many individual model deployments, a significant evolution from traditional single-model deployments toward more dynamic model management and cost-aware inferencing.
Before You Begin
To use model groups, first apply the Model Group Policies.
Tasks
You can perform the following tasks with model groups.
Key Capabilities
A model group has the following key capabilities:
- Model Group Lifecycle Management - Model groups support immutability and versioning, providing robust lifecycle tracking, reproducibility, and safe iteration of deployments.
- Inference Keys - You can use SaaS-friendly names instead of model OCIDs for inference calls. An inference key is an alias for a model OCID.
- Custom Metadata - A list of key-value pairs passed to the inference container, along with model-specific variables in a model group. This feature lets you pin several LLMs to specific GPU cards.
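To make the custom metadata idea concrete, here is a minimal Python sketch of building key-value metadata entries that an inference container could read to pin models to GPU cards. The key naming scheme (`<inference_key>_GPU_DEVICE_ID`) and the helper function are hypothetical illustrations, not a documented OCI convention.

```python
# Hypothetical sketch: custom metadata as a list of key-value pairs the
# inference container could read at startup to pin individual LLMs to
# GPU cards. The "<key>_GPU_DEVICE_ID" naming scheme is illustrative only.

def build_custom_metadata(gpu_assignments):
    """Turn {inference_key: gpu_id} into key-value metadata entries."""
    return [
        {"key": f"{inference_key}_GPU_DEVICE_ID", "value": str(gpu_id)}
        for inference_key, gpu_id in gpu_assignments.items()
    ]

metadata = build_custom_metadata({"llama-base": 0, "llama-ft-support": 1})
```

Because the metadata is a plain list of key-value pairs, the same mechanism can carry any model-specific variables the serving container knows how to interpret.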
Key Concepts
A model group is a logical resource that holds several models. When deployed, each model in a model group is identified by its model OCID and an optional inference key.
- Supported types:
- Homogeneous: A group of models of the same type deployed together in a shared runtime environment. These models operate independently but use the same compute and memory resources for efficient infrastructure usage.
- Stacked: An extension of the Homogeneous Group, designed for large language models (LLMs) with a base model and several fine-tuned weights.
- Heterogeneous: The model group consists of models built on different ML frameworks, such as PyTorch, TensorFlow, or ONNX. This group type allows diverse model architectures to be deployed in a single serving environment.
- Inference keys: Inference keys let you use SaaS-friendly aliases instead of model OCIDs when making inference calls. An inference key acts as an alias mapped to a specific model OCID and is defined during the creation of a model group.

  Note: Inference keys support a maximum length of 32 characters.

  The following snippets show how to define an inference key using both the REST API and the SDK.
- REST API:
"memberModelEntries": { "memberModelDetails": [ { "inferenceKey": "key1", "modelId": "ocid1.datasciencemodel.oc1.iad.aaaaaaaa4kqzxsqdmlf3x2hedpyghfpy727odfuwr3pwwhocw32wbtjuj5zq" }, { "inferenceKey": "key2", "modelId": "ocid1.datasciencemodel.oc1.iad.aaaaaaaa5oyorntk2xa2swphlzqgjwmevnrentlcay7ixy5bahkuwb34xlpq" }, { "inferenceKey": "key3", "modelId": "ocid1.datasciencemodel.oc1.iad.aaaaaaaatutjajr32s5uggnv3zud3ve4rya57innybhpkuam3egzmvow4zvq" } ] } - SDK:
  ```python
  member_model_details_list = [
      MemberModelDetails(
          model_id="ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxziav7hda2c2xn57bifhvfjnb63teaxsyal4hie2uykkwrtq",
          inference_key="key-1"),
      MemberModelDetails(
          model_id="ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxzia4qrtrviyzlhkvaimsl6aub7nldtnzts72voejpdvmu2q",
          inference_key="key-2"),
      MemberModelDetails(
          model_id="ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxziaxicmn7domsjwl5ojmks3dki32ffy26prhey6tmxiwkeq",
          inference_key="key-3"),
  ]
  ```
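The aliasing that inference keys provide can be sketched as a simple client-side mapping from a SaaS-friendly key to a model OCID, enforcing the documented 32-character limit. The `register_inference_key` helper below is an illustration of the concept, not part of the OCI SDK.

```python
# Illustrative sketch (not an OCI SDK API): inference keys modeled as a
# mapping from a SaaS-friendly alias to a model OCID, with the documented
# 32-character limit on key length enforced at registration time.

MAX_INFERENCE_KEY_LENGTH = 32  # documented maximum key length

def register_inference_key(registry, key, model_id):
    """Map a SaaS-friendly inference key to a model OCID."""
    if len(key) > MAX_INFERENCE_KEY_LENGTH:
        raise ValueError(
            f"inference key {key!r} exceeds {MAX_INFERENCE_KEY_LENGTH} characters")
    if key in registry:
        raise ValueError(f"inference key {key!r} is already mapped")
    registry[key] = model_id

registry = {}
register_inference_key(
    registry, "key-1",
    "ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxziav7hda2c2xn57bifhvfjnb63teaxsyal4hie2uykkwrtq")
```

At inference time, callers can then refer to `key-1` instead of the full OCID, which keeps client code readable and lets the underlying model be swapped without changing callers.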