Model Groups

Use Multimodel Serving (MMS) to deploy and manage a group of machine learning models through a construct called a model group. A model group is a resource that represents a collection of models in the Model Store. You can deploy and manage up to 500 models (limited by shape) in a single model deployment. Using a model group simplifies operations by reducing the overhead of managing several individual model deployments, and represents a significant evolution from traditional single-model deployments toward more dynamic model management and cost-aware inferencing.

Key Capabilities

A model group has the following key capabilities:

  • Model Group Lifecycle Management - Model groups support immutability and versioning, providing robust lifecycle tracking, reproducibility, and safe iteration of deployments.
  • Inference Keys - You can use SaaS-friendly names instead of model OCIDs in inferencing calls. An inference key is an alias for a model OCID.
  • Custom Metadata - A list of key-value pairs passed to the inference container, along with model-specific variables in a model group. For example, this lets you pin individual LLMs to specific GPU cards.
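As a minimal sketch of the custom-metadata capability above, the snippet below passes per-model key-value pairs that the inference container could read as environment-style variables. The key names (`GPU_DEVICE_ID`, `MAX_BATCH_SIZE`) are hypothetical placeholders, not documented variable names.

```python
# Hypothetical sketch: per-model key-value pairs supplied with a model group.
# The keys shown here are placeholders, not documented MMS variable names.
custom_metadata = [
    {"key": "GPU_DEVICE_ID", "value": "0"},   # e.g. pin this LLM to GPU card 0
    {"key": "MAX_BATCH_SIZE", "value": "8"},  # a model-specific tuning variable
]

# The serving container could expose the pairs as environment-style variables.
env = {entry["key"]: entry["value"] for entry in custom_metadata}
print(env)
```

The list-of-pairs shape mirrors how metadata is commonly expressed in OCI request payloads; the exact field names in a model group request may differ.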

Key Concepts

A model group is a logical resource that holds several models. When deployed, each model in a model group is identified by its model OCID and, optionally, an inference key.

  • Supported types:
    • Homogeneous: A group of models of the same type deployed together in a shared runtime environment. These models operate independently but use the same compute and memory resources for efficient infrastructure usage.
    • Stacked: An extension of the Homogeneous Group, designed for large language models (LLMs) with a base model and several fine-tuned weights.
    • Heterogeneous: The model group consists of models built on different ML frameworks, such as PyTorch, TensorFlow, or ONNX. This group type lets you deploy diverse model architectures in a single serving environment.
      Note

      Use the Bring Your Own Container (BYOC) approach to create a model group deployment of this type.
  • Inference keys:
    Inference keys let you use SaaS-friendly aliases instead of model OCIDs when making inference calls. An inference key acts as an alias mapped to a specific model OCID and is defined during the creation of a Model Group.
    Note

    Inference keys support a maximum length of 32 characters.

    The following snippets show how to define an inference key using both the REST API and the SDK.

    • REST API:
      
      "memberModelEntries": {
              "memberModelDetails": [
                  {
                      "inferenceKey": "key1",
                      "modelId": "ocid1.datasciencemodel.oc1.iad.aaaaaaaa4kqzxsqdmlf3x2hedpyghfpy727odfuwr3pwwhocw32wbtjuj5zq"
                  },
                  {
                      "inferenceKey": "key2",
                      "modelId": "ocid1.datasciencemodel.oc1.iad.aaaaaaaa5oyorntk2xa2swphlzqgjwmevnrentlcay7ixy5bahkuwb34xlpq"
                  },
                  {
                      "inferenceKey": "key3",
                      "modelId": "ocid1.datasciencemodel.oc1.iad.aaaaaaaatutjajr32s5uggnv3zud3ve4rya57innybhpkuam3egzmvow4zvq"
                  }
              ]
          }
    • SDK:
      # Assumes the OCI Python SDK, with MemberModelDetails imported from
      # oci.data_science.models.
      from oci.data_science.models import MemberModelDetails

      member_model_details_list = [
          MemberModelDetails(
              model_id="ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxziav7hda2c2xn57bifhvfjnb63teaxsyal4hie2uykkwrtq",
              inference_key="key-1"),
          MemberModelDetails(
              model_id="ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxzia4qrtrviyzlhkvaimsl6aub7nldtnzts72voejpdvmu2q",
              inference_key="key-2"),
          MemberModelDetails(
              model_id="ocid1.datasciencemodel.oc1.iad.amaaaaaam3xyxziaxicmn7domsjwl5ojmks3dki32ffy26prhey6tmxiwkeq",
              inference_key="key-3"),
      ]
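To show where an inference key fits at request time, the sketch below builds (but does not send) an invocation payload that routes by inference key instead of a model OCID. The endpoint URL and the `inferenceKey` body field are assumptions for illustration; the actual MMS request shape may differ.

```python
import json

# Hypothetical sketch of routing an inference call by inference key.
# Endpoint URL and the "inferenceKey" field name are assumptions.
endpoint = "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<deployment_ocid>/predict"

request_body = {
    "inferenceKey": "key1",          # SaaS-friendly alias defined at group creation
    "data": [[5.1, 3.5, 1.4, 0.2]],  # model input payload
}

# Per the note above, inference keys are limited to 32 characters.
assert len(request_body["inferenceKey"]) <= 32

payload = json.dumps(request_body)
print(payload)
```

In a real call, the caller would POST `payload` to the model deployment's predict endpoint; the deployment resolves the inference key to the mapped model OCID defined in `memberModelDetails`.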