Supporting Large Artifacts in the Model Catalog
The maximum size of a model artifact is 6 GB.
The Console options for uploading models support only model artifacts up to 100 MB in size. To upload larger model artifacts, use Python and the Oracle Accelerated Data Science (ADS) SDK, as all the following examples do. Large model artifacts are supported by copying the artifact from an Object Storage bucket to the service bucket of the model catalog.
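The size limits in this section decide which upload path applies. The following is a hypothetical helper that maps an artifact size to a path; it isn't part of ADS. It assumes binary units (1 MB = 1024**2 bytes, 1 GB = 1024**3 bytes), and the returned labels are illustrative names.

```python
# Size limits described in this section. Binary units are an assumption.
MB = 1024 ** 2
GB = 1024 ** 3

CONSOLE_LIMIT = 100 * MB          # largest artifact the Console upload accepts
BUCKET_URI_THRESHOLD = 2 * GB     # above this, save() needs a bucket_uri
MAX_ARTIFACT = 6 * GB             # maximum model artifact size


def upload_route(size_bytes: int) -> str:
    """Return an illustrative label for how an artifact of this size can be uploaded."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > MAX_ARTIFACT:
        raise ValueError("artifact exceeds the 6 GB model catalog limit")
    if size_bytes > BUCKET_URI_THRESHOLD:
        return "ads-with-bucket-uri"
    if size_bytes > CONSOLE_LIMIT:
        return "ads"
    return "console-or-ads"


for size in (50 * MB, 500 * MB, 3 * GB):
    print(size, upload_route(size))
```

With these assumptions, the loop reports a 50 MB artifact as "console-or-ads", a 500 MB artifact as "ads", and a 3 GB artifact as "ads-with-bucket-uri".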
Preliminary Steps for Using ADS
First, import the required modules, configure authentication, and define a utility function that generates a large CSV file:
```python
import os
import warnings

import numpy as np
import oci
import pandas as pd
from numpy import array
from sklearn.datasets import make_classification

import ads
from ads.common.model_metadata import UseCaseType
from ads.model.generic_model import GenericModel

warnings.filterwarnings("ignore")

# Authenticate with the notebook session's resource principal.
ads.set_auth("resource_principal")
# To authenticate with an API key from your OCI config file instead:
# ads.set_auth("api_key")
# ads.set_debug_mode(False)
# auth = {"config": oci.config.from_file(os.path.join("~/.oci", "config"))}


class Size:
    """Number of samples that produces a CSV file of roughly the named size."""
    MB_20 = 6000
    MB_200 = 60000
    MB_2000 = 600000


def generate_large_csv(size: int = Size.MB_20, file_path: str = "./large_csv_file.csv"):
    # Generate a random classification data set with 200 features per sample.
    X_big, y_big = make_classification(n_samples=size, n_features=200)
    # Join features and labels column-wise (axis=1) so each row holds one sample.
    df_big = pd.concat([pd.DataFrame(X_big), pd.DataFrame(y_big)], axis=1)
    df_big.to_csv(file_path)
```
Next, create a sample model for this example and add a large CSV file to its artifact directory so that the artifact is large. This example uses a file of about 20 MB, but artifacts up to 6 GB are supported.
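```python
class Square:
    """Toy model that returns the element-wise square of its input."""

    def predict(self, x):
        x_array = np.array(x)
        return np.ndarray.tolist(x_array * x_array)


model = Square()
artifact_dir = "./large_artifact/"

# Sample input used to describe the model's input and output data
# (illustrative values; replace them with your own data).
X = [1, 2, 3, 4]

generic_model = GenericModel(
    estimator=model,
    artifact_dir=artifact_dir,
)
generic_model.prepare(
    inference_conda_env="dataexpl_p37_cpu_v3",
    training_conda_env="dataexpl_p37_cpu_v3",
    use_case_type=UseCaseType.MULTINOMIAL_CLASSIFICATION,
    X_sample=X,
    y_sample=array(X) ** 2,
    force_overwrite=True,
)

# Add a CSV file of about 20 MB to the artifact directory to make the artifact large.
generate_large_csv(Size.MB_20, file_path=os.path.join(artifact_dir, "large_csv_file.csv"))
```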
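Before saving, it can help to check how large the artifact directory is, because that determines whether a bucket_uri is needed. The following is a minimal sketch with a hypothetical helper name; it measures the uncompressed size on disk, and because ADS may archive the directory before uploading it, the uploaded size can differ somewhat.

```python
import os


def artifact_size_bytes(artifact_dir: str) -> int:
    """Sum the sizes of all regular files under artifact_dir, recursively."""
    total = 0
    for root, _dirs, files in os.walk(artifact_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symbolic links so linked files aren't counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


size = artifact_size_bytes("./large_artifact/")
print(f"Artifact directory size: {size / 1024 ** 2:.1f} MB")
if size > 2 * 1024 ** 3:
    print("Pass bucket_uri to GenericModel.save().")
```

A directory that doesn't exist yields a size of 0, since os.walk produces no entries for it.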
Saving a Large Model to the Model Catalog
To save models larger than 2 GB, you must have an Object Storage bucket. You can create the bucket in the Console or by using the OCI API.
Create an Object Storage bucket from the Console:
- Sign in to the Console.
- Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
- Under List Scope, select a Compartment.
- Select Create Bucket.
- Enter the following information in the Create Bucket form:
  - Bucket Name: Enter a bucket name (for example, my-bucket-name).
  - Default Storage Tier: Select Standard.
  - Don't select the following options:
    - Enable Auto-Tiering
    - Enable Object Versioning
    - Emit Object Events
    - Uncommitted Multipart Uploads Cleanup
  - Encryption: Select Encrypt using Oracle managed keys.
- Select Create. The bucket is created.
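The bucket can also be created with the OCI API. The following is a minimal sketch using the OCI Python SDK; create_standard_bucket is a hypothetical helper name, and the sketch assumes it runs where a resource principal is available, such as a notebook session. Leaving kms_key_id unset keeps encryption on Oracle-managed keys, and the versioning, auto-tiering, and object events options are left unset, which this sketch assumes means disabled.

```python
def create_standard_bucket(compartment_id: str, bucket_name: str) -> str:
    """Create a private Standard-tier bucket and return the Object Storage namespace.

    Assumes a resource principal is available (for example, in a notebook session).
    """
    # Imported inside the function so this module loads even without the SDK installed.
    import oci

    signer = oci.auth.signers.get_resource_principals_signer()
    client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

    # The namespace is required both for the API call and for the bucket URI.
    namespace = client.get_namespace().data
    details = oci.object_storage.models.CreateBucketDetails(
        name=bucket_name,
        compartment_id=compartment_id,
        storage_tier="Standard",
        public_access_type="NoPublicAccess",
        # No kms_key_id: the bucket is encrypted with Oracle-managed keys.
    )
    client.create_bucket(namespace_name=namespace, create_bucket_details=details)
    return namespace
```

For example, namespace = create_standard_bucket("<compartment_ocid>", "my-bucket-name") creates the bucket and returns the namespace that the bucket URI needs.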
Construct the Bucket URI
The bucket URI isn't listed on the bucket details page in the Console, so you must construct it yourself:
- Use the following template to create the bucket URI: oci://<bucket_name>@<namespace>/<objects_folder>/. Replace <bucket_name> with the name of the bucket you created. Replace <namespace> with your tenancy's Object Storage namespace (for example, my-tenancy), which is shown on the Tenancy details page and isn't always the same as the tenancy name. For <objects_folder>, use a folder name such as my-object-folder. With these values, the bucket_uri is oci://my-bucket-name@my-tenancy/my-object-folder/.
- To upload large model artifacts, add two extra parameters to the GenericModel.save(...) method:
  - bucket_uri: (str, optional) Defaults to None. The Object Storage URI where the model artifact is temporarily copied. The bucket_uri is required only for artifacts larger than 2 GB, but you can use it with smaller artifacts as well. For example: oci://<bucket_name>@<namespace>/prefix/.
  - remove_existing_artifact: (bool, optional) Defaults to True. Whether the artifacts uploaded to the Object Storage bucket are removed after the upload.
- The method copies the model artifact from the notebook session to the bucket_uri.
- Next, the method copies the artifact from the bucket (bucket_uri) to the model catalog service bucket. If the artifact is larger than 2 GB and bucket_uri isn't provided, an error occurs.

By default, remove_existing_artifact is set to True, so the artifact is automatically removed from the bucket (bucket_uri) after a successful upload to the service bucket. To keep the artifact in the bucket, set remove_existing_artifact=False.
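Building the URI by hand is error prone, so the template can be wrapped in a pair of small functions. The following is a sketch with hypothetical helper names (they aren't part of ADS); make_bucket_uri fills the oci://<bucket_name>@<namespace>/<objects_folder>/ template with exactly one trailing slash, and split_bucket_uri reverses it.

```python
from urllib.parse import urlparse


def make_bucket_uri(bucket_name: str, namespace: str, objects_folder: str = "") -> str:
    """Build oci://<bucket_name>@<namespace>/<objects_folder>/ with one trailing slash."""
    if not bucket_name or not namespace:
        raise ValueError("bucket_name and namespace are required")
    folder = objects_folder.strip("/")
    base = f"oci://{bucket_name}@{namespace}/"
    return f"{base}{folder}/" if folder else base


def split_bucket_uri(uri: str) -> tuple:
    """Return (bucket_name, namespace, prefix) parsed from an oci:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"not an oci://<bucket_name>@<namespace>/ URI: {uri!r}")
    bucket_name, namespace = parsed.netloc.split("@", 1)
    return bucket_name, namespace, parsed.path.lstrip("/")


bucket_uri = make_bucket_uri("my-bucket-name", "my-tenancy", "my-object-folder")
print(bucket_uri)  # oci://my-bucket-name@my-tenancy/my-object-folder/
```

The resulting string can be passed directly as the bucket_uri argument.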
To summarize, the process is:
- Prepare model artifacts.
- Save base information about the model to the model catalog.
- Upload the model artifacts to an Object Storage bucket (bucket_uri).
- Upload the model artifacts from the bucket to the model catalog service bucket.
- Remove the temporary artifacts from the bucket based on the remove_existing_artifact parameter:

```python
large_model_id = generic_model.save(
    display_name="Generic Model With Large Artifact",
    bucket_uri="oci://<bucket_name>@<namespace>/<objects_folder>/",  # replace with your bucket URI
    remove_existing_artifact=True,
)
```
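Because a missing bucket_uri for an artifact larger than 2 GB causes an error, a thin wrapper can check for it before any upload starts. The following is a hypothetical sketch, not an ADS function: it assumes you already know the artifact size in bytes, uses binary units for the 2 GB threshold, and simply forwards the documented display_name, bucket_uri, and remove_existing_artifact arguments to the model's save method.

```python
# 2 GB threshold from this section; binary units are an assumption.
SIZE_LIMIT_WITHOUT_BUCKET = 2 * 1024 ** 3


def save_with_bucket_check(model, artifact_size: int, display_name: str,
                           bucket_uri=None, keep_temporary_copy=False):
    """Raise early if bucket_uri is missing for a large artifact, then call model.save()."""
    if artifact_size > SIZE_LIMIT_WITHOUT_BUCKET and not bucket_uri:
        raise ValueError("bucket_uri is required for artifacts larger than 2 GB")
    # remove_existing_artifact=True deletes the temporary copy from bucket_uri
    # after a successful upload to the service bucket.
    return model.save(
        display_name=display_name,
        bucket_uri=bucket_uri,
        remove_existing_artifact=not keep_temporary_copy,
    )
```

Set keep_temporary_copy=True when you want the copy in bucket_uri to remain after the upload.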
Loading a Large Model from the Model Catalog
To load models larger than 2 GB, add two extra parameters to the GenericModel.from_model_catalog(...) method:
- bucket_uri: (str, optional) Defaults to None. The Object Storage URI where model artifacts are temporarily copied. The bucket_uri is required only for downloading artifacts larger than 2 GB, but the method works with small artifacts as well. For example: oci://<bucket_name>@<namespace>/prefix/.
- remove_existing_artifact: (bool, optional) Defaults to True. Whether the artifacts copied to the Object Storage bucket are removed after the download.
To summarize, the process is:
- Download the model artifacts from the model catalog service bucket to an Object Storage bucket (bucket_uri).
- Download the model artifacts from the bucket to the notebook session.
- Remove the temporary artifacts from the bucket based on the remove_existing_artifact parameter.
- Load the base information about the model from the model catalog:

```python
large_model = GenericModel.from_model_catalog(
    large_model_id,
    model_file_name="model.pkl",
    artifact_dir="./downloaded_large_artifact/",
    bucket_uri="oci://<bucket_name>@<namespace>/<objects_folder>/",  # replace with your bucket URI
    force_overwrite=True,
    remove_existing_artifact=True,
)
```
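After loading, you can compare the downloaded artifact with the original artifact directory. The following is a minimal sketch with hypothetical helper names; it compares only relative file paths and sizes, not contents. Because some generated files may legitimately differ, treat a reported difference as a prompt to inspect rather than as proof of a failed download.

```python
import os


def file_manifest(root: str) -> dict:
    """Map each file's path relative to root to its size in bytes."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = os.path.getsize(path)
    return manifest


def missing_or_changed(original_dir: str, downloaded_dir: str) -> list:
    """List files from original_dir that are absent or differ in size in downloaded_dir."""
    original = file_manifest(original_dir)
    downloaded = file_manifest(downloaded_dir)
    return sorted(p for p, size in original.items() if downloaded.get(p) != size)


problems = missing_or_changed("./large_artifact/", "./downloaded_large_artifact/")
print("Artifact looks complete." if not problems else f"Differences: {problems}")
```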