Host Meta Llama 3.1 405B on new clusters in OCI Generative AI

OCI Generative AI has released a new FP8-quantized version of the Meta Llama 3.1 405B model that reduces the GPU footprint by 50%. You can now host the meta.llama-3.1-405b-instruct model on a new dedicated AI cluster type, Large Generic 2. This cluster type is designed to maintain model performance at a lower cost than its predecessor, Large Generic 4. See the performance benchmarks for the meta.llama-3.1-405b-instruct model hosted on one Large Generic 2 unit and on one Large Generic 4 unit.

To host a Meta Llama 3.1 405B model on the new Large Generic 2 cluster, follow the steps in creating a dedicated AI cluster and creating an endpoint on the cluster. For a list of offered models, see Pretrained Foundational Models in Generative AI. For information about the service, see the Generative AI documentation.
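As a minimal sketch of the two steps above, the snippet below builds the request payloads for creating a hosting cluster and then an endpoint on it. The field names follow the general shape of the Generative AI create-cluster and create-endpoint requests, but the `LARGE_GENERIC_2` shape string and the example OCIDs are illustrative assumptions, not verified identifiers; consult the linked documentation for the exact values.

```python
# Sketch: payloads for hosting meta.llama-3.1-405b-instruct on a
# Large Generic 2 cluster. Values marked "assumed" are illustrative.

def build_hosting_cluster_payload(compartment_id: str) -> dict:
    """Payload for creating a dedicated AI cluster for hosting."""
    return {
        "compartmentId": compartment_id,
        "type": "HOSTING",               # hosting (not fine-tuning) cluster
        "unitShape": "LARGE_GENERIC_2",  # assumed shape name for Large Generic 2
        "unitCount": 1,                  # one unit, as in the benchmarks above
    }

def build_endpoint_payload(compartment_id: str,
                           cluster_id: str,
                           model_id: str) -> dict:
    """Payload for creating an endpoint bound to the cluster."""
    return {
        "compartmentId": compartment_id,
        "dedicatedAiClusterId": cluster_id,
        "modelId": model_id,  # OCID of meta.llama-3.1-405b-instruct
    }

cluster = build_hosting_cluster_payload("ocid1.compartment.oc1..example")
endpoint = build_endpoint_payload(
    "ocid1.compartment.oc1..example",
    "ocid1.generativeaidedicatedaicluster.oc1..example",
    "ocid1.generativeaimodel.oc1..example",
)
print(cluster["unitShape"])
print(endpoint["dedicatedAiClusterId"])
```

In practice you would send these payloads through the OCI Console, CLI, or an SDK rather than constructing them by hand; the sketch only shows which pieces of information the two steps require.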