Cohere Rerank 3.5
The cohere.rerank.v3-5 model takes a query and a list of texts and returns an ordered array in which each text is assigned a relevance score. The relevance score indicates how well each text matches the query, and the model uses it to rank the texts.
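To make the output shape concrete, the following Python sketch mimics the ordering step with a stand-in scoring function. The word-overlap score and the names `toy_score` and `rerank` are hypothetical and exist only for illustration. The actual model computes relevance with a learned model, and its response format is defined by the RerankText API.

```python
from typing import List, Tuple


def toy_score(query: str, text: str) -> float:
    """Hypothetical stand-in for the model: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, texts: List[str]) -> List[Tuple[int, float, str]]:
    """Return (original index, score, text) tuples, highest score first."""
    scored = [(i, toy_score(query, t), t) for i, t in enumerate(texts)]
    # sorted() is stable, so texts with equal scores keep their input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "no match here",
        "cluster limit increase",
        "cluster only",
    ]
    for index, score, text in rerank("cluster limit", docs):
        print(index, score, text)
```

Each result keeps the original index of the text, so a caller can map the ranked entries back to the input list.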
Regions for this Model
For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.
Access this Model
The API endpoints for all supported commercial, sovereign, and government regions are listed in the Management API and Inference API links. You can access each model only through its supported regions.
Key Features
- Dedicated mode only.
- Not available on-demand or in the playground.
- Access the model that's hosted on a cluster through the API or an SDK.
- For dedicated mode, create an endpoint on a hosting dedicated AI cluster, host the model on the cluster, and then run the RerankText API or its relevant SDK.
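As a rough illustration of the last step in the list above, the sketch below assembles a request body for a RerankText call against a dedicated endpoint. The field names (`input`, `documents`, `topN`, `compartmentId`, `servingMode`) are assumptions based on the RerankText API and should be checked against its reference. The OCIDs are placeholders, and the function name `build_rerank_request` is hypothetical. Request signing and the HTTP call itself are left to the SDK or client in use.

```python
import json
from typing import Dict, List, Optional


def build_rerank_request(
    compartment_id: str,
    endpoint_id: str,
    query: str,
    documents: List[str],
    top_n: Optional[int] = None,
) -> Dict[str, object]:
    """Assemble an assumed RerankText request body for a dedicated endpoint."""
    if not documents:
        raise ValueError("documents must contain at least one text")
    body: Dict[str, object] = {
        "compartmentId": compartment_id,
        # Dedicated mode: the request targets the endpoint created on the hosting cluster.
        "servingMode": {"servingType": "DEDICATED", "endpointId": endpoint_id},
        "input": query,
        "documents": documents,
    }
    if top_n is not None:
        # Assumed optional field: limit the response to the top-ranked texts.
        body["topN"] = top_n
    return body


if __name__ == "__main__":
    request = build_rerank_request(
        compartment_id="ocid1.compartment.oc1..placeholder",  # placeholder OCID
        endpoint_id="ocid1.generativeaiendpoint.oc1..placeholder",  # placeholder OCID
        query="How do I host a rerank model?",
        documents=["Create a dedicated AI cluster.", "Bananas are yellow."],
        top_n=1,
    )
    print(json.dumps(request, indent=2))
```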
Dedicated AI Cluster for the Model
To reach a model through a dedicated AI cluster in any listed region, you must create an endpoint for that model on a dedicated AI cluster. For the cluster unit size that matches this model, see the following table.
| Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
|---|---|---|---|---|
| Cohere Rerank 3.5 (cohere.rerank.v3-5) | Not available for fine-tuning | RERANK_COHERE | | dedicated-unit-rerank-cohere-count |
If your tenancy doesn't have enough cluster limits to host the Cohere Rerank 3.5 model on a dedicated AI cluster, request an increase of 1 for the dedicated-unit-rerank-cohere-count limit.
Endpoint Rules for Clusters
- A dedicated AI cluster can hold up to 50 endpoints.
- Use these endpoints to create aliases that all point either to the same base model or to the same version of a custom model, but not both types.
- Several endpoints for the same model make it easy to assign them to different users or purposes.
| Hosting Cluster Unit Size | Endpoint Rules |
|---|---|
| RERANK_COHERE | To increase the call volume supported by a hosting cluster, increase its instance count by editing the dedicated AI cluster. See Updating a Dedicated AI Cluster. For more than 50 endpoints per cluster, request an increase for the endpoint-per-dedicated-unit-count limit. See Requesting a Service Limit Increase and Service Limits for Generative AI. |
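
The endpoint rules above can be expressed as a small check. This is an illustrative sketch only: the `Endpoint` type and the `validate_endpoints` function are hypothetical and not part of any OCI SDK, and the cap of 50 is the default stated above, which a limit increase can raise.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_MAX_ENDPOINTS = 50  # default per-cluster cap stated in the rules above


@dataclass(frozen=True)
class Endpoint:
    name: str
    model_id: str  # a base model name, or a custom model together with its version


def validate_endpoints(
    endpoints: List[Endpoint], max_endpoints: int = DEFAULT_MAX_ENDPOINTS
) -> List[str]:
    """Return a list of rule violations; an empty list means the set is valid."""
    problems: List[str] = []
    if len(endpoints) > max_endpoints:
        problems.append(
            f"{len(endpoints)} endpoints exceed the limit of {max_endpoints}"
        )
    # All endpoints on one cluster must point to the same model.
    models = {endpoint.model_id for endpoint in endpoints}
    if len(models) > 1:
        problems.append(
            "endpoints point to more than one model: " + ", ".join(sorted(models))
        )
    return problems


if __name__ == "__main__":
    aliases = [Endpoint(f"team-{i}", "cohere.rerank.v3-5") for i in range(3)]
    print(validate_endpoints(aliases))
```

Passing a larger `max_endpoints` models a tenancy whose endpoint-per-dedicated-unit-count limit has been increased.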
Cluster Performance Benchmarks
Review the Cohere Rerank 3.5 cluster performance benchmarks for different scenarios.
Release and Retirement Dates
| Model | Release Date | On-Demand Retirement Date | Dedicated Mode Retirement Date |
|---|---|---|---|
| cohere.rerank.v3-5 | 2025-05-14 | On-demand mode isn't available for this model. | At least 6 months after the release of the first replacement model. |
Rerank Model Parameters
For the Rerank model parameters, see the RerankText API documentation.