Cohere Rerank 3.5

Review performance benchmarks for the cohere.rerank.3-5 (Cohere Rerank 3.5) model hosted on one RERANK_COHERE unit of a dedicated AI cluster in OCI Generative AI.

A rerank model takes a query and a list of texts as input and ranks the texts by their relevance score, that is, by how well each text matches the query.
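To make the input and output shape concrete, here is a minimal Python sketch of a rerank call. The `toy_score` function is a hypothetical stand-in based on simple word overlap; it is not how Cohere Rerank 3.5 computes relevance. The function names and return format are assumptions for illustration only and are not part of the OCI Generative AI API.

```python
from typing import List, Tuple


def toy_score(query: str, text: str) -> float:
    """Hypothetical relevance score: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, texts: List[str]) -> List[Tuple[int, float]]:
    """Return (original_index, score) pairs, most relevant first."""
    scored = [(i, toy_score(query, t)) for i, t in enumerate(texts)]
    # sorted() is stable, so texts with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "Dedicated AI clusters host models in OCI",
        "How to bake sourdough bread",
        "Rerank models order texts by relevance to a query",
    ]
    for index, score in rerank("rerank texts by relevance", docs):
        print(index, round(score, 2))
```

A real rerank model replaces `toy_score` with a learned relevance score, but the contract is the same: one query and N texts go in, and N scored, ordered results come out.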

Tip

To learn about reranking, we recommend that you review Best Practices for using Rerank from Cohere.

- See the available regions for this model.
- Review the dedicated AI cluster unit size for hosting this model on the model page.
- Review the metrics.

Document Size: 64 Tokens

This scenario applies to the rerank models. Every document is 64 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	0.13	0.13	7.64
2	0.11	0.11	8.96
4	0.11	0.11	9.12
8	0.11	0.11	9.06
24	0.12	0.12	8.33
48	0.14	0.14	7.19
96	0.17	0.17	5.86
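Request-level throughput counts whole requests, so it can be useful to convert it into documents and document tokens reranked per second. The sketch below does that arithmetic for the 64-token rows in the table above. The helper names are illustrative, and the token figure counts document tokens only (query tokens are not included, because the source does not state the query length).

```python
# (number of documents, request-level throughput in RPS) for 64-token documents,
# copied from the table above.
ROWS_64_TOKENS = [
    (1, 7.64), (2, 8.96), (4, 9.12), (8, 9.06),
    (24, 8.33), (48, 7.19), (96, 5.86),
]
DOC_SIZE_TOKENS = 64


def documents_per_second(num_docs: int, rps: float) -> float:
    """Documents reranked per second = documents per request * requests per second."""
    return num_docs * rps


def document_tokens_per_second(num_docs: int, rps: float, doc_size: int) -> float:
    """Document tokens processed per second (query tokens not included)."""
    return documents_per_second(num_docs, rps) * doc_size


if __name__ == "__main__":
    for num_docs, rps in ROWS_64_TOKENS:
        dps = documents_per_second(num_docs, rps)
        tps = document_tokens_per_second(num_docs, rps, DOC_SIZE_TOKENS)
        print(f"{num_docs:>3} docs/request: {dps:8.2f} docs/s, {tps:10.2f} tokens/s")
```

Although RPS falls as requests get larger, the number of documents reranked per second rises across these rows, from 7.64 at 1 document per request to about 562.56 at 96 documents per request.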

Document Size: 128 Tokens

This scenario applies to the rerank models. Every document is 128 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	0.11	0.11	9.15
2	0.11	0.11	9.12
4	0.11	0.11	9.00
8	0.11	0.11	8.81
24	0.13	0.13	7.71
48	0.16	0.16	6.34
96	0.20	0.20	4.81

Document Size: 256 Tokens

This scenario applies to the rerank models. Every document is 256 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	0.11	0.11	9.10
2	0.11	0.11	9.03
4	0.11	0.11	8.73
8	0.12	0.12	8.14
24	0.15	0.15	6.47
48	0.20	0.20	4.91
96	0.28	0.28	3.52

Document Size: 512 Tokens

This scenario applies to the rerank models. Every document is 512 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	0.11	0.11	8.94
2	0.11	0.11	8.61
4	0.12	0.12	7.91
8	0.14	0.14	6.85
24	0.20	0.20	4.87
48	0.30	0.30	3.22
96	0.54	0.54	1.83

Document Size: 1024 Tokens

This scenario applies to the rerank models. Every document is 1,024 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	0.12	0.12	8.11
2	0.13	0.13	7.22
4	0.15	0.15	6.24
8	0.19	0.19	4.99
24	0.45	0.45	2.20
48	0.73	0.73	1.34
96	1.38	1.38	0.72

Document Size: 2048 Tokens

This scenario applies to the rerank models. Every document is 2,048 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	0.15	0.15	6.13
2	0.18	0.18	5.14
4	0.25	0.25	3.84
8	0.38	0.38	2.52
24	1.05	1.05	0.94
48	2.01	2.01	0.49
96	3.77	3.77	0.26
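One way to read the tables for 64 through 2,048 tokens together is to hold the document count fixed and see how latency changes each time the document size doubles. The sketch below does this for the 96-document rows. The variable and function names are illustrative, and the ratios are plain arithmetic on the published figures, not an official scaling model.

```python
# Request-level latency in seconds for reranking 96 documents,
# keyed by document size in tokens (values copied from the tables above).
LATENCY_96_DOCS = {
    64: 0.17,
    128: 0.20,
    256: 0.28,
    512: 0.54,
    1024: 1.38,
    2048: 3.77,
}


def doubling_ratios(latency_by_size: dict) -> dict:
    """Map each (smaller, larger) pair of adjacent sizes to its latency ratio."""
    sizes = sorted(latency_by_size)
    return {
        (small, large): latency_by_size[large] / latency_by_size[small]
        for small, large in zip(sizes, sizes[1:])
    }


if __name__ == "__main__":
    for (small, large), ratio in doubling_ratios(LATENCY_96_DOCS).items():
        print(f"{small:>4} -> {large:>4} tokens: latency x{ratio:.2f}")
```

In this data the ratio grows from roughly 1.2 between 64 and 128 tokens to roughly 2.7 between 1,024 and 2,048 tokens, so doubling the document size costs relatively more latency for large documents than for small ones.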

Document Size: 4096 Tokens

This scenario applies to the rerank models. Every document is 4,096 tokens long, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 such documents.

Number of Documents	Time to First Token (TTFT) (seconds)	Request-level Latency (seconds)	Request-level Throughput (requests per second, RPS)
1	7.35	7.35	4.65
2	7.35	7.35	3.71
4	7.35	7.35	2.43
8	7.35	7.35	1.24
24	7.35	7.35	0.49
48	7.35	7.35	0.26
96	7.35	7.35	0.14
