Cohere Rerank 3.5
Review performance benchmarks for the cohere.rerank.3-5 (Cohere Rerank 3.5) model hosted on one RERANK_COHERE unit of a dedicated AI cluster in OCI Generative AI.
A rerank model takes a query and a list of texts as input and ranks the texts by their relevance score to the query, that is, by how well each text matches the query.
To learn about reranking, we recommend that you review Cohere's Best Practices for using Rerank.
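To make the input and output shape of a rerank call concrete, here is a minimal sketch. The `toy_rerank` function and its word-overlap score are hypothetical stand-ins for illustration only; they are not the Cohere Rerank 3.5 model or the OCI Generative AI API, and a real model scores relevance very differently.

```python
from typing import List, Tuple


def toy_rerank(query: str, documents: List[str]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most relevant first.

    The score is a toy measure (share of query words found in the text).
    It only illustrates the query + texts -> ranked list contract and is
    NOT how Cohere Rerank 3.5 computes relevance.
    """
    query_words = set(query.lower().split())
    scored = []
    for index, text in enumerate(documents):
        doc_words = set(text.lower().split())
        # Hypothetical score: fraction of query words that appear in the text.
        score = len(query_words & doc_words) / len(query_words) if query_words else 0.0
        scored.append((index, score))
    # Highest score first; Python's sort is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = [
    "Paris is the capital of France",
    "Bananas are rich in potassium",
    "The capital of Japan is Tokyo",
]
ranking = toy_rerank("capital of France", documents)
for index, score in ranking:
    print(f"{score:.2f}  {documents[index]}")
```

The result keeps the original document indexes alongside the scores, so a caller can map the ranking back to its own list of texts.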
Document Size: 64 Tokens
This scenario applies to the rerank models. All documents are the same size, 64 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 0.13 | 0.13 | 7.64 |
2 | 0.11 | 0.11 | 8.96 |
4 | 0.11 | 0.11 | 9.12 |
8 | 0.11 | 0.11 | 9.06 |
24 | 0.12 | 0.12 | 8.33 |
48 | 0.14 | 0.14 | 7.19 |
96 | 0.17 | 0.17 | 5.86 |
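In the 64-token table, the latency and throughput columns are close to reciprocals of each other. The sketch below checks this by multiplying the two columns for each row. The variable names are illustrative, and the reading that requests were issued one at a time is an inference from the numbers, not something the benchmark description states.

```python
# (number of documents, request-level latency in seconds, throughput in RPS)
# copied from the 64-token table.
rows_64 = [
    (1, 0.13, 7.64),
    (2, 0.11, 8.96),
    (4, 0.11, 9.12),
    (8, 0.11, 9.06),
    (24, 0.12, 8.33),
    (48, 0.14, 7.19),
    (96, 0.17, 5.86),
]

# If one request is in flight at a time, latency * throughput should be near 1.
products = {docs: round(latency * rps, 2) for docs, latency, rps in rows_64}

for docs, product in products.items():
    print(f"{docs:>2} documents: latency x RPS = {product}")
```

A product near 1 for every row means that, for a rough estimate, throughput can be read as 1 divided by the request-level latency.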
Document Size: 128 Tokens
This scenario applies to the rerank models. All documents are the same size, 128 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 0.11 | 0.11 | 9.15 |
2 | 0.11 | 0.11 | 9.12 |
4 | 0.11 | 0.11 | 9.00 |
8 | 0.11 | 0.11 | 8.81 |
24 | 0.13 | 0.13 | 7.71 |
48 | 0.16 | 0.16 | 6.34 |
96 | 0.20 | 0.20 | 4.81 |
Document Size: 256 Tokens
This scenario applies to the rerank models. All documents are the same size, 256 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 0.11 | 0.11 | 9.10 |
2 | 0.11 | 0.11 | 9.03 |
4 | 0.11 | 0.11 | 8.73 |
8 | 0.12 | 0.12 | 8.14 |
24 | 0.15 | 0.15 | 6.47 |
48 | 0.20 | 0.20 | 4.91 |
96 | 0.28 | 0.28 | 3.52 |
Document Size: 512 Tokens
This scenario applies to the rerank models. All documents are the same size, 512 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 0.11 | 0.11 | 8.94 |
2 | 0.11 | 0.11 | 8.61 |
4 | 0.12 | 0.12 | 7.91 |
8 | 0.14 | 0.14 | 6.85 |
24 | 0.20 | 0.20 | 4.87 |
48 | 0.30 | 0.30 | 3.22 |
96 | 0.54 | 0.54 | 1.83 |
Document Size: 1024 Tokens
This scenario applies to the rerank models. All documents are the same size, 1,024 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 0.12 | 0.12 | 8.11 |
2 | 0.13 | 0.13 | 7.22 |
4 | 0.15 | 0.15 | 6.24 |
8 | 0.19 | 0.19 | 4.99 |
24 | 0.45 | 0.45 | 2.20 |
48 | 0.73 | 0.73 | 1.34 |
96 | 1.38 | 1.38 | 0.72 |
Document Size: 2048 Tokens
This scenario applies to the rerank models. All documents are the same size, 2,048 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 0.15 | 0.15 | 6.13 |
2 | 0.18 | 0.18 | 5.14 |
4 | 0.25 | 0.25 | 3.84 |
8 | 0.38 | 0.38 | 2.52 |
24 | 1.05 | 1.05 | 0.94 |
48 | 2.01 | 2.01 | 0.49 |
96 | 3.77 | 3.77 | 0.26 |
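One way to read the 2,048-token table is in document tokens processed per second of request latency. The sketch below computes that figure per row. It assumes the query length is negligible next to the documents, which the benchmark description does not specify, and `tokens_per_second` is a helper introduced here for illustration.

```python
DOC_TOKENS = 2048

# (number of documents, request-level latency in seconds)
# copied from the 2,048-token table.
rows_2048 = [
    (1, 0.15),
    (2, 0.18),
    (4, 0.25),
    (8, 0.38),
    (24, 1.05),
    (48, 2.01),
    (96, 3.77),
]


def tokens_per_second(num_docs: int, doc_tokens: int, latency_s: float) -> float:
    """Document tokens handled per second of latency (query tokens ignored)."""
    return num_docs * doc_tokens / latency_s


rates = {docs: tokens_per_second(docs, DOC_TOKENS, latency) for docs, latency in rows_2048}

for docs, rate in rates.items():
    print(f"{docs:>2} documents: about {rate:,.0f} document tokens per second")
```

Under these assumptions, larger requests take longer but handle more document tokens per second than single-document requests.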
Document Size: 4096 Tokens
This scenario applies to the rerank models. All documents are the same size, 4,096 tokens each, and benchmarks are provided for reranking 1, 2, 4, 8, 24, 48, and 96 of these documents.
Number of Documents | Time to First Token (TTFT) (seconds) | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|---|
1 | 7.35 | 7.35 | 4.65 |
2 | 7.35 | 7.35 | 3.71 |
4 | 7.35 | 7.35 | 2.43 |
8 | 7.35 | 7.35 | 1.24 |
24 | 7.35 | 7.35 | 0.49 |
48 | 7.35 | 7.35 | 0.26 |
96 | 7.35 | 7.35 | 0.14 |