Cohere Embed Multilingual Image 3
Review performance benchmarks for the cohere.embed-multilingual-image-v3.0
(Cohere Embed Multilingual Image 3) model hosted on one Embed Cohere unit of a dedicated AI cluster in OCI
Generative AI.
Text Embeddings
This scenario applies only to embedding models. It mimics embedding generation as part of the data ingestion pipeline of a vector database. All requests are the same size: 96 documents of 512 tokens each. An example is a collection of large PDF files, each with more than 30,000 words, that a user wants to ingest into a vector database.
Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
---|---|---|
1 | 2.25 | 24 |
8 | 4.33 | 120 |
32 | 14.94 | 144 |
128 | 49.21 | 198 |
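To relate the request-level figures above to ingestion volume, the following minimal sketch converts RPM into documents and tokens per minute, using the request shape stated for this scenario (96 documents of 512 tokens each). The function name `per_minute_rates` is hypothetical, and the calculation assumes every request carries exactly that payload.

```python
# Request shape stated in the scenario: 96 documents of 512 tokens each.
DOCS_PER_REQUEST = 96
TOKENS_PER_DOC = 512

# (concurrency, latency in seconds, requests per minute), copied from the table.
TEXT_EMBEDDINGS = [
    (1, 2.25, 24),
    (8, 4.33, 120),
    (32, 14.94, 144),
    (128, 49.21, 198),
]


def per_minute_rates(rpm, docs_per_request, tokens_per_doc):
    """Return (documents per minute, tokens per minute) for a given RPM.

    Assumes every request carries exactly docs_per_request documents
    of tokens_per_doc tokens each.
    """
    docs = rpm * docs_per_request
    return docs, docs * tokens_per_doc


for concurrency, latency_s, rpm in TEXT_EMBEDDINGS:
    docs, tokens = per_minute_rates(rpm, DOCS_PER_REQUEST, TOKENS_PER_DOC)
    print(f"concurrency={concurrency:>3}  docs/min={docs:>6}  tokens/min={tokens:>10}")
```

These derived rates are only as precise as the rounded RPM values in the table.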
Lighter Text Embeddings
This scenario applies only to embedding models. It is similar to the Text Embeddings scenario, except that each request is reduced to 16 documents of 512 tokens each. This scenario suits smaller files with fewer words.
Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
---|---|---|
1 | 1.28 | 42 |
8 | 1.38 | 288 |
32 | 3.44 | 497 |
128 | 11.94 | 702 |
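Because the two text scenarios use different request sizes, their RPM values are not directly comparable. As a rough sketch, the snippet below normalizes both tables to documents per minute (96 documents per request for Text Embeddings, 16 for Lighter Text Embeddings). The helper name `docs_per_minute` is hypothetical, and the result assumes every request carries the full number of documents.

```python
# RPM by concurrency, copied from the two text embedding tables.
FULL_RPM = {1: 24, 8: 120, 32: 144, 128: 198}    # 96 documents per request
LIGHT_RPM = {1: 42, 8: 288, 32: 497, 128: 702}   # 16 documents per request


def docs_per_minute(rpm_by_concurrency, docs_per_request):
    """Convert requests per minute into documents per minute."""
    return {c: rpm * docs_per_request for c, rpm in rpm_by_concurrency.items()}


full_docs = docs_per_minute(FULL_RPM, 96)
light_docs = docs_per_minute(LIGHT_RPM, 16)

for concurrency in FULL_RPM:
    ratio = full_docs[concurrency] / light_docs[concurrency]
    print(
        f"concurrency={concurrency:>3}  "
        f"full={full_docs[concurrency]:>6} docs/min  "
        f"light={light_docs[concurrency]:>6} docs/min  "
        f"full/light={ratio:.2f}"
    )
```

On these figures the larger requests move more documents per minute at every listed concurrency, while the lighter requests complete with lower per-request latency.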
Image Embeddings
This scenario applies only to embedding models with image input. In each scenario, I(M,N) represents an image with a height of M pixels and a width of N pixels. For example, I(1024,512) is an image with a height of 1,024 pixels and a width of 512 pixels.
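To make the label convention concrete, here is a small illustrative parser that reads the first number as the height and the second as the width, matching the I(1024,512) example. The helper `parse_image_label` is hypothetical and not part of any OCI or Cohere SDK.

```python
import re

# Hypothetical helper: parses labels of the form "I(M,N)" used in this section.
_LABEL = re.compile(r"^I\((\d+),\s*(\d+)\)$")


def parse_image_label(label):
    """Return (height_px, width_px) for a label such as 'I(1024,512)'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not an I(M,N) label: {label!r}")
    height_px, width_px = (int(group) for group in match.groups())
    return height_px, width_px


for label in ("I(512,512)", "I(1024,512)", "I(2048,2048)"):
    height_px, width_px = parse_image_label(label)
    print(f"{label}: height={height_px}px, width={width_px}px, "
          f"pixels={height_px * width_px}")
```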
I(512,512)
The following table shows benchmarks for the cohere.embed-multilingual-image-v3.0 model hosted on one Embed Cohere unit of a hosting dedicated AI cluster, for an image with a height and width of 512 pixels.
Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|
1 | 0.13 | 6.50 |
2 | 0.13 | 12.20 |
4 | 0.14 | 22.71 |
8 | 0.15 | 39.19 |
16 | 0.19 | 62.23 |
32 | 0.31 | 80.75 |
64 | 0.46 | 113.57 |
128 | 1.25 | 83.80 |
256 | 2.60 | 80.95 |
I(1024,512)
The following table shows benchmarks for the cohere.embed-multilingual-image-v3.0 model hosted on one Embed Cohere unit of a hosting dedicated AI cluster, for an image with a height of 1,024 pixels and a width of 512 pixels.
Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|
1 | 0.14 | 5.79 |
2 | 0.14 | 10.67 |
4 | 0.16 | 18.74 |
8 | 0.17 | 32.08 |
16 | 0.24 | 47.64 |
32 | 0.44 | 58.76 |
64 | 0.93 | 60.67 |
128 | 1.71 | 64.96 |
256 | 3.06 | 68.54 |
I(2048,2048)
The following table shows benchmarks for the cohere.embed-multilingual-image-v3.0 model hosted on one Embed Cohere unit of a hosting dedicated AI cluster, for an image with a height and width of 2,048 pixels.
Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per second, RPS) |
---|---|---|
1 | 0.26 | 2.82 |
2 | 0.30 | 4.77 |
4 | 0.29 | 10.43 |
8 | 0.34 | 18.14 |
16 | 0.57 | 21.93 |
32 | 1.09 | 25.44 |
64 | 2.08 | 26.99 |
128 | 4.14 | 26.24 |
256 | 10.17 | 23.60 |
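In the three image tables above, throughput does not always keep rising with concurrency. As a minimal sketch, the following snippet finds the concurrency level with the highest measured RPS for each image size. The function name `peak_throughput` is hypothetical, and the result only reflects the concurrency levels that were benchmarked.

```python
# (concurrency, latency in seconds, requests per second), copied from the tables.
IMAGE_BENCHMARKS = {
    "I(512,512)": [
        (1, 0.13, 6.50), (2, 0.13, 12.20), (4, 0.14, 22.71),
        (8, 0.15, 39.19), (16, 0.19, 62.23), (32, 0.31, 80.75),
        (64, 0.46, 113.57), (128, 1.25, 83.80), (256, 2.60, 80.95),
    ],
    "I(1024,512)": [
        (1, 0.14, 5.79), (2, 0.14, 10.67), (4, 0.16, 18.74),
        (8, 0.17, 32.08), (16, 0.24, 47.64), (32, 0.44, 58.76),
        (64, 0.93, 60.67), (128, 1.71, 64.96), (256, 3.06, 68.54),
    ],
    "I(2048,2048)": [
        (1, 0.26, 2.82), (2, 0.30, 4.77), (4, 0.29, 10.43),
        (8, 0.34, 18.14), (16, 0.57, 21.93), (32, 1.09, 25.44),
        (64, 2.08, 26.99), (128, 4.14, 26.24), (256, 10.17, 23.60),
    ],
}


def peak_throughput(rows):
    """Return the (concurrency, latency_s, rps) row with the highest RPS."""
    return max(rows, key=lambda row: row[2])


for label, rows in IMAGE_BENCHMARKS.items():
    concurrency, latency_s, rps = peak_throughput(rows)
    print(f"{label}: peak {rps} RPS at concurrency {concurrency} "
          f"(latency {latency_s} s)")
```

When picking a target concurrency, the latency at the peak row matters as much as the RPS value, since latency keeps growing at higher concurrency in all three tables.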