Scenario 6: Lighter Embeddings Workload Benchmarks in Generative AI

The lighter embeddings scenario is similar to the text embeddings scenario (scenario 5), except that each request is reduced to 16 documents of 512 tokens each. Scenario 6 is therefore suited to smaller files and documents with fewer words, as illustrated by the request sketch below.
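For concreteness, a single scenario 6 request could be built as follows. This is a minimal sketch using the OCI Python SDK's generative_ai_inference client, assuming a dedicated AI cluster endpoint in the Chicago region; the OCIDs are placeholders, and the 16-document batch is stubbed with filler text rather than real 512-token documents.

```python
# A sketch of one scenario 6 request: 16 documents, roughly 512 tokens each.
# Assumes the OCI Python SDK (pip install oci); OCIDs below are placeholders.
import oci

config = oci.config.from_file()  # reads ~/.oci/config
client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

# Stand-in documents; a real workload would send 16 distinct ~512-token texts.
documents = ["lorem ipsum " * 256] * 16

details = oci.generative_ai_inference.models.EmbedTextDetails(
    inputs=documents,
    compartment_id="ocid1.compartment.oc1..<placeholder>",
    serving_mode=oci.generative_ai_inference.models.DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..<placeholder>"
    ),
)

response = client.embed_text(details)
print(len(response.data.embeddings))  # expect 16 embedding vectors back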

Review the terms used in the dedicated AI cluster hosting benchmarks. For a list of scenarios and their descriptions, see Text Embedding Scenarios. The lighter embeddings scenario is benchmarked in the following region.

US Midwest (Chicago)

Model: cohere.embed-english-v3.0 hosted on one Embed Cohere unit of a dedicated AI cluster
| Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute) |
|---|---|---|
| 1 | 1.19 | 54 |
| 8 | 1.41 | 348 |
| 32 | 3.47 | 600 |
| 128 | 12.08 | 558 |
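As a rough sanity check, the latency and throughput columns should be related by throughput ≈ concurrency ÷ latency (Little's law). The sketch below recomputes the expected RPM from the cohere.embed-english-v3.0 rows above; the shortfall at concurrency 128 suggests the cluster unit saturates and requests queue at that level.

```python
# Rough consistency check: expected RPM ≈ concurrency / latency_seconds * 60.
# Figures are the cohere.embed-english-v3.0 rows from the table above.
for concurrency, latency_s, measured_rpm in [(1, 1.19, 54), (8, 1.41, 348),
                                             (32, 3.47, 600), (128, 12.08, 558)]:
    expected_rpm = concurrency / latency_s * 60
    print(f"c={concurrency:>3}: expected ~{expected_rpm:.0f} RPM, measured {measured_rpm}")
```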
Model: cohere.embed-english-light-v3.0 hosted on one Embed Cohere unit of a dedicated AI cluster
| Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute) |
|---|---|---|
| 1 | 0.85 | 48 |
| 8 | 1.15 | 354 |
| 32 | 3.15 | 594 |
| 128 | 11.26 | 846 |
Model: cohere.embed-multilingual-v3.0 hosted on one Embed Cohere unit of a dedicated AI cluster
| Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute) |
|---|---|---|
| 1 | 1.28 | 42 |
| 8 | 1.38 | 288 |
| 32 | 3.44 | 497 |
| 128 | 11.94 | 702 |
Model: cohere.embed-multilingual-light-v3.0 hosted on one Embed Cohere unit of a dedicated AI cluster
| Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute) |
|---|---|---|
| 1 | 1.03 | 54 |
| 8 | 1.35 | 300 |
| 32 | 3.11 | 570 |
| 128 | 11.50 | 888 |
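To reproduce measurements like these against your own endpoint, a fixed-concurrency thread-pool harness is one straightforward approach. The following is a sketch, not the harness used for the tables above; send_request is a hypothetical stand-in for the embed_text call shown earlier and must be filled in before the numbers mean anything.

```python
# A sketch of a fixed-concurrency benchmark loop (not the harness used for
# the tables above). send_request() stands in for the embed_text call.
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def send_request() -> float:
    """Issue one 16-document embed request and return its latency in seconds."""
    start = time.perf_counter()
    # ... call client.embed_text(details) here ...
    return time.perf_counter() - start

def run_level(concurrency: int, total_requests: int = 200) -> None:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: send_request(), range(total_requests)))
    elapsed = time.perf_counter() - start
    rpm = total_requests / elapsed * 60
    print(f"c={concurrency}: mean latency {mean(latencies):.2f}s, {rpm:.0f} RPM")

for level in (1, 8, 32, 128):
    run_level(level)
```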