Cohere Embed 4

Review the performance benchmarks for the cohere.embed-v4.0 (Cohere Embed 4) model hosted on one Embed Cohere unit of a dedicated AI cluster in OCI Generative AI.

- See the regions available for this model.
- Review the dedicated AI cluster unit size for hosting this model on the model page.
- Review the metrics.

Text Embeddings

This scenario applies only to embedding models with text input. It mimics embedding generation as part of the data ingestion pipeline of a vector database. In each scenario, all requests are the same size: 96 documents, each with the same number of tokens. For example, the 512-token scenario mimics a collection of large PDF files, each with more than 30,000 words, that a user wants to ingest into a vector database.
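Because every request in these scenarios bundles 96 documents, the input carried by a single request grows quickly with the per-document token count. The short Python sketch below only does that multiplication for each scenario heading in this section; `DOCS_PER_REQUEST`, `SCENARIO_TOKENS`, and `tokens_per_request` are illustrative names, not part of any OCI or Cohere API.

```python
# Each request in the text-embedding scenarios carries 96 documents,
# and every document in a given scenario has the same token count.
DOCS_PER_REQUEST = 96

# Tokens per document for each scenario heading in this section.
SCENARIO_TOKENS = [64, 128, 512, 1_024, 2_048, 8_096, 32_000]


def tokens_per_request(tokens_per_doc: int, docs: int = DOCS_PER_REQUEST) -> int:
    """Total input tokens sent in one embedding request."""
    return tokens_per_doc * docs


if __name__ == "__main__":
    for t in SCENARIO_TOKENS:
        print(f"{t:>6} tokens/doc -> {tokens_per_request(t):>9,} tokens/request")
```

For the 512-token scenario, for example, one request carries 96 × 512 = 49,152 tokens.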

64 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 64 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.09	0.09	11.15	668.45
2	0.09	0.09	10.79	1,293.27
4	0.1	0.1	9.88	2,370.14
8	0.11	0.11	8.55	4,105.4
24	0.19	0.19	5.1	7,360.01
48	0.31	0.31	3.1	8,933.99
96	0.54	0.54	1.78	10,282.68
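One way to read a table like the one above is to pick the concurrency that gives the most total throughput while request-level latency stays under a budget you set. The sketch below assumes a latency budget in seconds chosen by you (the benchmarks do not define one) and copies the 64-token rows for all regions except Saudi Arabia Central (Riyadh); `Row` and `best_concurrency` are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    concurrency: int
    latency_s: float           # request-level latency (seconds)
    total_tokens_per_s: float  # total throughput (tokens per second)


# 64-token scenario, all regions except Saudi Arabia Central (Riyadh),
# transcribed from the benchmark table.
ROWS_64 = [
    Row(1, 0.09, 668.45),
    Row(2, 0.09, 1_293.27),
    Row(4, 0.10, 2_370.14),
    Row(8, 0.11, 4_105.40),
    Row(24, 0.19, 7_360.01),
    Row(48, 0.31, 8_933.99),
    Row(96, 0.54, 10_282.68),
]


def best_concurrency(rows: list[Row], max_latency_s: float) -> Optional[Row]:
    """Return the row with the highest total throughput whose latency does
    not exceed max_latency_s, or None if no row meets the budget."""
    eligible = [r for r in rows if r.latency_s <= max_latency_s]
    return max(eligible, key=lambda r: r.total_tokens_per_s, default=None)


if __name__ == "__main__":
    for budget in (0.1, 0.2, 0.5, 1.0):
        print(budget, best_concurrency(ROWS_64, budget))
```

With these rows, a 0.2-second budget selects concurrency 24 (about 7,360 tokens per second), and a 0.5-second budget selects concurrency 48.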

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.1	0.1	9.5	570.59
2	0.11	0.11	9.23	1,107.06
4	0.11	0.11	8.92	2,141.09
8	0.12	0.12	8.08	3,865.74
24	0.18	0.18	5.43	7,801.83
48	0.28	0.28	3.49	10,077.82
96	0.47	0.47	2.07	11,961.63
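To compare the two 64-token tables directly, the sketch below divides the Riyadh total throughput by the all-other-regions total throughput at each concurrency level. The dictionaries are transcribed from the two tables; the helper name `throughput_ratio` is hypothetical.

```python
# Total throughput (tokens per second) for the 64-token scenario, keyed by
# concurrency, transcribed from the two benchmark tables.
OTHER_REGIONS = {1: 668.45, 2: 1_293.27, 4: 2_370.14, 8: 4_105.40,
                 24: 7_360.01, 48: 8_933.99, 96: 10_282.68}
RIYADH = {1: 570.59, 2: 1_107.06, 4: 2_141.09, 8: 3_865.74,
          24: 7_801.83, 48: 10_077.82, 96: 11_961.63}


def throughput_ratio(a: dict[int, float], b: dict[int, float]) -> dict[int, float]:
    """Ratio a/b of total throughput at each concurrency present in both tables."""
    return {c: a[c] / b[c] for c in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    for c, r in throughput_ratio(RIYADH, OTHER_REGIONS).items():
        print(f"concurrency {c:>2}: Riyadh / other regions = {r:.2f}")
```

With these figures the ratio is below 1 up to concurrency 8 and above 1 from concurrency 24 onward, so in this scenario the Riyadh unit delivers slightly less throughput at low concurrency and more under heavier load.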

128 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 128 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.09	0.09	11.27	1,381.7
2	0.09	0.09	10.67	2,617.09
4	0.1	0.1	9.67	4,750.2
8	0.12	0.12	8.14	7,990.79
24	0.22	0.22	4.29	12,624.79
48	0.35	0.35	2.76	16,251.43
96	0.64	0.64	1.51	17,735.38

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.1	0.1	9.69	1,189.24
2	0.1	0.1	9.38	2,301.32
4	0.11	0.11	8.89	4,357.61
8	0.12	0.12	8	7,854.35
24	0.19	0.19	5.01	14,749.07
48	0.29	0.29	3.34	19,707.08
96	0.5	0.5	1.92	22,589.75

512 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 512 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.09	0.09	10.83	5,410.49
2	0.1	0.1	9.65	9,642.11
4	0.12	0.12	7.52	15,025.97
8	0.16	0.16	5.9	23,556.71
24	0.35	0.35	2.71	32,451.55
48	0.68	0.68	1.39	33,273.59
96	1.25	1.25	0.75	36,072.1

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.1	0.1	9.44	4,715.27
2	0.11	0.11	9.06	9,051.76
4	0.11	0.11	8.42	16,813.69
8	0.14	0.14	6.86	27,394.77
24	0.24	0.24	3.88	46,487.91
48	0.42	0.42	2.17	51,986.9
96	0.77	0.77	1.18	56,778.17

1,024 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 1,024 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.09	0.09	9.55	9,559.38
2	0.12	0.12	1.3	2,601.06
4	0.15	0.15	6.06	24,284.74
8	0.23	0.23	4.05	32,432.49
24	0.6	0.6	1.56	37,501.74
48	1.09	1.09	0.85	40,893.6
96	2.11	2.11	0.31	29,835.31

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.1	0.1	9.14	9,158.45
2	0.11	0.11	8.64	17,307.93
4	0.13	0.13	7.25	29,048
8	0.16	0.16	5.51	44,150.34
24	0.38	0.38	2.38	57,261.32
48	0.64	0.64	1.39	66,942.72
96	1.2	1.2	0.74	70,865.77

2,048 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 2,048 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.11	0.11	7.58	15,203.74
2	0.14	0.14	6.09	24,431.99
4	0.22	0.22	4	32,065.33
8	0.37	0.37	2.48	39,802.12
24	1.02	1.02	0.9	43,230.02
48	2	2	0.46	44,251.96

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.11	0.11	8.35	16,740.19
2	0.12	0.12	7.14	28,651.67
4	0.16	0.16	5.54	44,470.3
8	0.23	0.23	3.7	59,426.49
24	0.59	0.59	1.46	70,295.49
48	1.11	1.11	0.78	75,560.01
96	2.08	2.08	0.42	80,426.61

8,096 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 8,096 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.25	0.25	3.31	26,290.24
2	0.42	0.42	2.05	32,530.08
4	0.82	0.82	1.09	34,646.38
8	1.59	1.59	0.57	36,389.86
24	4.47	4.47	0.2	39,049.48
48	8.75	8.75	0.11	40,180.09
96	17.3	17.3	0.05	39,843.97

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.17	0.17	4.57	36,262.71
2	0.26	0.26	3.14	49,882.53
4	0.5	0.5	1.69	53,606.93
8	0.9	0.9	0.96	60,838.78
24	2.38	2.38	0.36	69,450.5
48	4.52	4.52	0.19	73,294.47
96	8.72	8.72	0.1	76,456.16

32,000 Tokens

The following tables show the benchmarks for a scenario of 96 documents with 32,000 tokens per document.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.92	0.92	0.89	27,968.24
2	1.74	1.74	0.5	31,141.92
4	2.92	2.92	0.3	37,838.06
8	5.73	5.73	0.16	39,090.65
24	16.86	16.86	0.05	40,623.28

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Time to First Token (TTFT) (seconds)	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)	Total throughput (tokens per second)
1	0.53	0.53	1.41	44,178.97
2	0.88	0.88	0.9	56,692.99
4	1.58	1.58	0.52	65,690.47
8	2.99	2.99	0.28	70,962.43
24	8.47	8.47	0.1	75,910.53
48	16.6	16.6	0.05	77,493.42

Image Embeddings

This scenario applies only to embedding models with image input. In each scenario, S(M,N) denotes an image that is M pixels high and N pixels wide. For example, S(1024,512) is an image 1,024 pixels high and 512 pixels wide.
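The sketch below encodes the label convention just described, reading the first number as the height and the second as the width. The regular expression and the `parse_image_scenario` helper are illustrative assumptions, not part of any OCI tooling.

```python
import re

# Matches the image scenario labels used in this section, e.g. "S(1024,512)".
_LABEL = re.compile(r"^S\((\d+),\s*(\d+)\)$")


def parse_image_scenario(label: str) -> tuple[int, int]:
    """Return (height_px, width_px) for a label S(M,N): M is the height, N the width."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not an image scenario label: {label!r}")
    height, width = (int(g) for g in match.groups())
    return height, width


if __name__ == "__main__":
    for label in ("S(512,512)", "S(1024,512)", "S(2048,2048)"):
        h, w = parse_image_scenario(label)
        print(f"{label}: {h} px high x {w} px wide, {h * w:,} pixels")
```

For the three scenarios in this section, S(2048,2048) has 16 times as many pixels as S(512,512) (4,194,304 versus 262,144).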

S(512,512)

The following tables show the benchmarks for a scenario with an image 512 pixels high and 512 pixels wide.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)
1	0.18	4.76
2	0.19	8.89
4	0.27	13.17
8	0.49	14.84
16	0.94	16.14
32	1.84	16.45
64	3.66	16.38
128	7.27	16.06
256	13.57	16
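In the table above, throughput stops growing well before the highest concurrency level. The sketch below finds the first concurrency level after which the next step adds less than a chosen fraction of throughput. The 5% default threshold and the `saturation_point` helper are assumptions for illustration, not part of the benchmark methodology.

```python
# Image scenario S(512,512), all regions except Saudi Arabia Central (Riyadh):
# (concurrency, request-level throughput in RPS) transcribed from the table.
RPS_512 = [(1, 4.76), (2, 8.89), (4, 13.17), (8, 14.84), (16, 16.14),
           (32, 16.45), (64, 16.38), (128, 16.06), (256, 16.0)]


def saturation_point(rows: list[tuple[int, float]], min_gain: float = 0.05) -> int:
    """Return the first concurrency level after which the next step in the
    table raises throughput by less than min_gain (a fraction: 0.05 = 5%).
    Falls back to the last level if every step still gains enough."""
    for (c, rps), (_, next_rps) in zip(rows, rows[1:]):
        if (next_rps - rps) / rps < min_gain:
            return c
    return rows[-1][0]


if __name__ == "__main__":
    print(saturation_point(RPS_512))
```

With the default threshold this returns 16: past about 16 concurrent requests, throughput stays near 16 RPS while latency keeps rising (from 0.94 to 13.57 seconds).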

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)
1	0.15	4.98
2	0.16	10.3
4	0.17	19.51
8	0.21	32.83
16	0.33	43.06
32	0.65	44.02
64	1.32	43.77
128	2.71	41.9
256	5.29	40.35

S(1024,512)

The following tables show the benchmarks for a scenario with an image 1,024 pixels high and 512 pixels wide.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)
1	0.25	3.42
2	0.25	6.72
4	0.38	9.17
8	0.78	9.52
16	1.52	10.04
32	2.93	10.5
64	5.75	10.48
128	11.23	10.52
256	19.97	10.13

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)
1	0.19	3.91
2	0.19	8.29
4	0.22	15.05
8	0.36	19.68
16	0.67	22.08
32	1.35	22.21
64	2.71	22
128	5.44	21.09
256	10.2	21.29

S(2048,2048)

The following tables show the benchmarks for a scenario with an image 2,048 pixels high and 2,048 pixels wide.

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in all regions except Saudi Arabia Central (Riyadh).


Concurrency	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)
1	0.86	1.04
2	0.98	1.73
4	1.84	2.04
8	3.02	1.42
16	7.71	2.03
32	14.93	2.1
64	25.73	1.98
128	26.92	1.86
256	27.29	1.91

The cohere.embed-v4.0 model hosted on one Embed Cohere unit of a dedicated AI cluster, in the Saudi Arabia Central (Riyadh) region.


Concurrency	Request-level latency (seconds)	Request-level throughput (requests per second, RPS)
1	0.66	1.25
2	0.69	2.49
4	1.07	3.4
8	2.24	3.41
16	4.57	3.4
32	9.22	3.37
64	18.53	3.3
128	24.61	2.77
256	25.78	2.71
