Convert Text to Speech with xAI Voice in OCI Generative AI

You can now convert text into spoken audio in OCI Generative AI by using Text to Speech with xAI Voice.

Text to Speech is available in two ways:

  • OCI OpenAI-compatible Audio Speech API: Call the xai.grok-tts model through the OCI OpenAI-compatible endpoint. Requests use the OpenAI-compatible speech request format and one of the supported xAI voices.
  • WebSocket streaming: Use the OCI xAI WebSocket endpoint for streaming text-to-speech. This option uses xAI text-to-speech settings and streams text input and audio output over a WebSocket connection.
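
As a rough illustration of the first option, the sketch below builds a request body in the OpenAI speech request shape (model, input, voice, and an optional response_format). It does not send anything. The helper name build_speech_request is hypothetical, and the exact casing of voice names on the wire and the accepted response_format values are assumptions that this note does not confirm.

```python
import json
from typing import Optional

# Model ID and voice names are taken from this announcement.
# Assumption: the API accepts the voice names with this casing.
MODEL_ID = "xai.grok-tts"
SUPPORTED_VOICES = ("Ara", "Eve", "Leo", "Rex", "Sal")


def build_speech_request(
    text: str,
    voice: str = "Ara",
    response_format: Optional[str] = None,
) -> dict:
    """Build an OpenAI-style speech request body (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")

    body = {"model": MODEL_ID, "input": text, "voice": voice}
    # Only include the output format when the caller asks for one;
    # which values are accepted is not stated in this note.
    if response_format is not None:
        body["response_format"] = response_format
    return body


if __name__ == "__main__":
    # Print the JSON body that would be sent to the speech endpoint.
    print(json.dumps(build_speech_request("Hello from OCI."), indent=2))
```

If the endpoint works with the standard OpenAI Python client, the equivalent call would presumably be client.audio.speech.create(model=..., voice=..., input=...). The base URL and authentication depend on your region and tenancy and are not covered here.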

Key Features

  • Generate spoken audio from text.
  • Select from five supported voices: Ara, Eve, Leo, Rex, and Sal.
  • Configure audio output settings such as language, audio format, sample rate, and bit rate.
  • Use the OCI OpenAI-compatible Audio Speech API for request-based speech generation.
  • Use WebSocket streaming for interactive or low-latency applications where text can be sent incrementally and audio can be returned in chunks.
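
To make the incremental pattern in the last item more concrete, here is a minimal sketch of the client-side bookkeeping only: splitting text into sentence-sized pieces to send one at a time, and joining returned audio chunks in arrival order. Both helper names are hypothetical. The WebSocket URL, message schema, and authentication are not described in this note, so the sketch leaves them out, and it assumes the returned chunks are consecutive pieces of a single audio stream.

```python
import re
from typing import Iterable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: flush and restart.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def join_audio(chunks: Iterable[bytes]) -> bytes:
    """Concatenate audio chunks in arrival order.

    Assumption: chunks are consecutive slices of one audio stream.
    """
    return b"".join(chunks)


if __name__ == "__main__":
    for piece in split_into_chunks("One. Two. Three.", max_chars=9):
        print(piece)
```

In a real client, each text piece would be sent over the WebSocket connection as it becomes available, and audio chunks would be played or buffered as they arrive rather than joined at the end.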

For model details, see xAI Voice (Text to Speech). For available regions, see Generative AI Models by Region. For information about the service, see the Generative AI documentation.