Voxtral Mini TTS is Mistral's text-to-speech model featuring zero-shot voice cloning and multilingual support. It converts text input into natural-sounding audio output.
Modalities: text input, audio output
Price: $16/M characters
Context: 4K
Released: Apr 19, 2026
Create an API key from your OpenRouter dashboard and set it as an environment variable:
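Once the variable is exported in your shell, a program can read it at startup. The following is a minimal sketch, assuming the variable is named `OPENROUTER_API_KEY` (the name used in the request header in the API reference); the helper `load_api_key` is hypothetical and not part of any OpenRouter SDK.

```python
import os

# Variable name taken from the "Bearer $OPENROUTER_API_KEY" header in the API reference.
ENV_VAR = "OPENROUTER_API_KEY"


def load_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error if it is unset."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running this script")
    return key
```

Failing early with a descriptive error tends to be easier to debug than an authentication failure returned by the API later on.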
Use mistralai/voxtral-mini-tts-2603 with the OpenRouter API:
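A request to the speech endpoint might look like the sketch below, which uses only the Python standard library. The URL, headers, and model ID come from this page; the JSON field names `input`, `voice`, and `response_format` are assumptions modeled on OpenAI-style speech APIs, and `"example-voice"` is a placeholder rather than a known voice name. The function names are hypothetical.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/audio/speech"
MODEL = "mistralai/voxtral-mini-tts-2603"


def build_speech_request(text, api_key, voice="example-voice", audio_format="mp3"):
    """Build (but do not send) a POST request for the speech endpoint.

    Assumption: the body fields `input`, `voice`, and `response_format`
    follow OpenAI-style speech APIs; check the API reference for the real names.
    """
    payload = {
        "model": MODEL,
        "input": text,
        "voice": voice,  # placeholder voice name
        "response_format": audio_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned raw audio bytes to out_path."""
    request = build_speech_request(text, os.environ["OPENROUTER_API_KEY"], **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the key exported, calling `synthesize("Hello from Voxtral.", "hello.mp3")` would write the audio to disk and return the number of bytes received. Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.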
OpenRouter provides a text-to-speech API that converts text into natural-sounding audio. Send text and a voice selection, and receive raw audio bytes in your chosen format.
The response is a raw audio stream (not JSON). The generation ID is returned in the X-Generation-Id response header for tracking.
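Because the body is audio rather than JSON, a client typically writes the bytes straight to a file and reads the generation ID from the headers. The sketch below assumes `headers` is a case-insensitive mapping such as the `response.headers` object returned by `urllib.request.urlopen`; the function name `save_speech_response` is hypothetical.

```python
def save_speech_response(headers, body, out_path):
    """Write raw audio bytes to out_path and return the generation ID, if present.

    headers: mapping with a case-insensitive .get(), e.g. urllib's response.headers
    body:    the raw bytes read from the response (not JSON, so no decoding)
    """
    with open(out_path, "wb") as f:
        f.write(body)
    # The generation ID travels in a response header, not in the body.
    return headers.get("X-Generation-Id")
```

Logging the returned ID alongside the output path is one way to keep each audio file traceable to its generation.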
For information about using third-party SDKs and frameworks with OpenRouter, please see our frameworks documentation.
Synthesizes audio from the input text. Returns a raw audio bytestream in the requested format (e.g. mp3, pcm, wav).
Endpoint: https://openrouter.ai/api/v1/audio/speech
Authorization: Bearer $OPENROUTER_API_KEY
Content-Type: application/json
Site URL header (optional): your site URL, for rankings
Site name header (optional): your site name, for rankings
Model: mistralai/voxtral-mini-tts-2603

| Name | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | - | Upper limit on the number of tokens the model can generate in a response. |
| temperature | float | 1 | Influences the variety in the model's responses. |
| top_p | float | 1 | Limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. |
| stop | array | - | Stops generation immediately if the model encounters any token specified in the stop array. |
| frequency_penalty | float | 0 | Controls the repetition of tokens based on how often they appear in the input. |
| presence_penalty | float | 0 | Adjusts how often the model repeats specific tokens already used in the input. |
| seed | integer | - | If specified, inference samples deterministically, so that repeated requests with the same seed and parameters should return the same result. |
| response_format | map | - | Forces the model to produce a specific output format. |
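The optional parameters in the table can be collected in a small container that sends only the values you set explicitly. This is a sketch under stated assumptions: the class `SamplingParams` is hypothetical, `response_format` is left out for brevity, and the page does not say which of these parameters the speech endpoint actually honors.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SamplingParams:
    """Optional request parameters from the table; None means "leave unset"."""

    max_tokens: Optional[int] = None
    temperature: Optional[float] = None        # documented default: 1
    top_p: Optional[float] = None              # documented default: 1
    stop: Optional[List[str]] = None
    frequency_penalty: Optional[float] = None  # documented default: 0
    presence_penalty: Optional[float] = None   # documented default: 0
    seed: Optional[int] = None

    def to_payload(self):
        """Return only the parameters that were set, ready to merge into a request body."""
        return {name: value for name, value in asdict(self).items() if value is not None}
```

Omitting unset fields lets the server apply its own defaults instead of hard-coding them on the client.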
Weekly tokens: 120K