We're pleased to reveal Eleven v3 (alpha), our most expressive Text to Speech model yet.
This research preview brings unprecedented control and realism to speech generation with:
70+ languages
Multi-speaker dialogue
Audio tags like [excited], [whispers], and [sighs]
Eleven v3 (alpha) requires more prompt engineering than previous models — but the generations are breathtaking.
If you’re working on videos, audiobooks, or media tools, this unlocks a new level of expressiveness. For real-time and conversational use cases, we recommend staying with v2.5 Turbo or Flash for now. A real-time version of v3 is in development.
Eleven v3 is available today on our website, with public API access coming soon.
Why we built v3
Since launching Multilingual v2, we’ve seen voice AI adopted in professional film, game development, education, and accessibility. But the consistent limitation wasn’t sound quality — it was expressiveness. More exaggerated emotions, conversational interruptions, and believable back-and-forth were difficult to achieve.
Eleven v3 addresses this gap. It was built from the ground up to deliver voices that sigh, whisper, laugh, and react — producing speech that feels genuinely responsive and alive.
What’s new in Eleven v3 (alpha)
Audio tags: inline control of tone, emotion, and non-verbal reactions
Dialogue mode: multi-speaker conversations with natural pacing and interruptions
70+ languages: full coverage of high-demand global languages
Deeper text understanding: better stress, cadence, and expressivity from text input
Hear v3 for yourself
The audio samples in this section were generated with only the Eleven v3 model.
Using audio tags
Audio tags live inline with your script and are written as lowercase words in square brackets. Supported tags vary somewhat by voice and context; see the prompting guide for v3 in the docs for more detail.
Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, we recommend using an Instant Voice Clone (IVC) or a designed voice if you need v3 features. PVC optimization for v3 is coming in the near future.
For example, you could prompt: “[whispers] Something’s coming… [sighs] I can feel it.” Or for more expressive control, you can combine multiple tags:
“[happily][shouts] We did it! [laughs]”
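Once you have API access, tagged text is passed like any other script. Here is a minimal sketch using the ElevenLabs Python SDK; the "eleven_v3" model ID and the placeholder voice ID are assumptions to verify against the docs:

# pip install elevenlabs
from elevenlabs.client import ElevenLabs
from elevenlabs import play

client = ElevenLabs(api_key="YOUR_API_KEY")

# Audio tags ride along inline with the script text.
audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",  # placeholder; prefer an IVC or designed voice on v3
    model_id="eleven_v3",      # assumed v3 model ID; confirm in the docs
    text="[whispers] Something's coming... [sighs] I can feel it.",
)
play(audio)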
Crafting multi-speaker dialogue
Eleven v3 is supported in our existing Text to Speech endpoint. We're also introducing a new Text to Dialogue API endpoint: provide a structured array of JSON objects, each representing a speaker turn, and the model generates a single cohesive audio file with natural overlaps:
[
  {"speaker_id": "scarlett", "text": "(cheerfully) Perfect! And if that pop-up is bothering you, there’s a setting to turn it off under Notifications → Preferences."},
  {"speaker_id": "lex", "text": "You are a hero. An actual digital wizard. I was two seconds from sending a very passive-aggressive support email."},
  {"speaker_id": "scarlett", "text": "(laughs) Glad we could stop that in time. Anything else I can help with today?"}
]
The endpoint automatically manages speaker transitions, emotional changes, and interruptions, matching prosody and emotional range across speakers and taking cues from audio tags to weave the voices into a seamless interaction.
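For teams planning ahead of API access, here is a rough sketch of what a Text to Dialogue request could look like. The route and payload wrapper are assumptions, since only the speaker-turn array above is confirmed:

# pip install requests
import requests

# Hypothetical route and payload shape for illustration only;
# check the docs for the final endpoint once it ships.
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-dialogue",  # assumed path
    headers={"xi-api-key": "YOUR_API_KEY"},
    json=[  # the speaker-turn array shown above
        {"speaker_id": "scarlett", "text": "(laughs) Glad we could stop that in time. Anything else I can help with today?"},
        {"speaker_id": "lex", "text": "You are a hero. An actual digital wizard."},
    ],
)
resp.raise_for_status()
with open("dialogue.mp3", "wb") as f:
    f.write(resp.content)  # one cohesive audio file for the whole exchange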
API access and support in Studio are coming soon. For early access, please contact sales.
When not to use v3
Eleven v3 (alpha) requires more prompt engineering than our previous models. When it works, the output is breathtaking, but its lower reliability and higher latency make it unsuitable for real-time and conversational use cases. For those, we recommend Eleven v2.5 Turbo or Flash.
We’re excited to see how you bring v3 to life across new use cases — from immersive storytelling to cinematic production pipelines.
Eleven v3 is 80% off until the end of June 2025 for self-serve users using it through the UI.
Eleven v3 supports 70+ languages: Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym)