Introducing Eleven v3 (alpha)

Jun 3, 2025 • 6 min read

Mati Staniszewski, Co-founder,

Piotr Dabkowski, Co-Founder, Research

The most expressive Text to Speech model


We're excited to introduce Eleven v3 (alpha), the most expressive Text to Speech model.

This research preview delivers unprecedented control and realism in speech generation with:

70+ languages
Multi-speaker dialogue
Audio tags such as [excited], [whispers], and [sighs]

Eleven v3 (alpha) requires more prompt engineering than previous models, but the generations are breathtaking.

If you're working on videos, audiobooks, or media tools, this unlocks a new level of expressiveness. For real-time and conversational use cases, we recommend staying with v2.5 Turbo or Flash for now. A real-time version of v3 is in development.

Eleven v3 is available today on our website. Public API access is coming soon. For early access, please contact sales.

Using the new model in the ElevenLabs app is 80% cheaper until the end of June. Sign up here.

Why we built v3

Since launching Multilingual v2, we've seen voice AI adopted in professional film, game development, education, and accessibility. But the consistent limitation wasn't sound quality; it was expressiveness. More exaggerated emotions, conversational interruptions, and believable back-and-forth were difficult to achieve.

Eleven v3 addresses this gap. It was built from the ground up to deliver voices that sigh, whisper, laugh, and react, producing speech that feels genuinely responsive and alive.

Feature	What it unlocks
Audio tags	Inline control of tone, emotion, and non-verbal reactions
Dialogue mode	Multi-speaker conversations with natural pacing and interruptions
70+ languages	Full coverage of high-demand global languages
Deeper text understanding	Better stress, cadence, and expressivity from text input

Hear v3 for yourself

Using audio tags

Audio tags are placed inline in the script and written in lowercase inside square brackets. You can learn more about audio tags in our prompting guide for v3 in the docs.

For example, you could write: "[whispers] Something's coming… [sighs] I can feel it." For finer control over expression, you can combine multiple tags:

"[happily][shouts] We did it! [laughs]."
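As a rough illustration of the formatting rule, the following Python sketch builds a tagged line from plain tag names. The helper names `format_tags` and `tagged_line` are hypothetical and not part of any ElevenLabs SDK. The code only assembles prompt text and makes no API call.

```python
def format_tags(*tags: str) -> str:
    """Render tag names as lowercase, square-bracketed audio tags."""
    return "".join(f"[{tag.strip().lower()}]" for tag in tags)


def tagged_line(text: str, *tags: str) -> str:
    """Prefix a line of script with zero or more audio tags."""
    prefix = format_tags(*tags)
    return f"{prefix} {text}" if prefix else text


# Combine two tags at the start and add a trailing non-verbal reaction.
line = tagged_line("We did it!", "happily", "shouts") + " " + format_tags("laughs")
print(line)  # [happily][shouts] We did it! [laughs]
```

Keeping the tags as plain strings until the last step makes it easy to experiment with different combinations, which matters because tag behavior is somewhat voice and context dependent.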

Creating multi-speaker dialogue

Eleven v3 is supported in our existing Text to Speech endpoint. In addition, we're introducing a new Text to Dialogue API endpoint. Provide a structured array of JSON objects, each representing a speaker turn, and the model generates a cohesive, overlapping audio file:

[
  {"speaker_id": "scarlett", "text": "(cheerfully) Perfect! And if that pop-up is bothering you, there’s a setting to turn it off under Notifications → Preferences."},
  {"speaker_id": "lex", "text": "You are a hero. An actual digital wizard. I was two seconds from sending a very passive-aggressive support email."},
  {"speaker_id": "scarlett", "text": "(laughs) Glad we could stop that in time. Anything else I can help with today?"}
]

The endpoint automatically manages speaker transitions, emotional changes, and interruptions. Learn more here.
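A minimal sketch of assembling such a payload in Python, assuming only the `speaker_id` and `text` fields shown in the JSON example. The `Turn` class and `build_dialogue_payload` helper are hypothetical names used for illustration. The sketch stops at producing the JSON string and does not call the Text to Dialogue endpoint, whose URL and request parameters are not given in this post.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Turn:
    """One speaker turn (hypothetical helper type, not an SDK class)."""
    speaker_id: str
    text: str


def build_dialogue_payload(turns: list[Turn]) -> str:
    """Serialize speaker turns to a JSON array, one object per turn."""
    for i, turn in enumerate(turns):
        if not turn.speaker_id or not turn.text.strip():
            raise ValueError(f"turn {i} needs a speaker_id and non-empty text")
    # ensure_ascii=False keeps characters such as "→" readable in the output.
    return json.dumps([asdict(t) for t in turns], ensure_ascii=False, indent=2)


turns = [
    Turn("scarlett", "(laughs) Glad we could stop that in time."),
    Turn("lex", "You are a hero."),
]
payload = build_dialogue_payload(turns)
print(payload)
```

Validating turns before serializing catches empty lines early, before any audio generation is attempted.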

v3 is our most expressive model


Pricing and availability

Plan	Launch promo	After 30 days
UI (self-serve)	80% off (~5× cheaper)	Same as Multilingual V2
API (self-serve & enterprise)	Same as Multilingual V2	Same
Enterprise UI	Same as Multilingual V2	Same
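The "~5× cheaper" figure in the pricing table follows from the 80% discount: paying 20% of the usual price means the same budget goes five times as far. A small Python check of that arithmetic, where `cheaper_factor` is a hypothetical helper name used only for illustration:

```python
from fractions import Fraction


def cheaper_factor(discount_percent: int) -> Fraction:
    """Return how many times cheaper a price becomes after a percentage discount."""
    if not 0 <= discount_percent < 100:
        raise ValueError("discount_percent must be in the range [0, 100)")
    remaining = 1 - Fraction(discount_percent, 100)  # share of the price still paid
    return 1 / remaining


print(cheaper_factor(80))  # 5
```

Exact fractions are used instead of floats so that the result is exactly 5 and not a rounding artifact.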

To enable v3:

Use the Model Picker and select Eleven v3 (alpha)

API access and Studio support are coming soon. For early access, contact sales.

When not to use v3

For details, see the v3 documentation and FAQ.

Try it today

Log in to ElevenLabs UI
Select Eleven v3 (alpha) in the model dropdown
Paste your script — use tags or dialogue
Generate audio

We’re excited to see how you bring v3 to life across new use cases — from immersive storytelling to cinematic production pipelines.

FAQ

How does the Eleven v3 80% discount work?

Eleven v3 is 80% off until the end of June 2025 for self-serve users using it through the UI.

How were the samples in the video and website generated?

They were generated with only the Eleven v3 model.

How does dialogue generation work?

Text to Dialogue weaves multiple voices together to create a seamless interaction between them. Matching prosody, emotional range and taking cues from audio tags, Text to Dialogue is a leap forward in generating engaging conversations.

Is this available over API?

Public API for Eleven v3 (alpha) is coming soon. For early access, please contact sales.

What audio tags are supported?

Eleven v3 supports a wide variety of audio tags, which are somewhat voice and context dependent. Read the prompting guide for further information.

What languages does it support?

Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym)