All ElevenLabs models below are accessible through the Hicap API. ElevenLabs provides industry-leading text-to-speech (TTS) and speech-to-text (STT) capabilities. For pricing, see the Model Catalog.
Text-to-Speech Models
eleven_multilingual_v2
Eleven Multilingual v2 is ElevenLabs’ flagship multilingual text-to-speech model, supporting 29 languages with natural-sounding, expressive voice synthesis. It delivers high-quality speech with nuanced prosody and emotional range.

Best for: multilingual voice applications, audiobook narration, content localization, accessibility features, and customer-facing voice experiences where natural speech quality is essential.
| Input | Output |
|---|---|
| Text | Audio |

This model offers:
- 10,000-character context window
- Support for 29 languages
eleven_v3
Eleven v3 is the latest-generation ElevenLabs TTS model. It delivers improved voice quality, faster generation, and enhanced expressiveness, with a smaller context window than Eleven Multilingual v2.

Best for: real-time voice assistants, interactive applications, short-form content, and latency-sensitive voice experiences where fast generation matters.
| Input | Output |
|---|---|
| Text | Audio |

This model offers:
- 5,000-character context window
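
The character limits listed for the two text-to-speech models suggest that longer scripts need to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the listed context window acts as a per-request cap on input characters (the page does not state this explicitly). `CHAR_LIMITS` and `split_for_model` are hypothetical names introduced for illustration and are not part of the Hicap or ElevenLabs APIs.

```python
import re

# Character limits as listed on this page. Treating them as per-request
# input caps is an assumption, not something the page states.
CHAR_LIMITS = {
    "eleven_multilingual_v2": 10_000,
    "eleven_v3": 5_000,
}


def split_for_model(text: str, model_id: str) -> list[str]:
    """Split text into chunks no longer than the model's character limit.

    Chunks break at sentence boundaries where possible. A single sentence
    longer than the limit is cut at the limit.
    """
    limit = CHAR_LIMITS[model_id]
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in a chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as a separate synthesis request and the resulting audio concatenated. Splitting at sentence boundaries is a design choice here, on the assumption that mid-sentence cuts would produce less natural prosody.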
Speech-to-Text Models
scribe_v1
Scribe v1 is ElevenLabs’ speech-to-text transcription model, supporting 90+ languages with accurate transcription of spoken audio into text.

Best for: audio transcription, meeting notes, podcast indexing, subtitle generation, and voice-to-text workflows across multiple languages.
| Input | Output |
|---|---|
| Audio | Text |

This model offers:
- Support for 90+ languages
scribe_v2
Scribe v2 is the latest generation of ElevenLabs’ transcription model, offering improved accuracy and language coverage over Scribe v1.

Best for: high-accuracy transcription, professional media workflows, real-time captioning, and enterprise audio processing where transcription quality is critical.
| Input | Output |
|---|---|
| Audio | Text |

This model offers:
- Support for 90+ languages
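
The input and output types listed for the four models can be captured in a small lookup table, which may help when routing a request to a suitable model ID. This is an illustrative sketch only: `ModelInfo`, `CATALOG`, and `models_for` are hypothetical names, and the entries simply restate the tables on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    model_id: str
    input_type: str
    output_type: str


# Entries restate the Input/Output tables on this page.
CATALOG = (
    ModelInfo("eleven_multilingual_v2", "text", "audio"),
    ModelInfo("eleven_v3", "text", "audio"),
    ModelInfo("scribe_v1", "audio", "text"),
    ModelInfo("scribe_v2", "audio", "text"),
)


def models_for(input_type: str, output_type: str) -> list[str]:
    """Return the model IDs that accept input_type and produce output_type."""
    return [
        m.model_id
        for m in CATALOG
        if m.input_type == input_type and m.output_type == output_type
    ]
```

For example, `models_for("audio", "text")` returns the two Scribe model IDs, and `models_for("text", "audio")` returns the two text-to-speech model IDs.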