Smarter Understanding, Clearer Calls: Retell’s ASR Just Got a Major Upgrade


Automatic speech recognition (ASR) is critical for any AI call bot: it is the foundation for understanding callers correctly and holding natural conversations.
Telephonic conversations are inherently challenging from a speech recognition perspective. Poor connectivity, background noise, and various dialects and accents make understanding a caller’s words difficult.
That’s why we’ve rolled out a major upgrade to Retell’s Automatic Speech Recognition (ASR) engine, bringing sharper transcription, stronger intent detection, and more reliable call outcomes across seven widely used languages.
This means better accuracy, crystal-clear transcripts, and higher call completion rates.
Retell’s new ASR (speech-to-text) now supports 22+ new languages, including Arabic, Persian, and Urdu.
This brings our total language count to 50+, one step closer to making content accessible in any language. These additions open up vast possibilities for businesses to reach new audiences: Arabic is spoken by 450 million people, Persian by 130 million, and Urdu by 250 million.
These languages are available as part of our new ASR. You can try out OpenAI TTS to build your next voice agent with Retell. You can also clone your voice and convert it to any of our 50+ languages.
This extensive language support enables businesses to effectively engage with a global audience. Try Retell AI today and see it in action.
Enhancing your AI bot's ability to communicate in multiple languages is a powerful way to improve user experience.
With Retell AI, you can enable these multilingual capabilities in a few simple steps:

Navigate to the Agent Dashboard and select the bot you want to configure. Click on the Global Settings menu on the right-hand side.

In the Voice and Language section under Global Settings, click the dropdown menu to explore available languages.
Choose the desired language for your bot. For example, selecting Spanish (Latin America) will apply this voice and language setting to the bot.

After selecting the language, return to the Conversation Flow editor and ensure all messages are accurately translated for the target audience.
For example, in the Greetings node, the bot might say:
“Hola, soy Anna, una representante de inteligencia artificial que llama desde la organización Retell Healthcare en una línea grabada…” (when Spanish is selected).
Confirm that every conversation node—including user prompts and responses—consistently matches the selected language.
These multilingual voice flows can also be set up within an AI-powered IVR system, enabling callers to navigate menus and reach the appropriate department in their preferred language.
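
The consistency check described above can also be automated once a flow has many nodes. The snippet below is a minimal sketch, assuming each node has been tagged by hand with the language its text is written in. The node structure, the field names, and the use of `es-419` (the BCP 47 tag for Latin American Spanish) are illustrative assumptions, not Retell's export format or API.

```python
# Hypothetical setup: the language chosen in Global Settings.
AGENT_LANGUAGE = "es-419"  # assumed tag for Spanish (Latin America)

# Hypothetical flow data, tagged by hand; not Retell's actual format.
nodes = [
    {"name": "Greetings", "language": "es-419",
     "message": "Hola, soy Anna."},
    {"name": "Confirm appointment", "language": "en-US",
     "message": "Can you confirm the date?"},
]


def mismatched_nodes(nodes, agent_language):
    """Return the names of nodes whose language tag differs from the agent's."""
    return [n["name"] for n in nodes if n["language"] != agent_language]


print(mismatched_nodes(nodes, AGENT_LANGUAGE))  # ['Confirm appointment']
```

Any node the check reports still needs to be translated in the Conversation Flow editor before testing.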

Use the Test option in Global Settings to simulate a conversation and verify that the bot responds smoothly in the selected language. Review both voice and text outputs to ensure accuracy and consistency.
This configuration can also support advanced use cases, such as an AI appointment setter, where the bot confirms dates, times, and other details while naturally speaking the customer’s preferred language.
Tips for an Effective Multilingual Setup
By following these best practices, your AI bot can communicate clearly with a broader audience, improving accessibility and delivering a more inclusive customer experience.
Real-time transcription is often a tradeoff between latency and accuracy.
When you optimize for speed, you get the lowest latency but a higher chance of errors due to less context. When relying on results with more context, you risk waiting longer after the user stops speaking.
Retell offers two transcription modes, one that optimizes for speed and one that optimizes for accuracy.

We've found that the optimize-for-speed and optimize-for-accuracy modes have similar WER (Word Error Rate). The real difference lies in fine details such as numbers, dates, and addresses.
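
To see why two modes can score similarly on WER yet differ where it matters, it helps to look at how WER is usually computed: the word-level edit distance between the reference and the transcript, divided by the number of reference words. The sketch below is a generic illustration with made-up sentences; the function name and data are assumptions, not Retell's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: a single wrong word in the date.
reference = "my appointment is on march fifteenth at four thirty"
hypothesis = "my appointment is on march fiftieth at four thirty"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # WER = 0.111
```

One wrong word in nine keeps WER low, but the appointment date is now wrong. This is the kind of detail error that an aggregate WER figure can hide.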
By optimizing our acoustic modeling pipeline, refining language-specific phonetic dictionaries, and improving real-time decoding, Retell now delivers dramatically lower Word Error Rates (WER) in both Accurate and Fast & Accurate modes.
For German, French, Italian, and Polish, we cut Word Error Rate by 7–10 points.
These were already strong languages in our Accurate mode. Still, the new modeling architecture significantly reduces the common error types we observed in real customer calls, such as accent-driven phoneme swaps, background-noise distortions, and gender/number agreement mistakes.
| Language | Word-Average WER | Call-Average WER | What This Improvement Means |
|---|---|---|---|
| German | 0.1944 | 0.1971 | Misheard consonants and accent variance errors drop noticeably. |
| French | 0.2665 | 0.2552 | Reduces noise sensitivity and improves handling of liaison and nasal vowels. |
| Italian | 0.1781 | 0.2457 | Smoother, natural-sounding call transcripts. |
| Polish | 0.1733 | 0.1688 | Better recognition of consonant clusters and inflections. |
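
The two WER columns are not defined in this post. One common reading, assumed here, is that word-average WER pools errors and words across all calls, while call-average WER computes WER per call and then averages those values. The sketch below uses made-up counts, not Retell data, to show how the two can diverge.

```python
from statistics import mean

# Hypothetical (errors, reference_words) per call; not Retell data.
calls = [(2, 10), (30, 200), (5, 50)]


def word_average_wer(calls):
    """Pool all calls: total errors divided by total reference words."""
    total_errors = sum(errors for errors, _ in calls)
    total_words = sum(words for _, words in calls)
    return total_errors / total_words


def call_average_wer(calls):
    """Compute WER per call, then take the unweighted mean."""
    return mean(errors / words for errors, words in calls)


print(round(word_average_wer(calls), 4))  # 0.1423
print(round(call_average_wer(calls), 4))  # 0.15
```

Under this reading, every call counts equally in the call average, so a few short calls with high error rates raise it more than they raise the word average. That is one possible explanation for rows where the two columns differ noticeably.
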
For Chinese (Mandarin), Malay, and Hindi, the gains are even bigger: WER improvements of 15–25 points.
These languages have historically been challenging for ASR due to tonal dynamics (Mandarin), code-mixing (Malay), and accent diversity (Hindi). The upgraded engine now handles these complexities far more intelligently.
| Language | Word-Average WER | Call-Average WER | What This Improvement Means |
|---|---|---|---|
| Malay | 0.2623 | 0.2988 | Big gains in code-mixed speech (Malay + English), with better real-time clarity. |
| Hindi | 0.3010 | 0.3150 | Drastically improved call transcription stability across accents. |
| Mandarin | 0.2605 | 0.2636 | Fewer tone-confusion errors and better handling of rapid speech. |
The new ASR engine reduces the mismatch between what callers say and what the AI thinks they said. With lower WER, our LLM-powered reasoning engine receives clearer text, enabling:
This upgrade doesn’t just improve transcription — it elevates the entire voice automation experience.

Start building smarter conversations today.




