現場コンパス
explainer

Multilingual Meeting Transcription: What Works and What Doesn't

An honest guide to AI transcription for multilingual teams—covering code-switching, tool comparisons, the cross-language summary trick, and practical tips for better results.

MinuteKeep Team
#multilingual transcription #multilingual meetings #AI transcription #code-switching #non-English meetings #remote work #global teams

About 40% of the global workforce regularly attends meetings in a language that is not their primary one. That number comes from research across multinational employers and is almost certainly an undercount — it reflects people who identify as non-native speakers, not the many native speakers who participate in meetings where others are working in their second or third language.

This is the reality of remote work in global teams: the meeting is happening in English, but two participants are native German speakers, one is Japanese, one is Brazilian, and the person who asked the best question last week types her follow-up emails in Spanish. The recording gets made, the AI transcription runs, and then someone has to figure out what actually came out on the other side.

Multilingual transcription tools have improved substantially. But the gap between marketing claims and practical reality remains real. This is an honest account of where things stand — what current technology handles well, where it falls short, and how to get better results from whatever tool you choose.


Automate your meeting notes. MinuteKeep records your meeting and uses AI to transcribe, summarize, and extract action items. 9 languages, no subscription, 30 min free.

The Specific Challenges of Multilingual Transcription

Most AI transcription guides treat "multilingual support" as a checkbox: does the tool list your language on its feature page? That framing misses several problems that matter in practice.

Code-switching

Code-switching is the term researchers use for switching between languages mid-conversation — or even mid-sentence. It is not sloppy communication. It is how multilingual people actually talk. A Japanese engineering team might discuss architecture in Japanese but drop in English terms for "pull request," "sprint backlog," or a specific AWS service name. A German-English meeting might use German for strategy and English for customer-facing terminology that the team has standardized on.

The problem for AI transcription is that most models are trained on single-language audio. When a speaker switches languages, monolingual models face what researchers call a language identification failure: the model keeps applying the rules of the original language to input it no longer matches. Word error rate at language-switching boundaries can spike by 30 to 50 percentage points compared to monolingual speech. At the high end, the transcript produced at those moments is essentially unusable.
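To ground those percentages: word error rate (WER) is the standard accuracy metric, computed as the word-level edit distance between the reference and the transcript, divided by the number of reference words. Here is a minimal, illustrative implementation with a made-up code-switching error; it is a sketch of the metric, not any particular tool's scoring code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance; substitutions,
    # insertions, and deletions each cost 1.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A hypothetical boundary where English jargon got garbled:
ref = "please review the pull request before the sprint review"
hyp = "please review the pool requests before the sprint review"
print(round(wer(ref, hyp), 2))  # -> 0.22 (2 substitutions out of 9 words)
```

Two garbled words out of nine already puts this sentence at 22% WER; a spike of 30 to 50 percentage points at a language boundary means most of the words in that stretch come out wrong.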

Whisper — the model that powers many transcription tools, including MinuteKeep — is a multilingual model with native support for 99 languages, trained on 680,000 hours of multilingual audio. This gives it better baseline behavior than single-language models in code-switching situations, but it is not immune to the problem. OpenAI's own documentation notes that performance in code-switching scenarios remains an active area of research.

Accents, dialects, and uneven training data

A tool that claims to support "Spanish" may have been trained primarily on Castilian Spanish from Spain. A Mexican, Colombian, or Argentine speaker may get materially worse results — even though their language is listed as supported. The same gap applies to German (standard vs. regional dialects), Portuguese (Brazil vs. Portugal), Arabic (Modern Standard vs. colloquial dialects), and Chinese (Mandarin vs. Cantonese).

Whisper's training data distribution follows the availability of audio on the internet, which means English is heavily overrepresented. Whisper achieves around 5–6% word error rate on English audio; performance varies significantly for lower-resource languages. OpenAI lists only languages that achieve better than 50% word error rate as officially supported — which tells you something about the range in the other direction.

Proper nouns across scripts

In monolingual English meetings, AI transcription already struggles with proper nouns — company names, product names, people's names that fall outside the training distribution. In multilingual meetings, this problem compounds. A Japanese person's name rendered phonetically in English audio may produce something unrecognizable. A Korean brand name spoken by a French speaker may come out differently still. Custom dictionaries address this, but only if you know in advance which terms to add. For a deeper look at how the custom dictionary approach works, see the guide to fixing names and terms with a custom dictionary.

Character-based languages

For Japanese, Chinese, Korean, and Arabic, the standard accuracy metric — word error rate — does not translate cleanly. These languages segment differently. A single character in Chinese can carry the semantic weight of an English phrase. Character error rate is the more meaningful measurement, and most published benchmarks only report word error rate for alphabetic languages. When a tool claims "98% accuracy" for Japanese or Chinese, that figure rarely has the methodological backing it would need to be meaningful for professional use.
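A small sketch makes the gap between the two metrics concrete. Because Japanese has no whitespace, naive word splitting treats the whole sentence as a single "word", so any error counts as 100% WER, while character error rate (CER) measures the actual damage. The example sentence and mis-heard verb ending below are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance over arbitrary sequences, rolling-array version."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
            prev, d[j] = d[j], cur
    return d[-1]

def wer(ref, hyp):
    r = ref.split()
    return edit_distance(r, hyp.split()) / len(r)

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / len(ref)

ref = "議事録を確認します"    # "I will check the minutes"
hyp = "議事録を確認しました"  # verb ending mis-heard as past tense
print(wer(ref, hyp))            # -> 1.0: the unsegmented sentence is one "word"
print(round(cer(ref, hyp), 2))  # -> 0.22: two wrong characters out of nine
```

The same single mistake scores 100% by one metric and 22% by the other, which is why a "98% accuracy" claim for Japanese is meaningless without knowing which metric was used.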


The State of the Technology in 2026

The honest picture: multilingual transcription has improved significantly, but it remains uneven across languages and actively difficult for code-switching scenarios.

What works well:

  • Single-language meetings in high-resource languages (English, Japanese, German, French, Spanish, Korean, Portuguese, Chinese, Arabic) with a clear primary speaker
  • Clean audio with minimal background noise
  • Standard accents within the languages that have strong training representation
  • Transcription followed by AI summarization, which can absorb and smooth over minor transcription errors

What doesn't work well yet:

  • Mid-meeting language switching without pre-selection of a language
  • Heavy accents or regional dialects in languages with limited training data
  • Proper nouns that don't appear in training data
  • Multiple speakers with different accents speaking simultaneously
  • Low-resource languages, even when listed as "supported"

The GPT-4o-based transcription models released by OpenAI in early 2025 show lower error rates than earlier Whisper versions across a range of languages. The technology is improving. But the code-switching challenge remains an open research problem.


How Major Tools Handle Multilingual Meetings

Understanding where tools differ helps you make the right choice for your team's specific situation.

Notta

The tool most often recommended for multilingual use cases, with transcription support across 58 languages. Notta includes automatic language detection, which means you don't have to configure language settings before every meeting. It also supports bilingual mode for 11 language pairs — allowing two-language meetings to be captured more cleanly.

In single-language meetings in major languages, Notta performs well. The automatic detection feature is genuinely useful for teams with mixed-language environments. The limits appear in heavy code-switching scenarios and in languages that sit outside its training strengths.

Pricing: Free tier with limited minutes. Pro around $13.49/month.

Otter.ai

Otter supports five language options: English (US and UK), Japanese, Spanish, and French. For teams working outside those languages, it is not a viable option. Within its supported languages, Otter's English transcription is strong, and its real-time collaborative features and meeting platform integrations are among the best available. But the language ceiling is real.

Pricing: Free tier available. Pro around $16.99/month.

Google Meet / Google Workspace

Google's transcription capabilities benefit from Google's language data advantage. Auto-detection works reasonably for clear audio in major world languages. The constraint is that it's tightly integrated with Google's own meeting platform — if your team uses other tools, the integration story gets complicated quickly.

MinuteKeep

MinuteKeep supports 9 languages: English, Japanese, Korean, German, French, Spanish, Portuguese, Arabic, and Chinese. It uses OpenAI's GPT-4o-based transcription models, which build on the multilingual training behind the original Whisper architecture and its subsequent refinements.

One important limitation to be transparent about: MinuteKeep does not automatically detect the spoken language or handle mid-meeting language switching. You select the primary language before recording begins, and the transcription assumes that language throughout. If your meeting involves significant code-switching, you will need to decide which language is primary and accept that the other language's segments may have lower accuracy.

What MinuteKeep does offer for multilingual users is a feature that most tools don't highlight: the ability to transcribe in one language and generate the summary in a different one. That capability is worth its own explanation.


The Cross-Language Summary Trick

Here is a workflow that multilingual remote workers find genuinely useful, once they know it exists.

Suppose your meeting happens in Japanese. The transcript is in Japanese. But your summary needs to go to a stakeholder who reads English. Or you work across time zones and your personal working language is Spanish, but the team meeting is in German.

In MinuteKeep, the transcription language and the summary output language are independent settings. You can record in Japanese, transcribe in Japanese, and receive the summary in English. Or record in German and receive the summary in Spanish. Any combination of the nine supported languages works.

This separation matters because it means your notes are usable immediately, by the right audience, without requiring translation as a separate step. The AI reads the full transcript in the original language and generates a structured summary in the language you've selected for output. The meaning transfer happens inside the model — not via a separate translation layer applied afterward.
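As a rough mental model of how that in-model meaning transfer can be driven, here is a hypothetical prompt builder: the transcript stays in its source language, and only the output language is specified. This is an illustrative sketch, not MinuteKeep's actual implementation.

```python
def build_summary_prompt(transcript: str, summary_lang: str) -> str:
    """Build a summarization prompt where the output language is
    independent of the transcript's language. Hypothetical example."""
    return (
        f"Summarize the following meeting transcript. "
        f"Write the summary in {summary_lang}, regardless of the "
        f"language of the transcript. Include key decisions and "
        f"action items.\n\n{transcript}"
    )

# A Japanese transcript, an English summary request:
prompt = build_summary_prompt("議事録の文字起こし...", "English")
```

Because the model reads the source language directly, there is no intermediate translated transcript for meaning to get lost in.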

For multilingual team leaders who run meetings in one language but report upward in another, this is not a minor convenience. It eliminates a step that would otherwise require either a human translator, a separate translation tool, or accepting that something gets lost between transcript and report.


Ready to Try It?

MinuteKeep is free to download, with 30 minutes of transcription included from the first session. No subscription — you add time when you need it.

Download MinuteKeep on the App Store


Tips for Better Multilingual Transcription Results

Regardless of which tool you use, these practices improve accuracy in multilingual environments:

Set the language deliberately. If you're using a tool that requires language pre-selection (like MinuteKeep), choose the language that will be used for the majority of the meeting. Don't guess — ask meeting participants in advance if needed. A wrong language setting produces substantially worse output than the right one.

Reduce code-switching when possible. This is not always practical, but for important meetings where an accurate record matters, establishing a single primary language for the session will produce cleaner transcripts. Code-switching is natural in informal conversation; a formal meeting where someone is taking an official record benefits from consistency.

Use a custom dictionary for recurring terms. Any proper nouns, product names, or technical terms that appear regularly in your meetings should be added to your custom dictionary. This applies especially to terms that cross linguistic contexts — English brand names in a Japanese meeting, for example, or company-specific terminology that doesn't exist in any training corpus. See the full guide to custom dictionaries for AI transcription for setup instructions.
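One common way to implement a custom dictionary, sketched below as a post-processing pass over the finished transcript: known phonetic mis-renderings are mapped back to their canonical spellings. The dictionary entries and variants here are invented examples, and this is a mental model rather than any specific tool's internals.

```python
import re

# Hypothetical entries: canonical term -> phonetic approximations
# an ASR model might emit. All variants below are invented.
CUSTOM_DICT = {
    "MinuteKeep": ["minute keep", "minit keep"],
    "Kubernetes": ["cooper netties", "kuber netties"],
}

def apply_dictionary(transcript: str) -> str:
    """Replace known mis-renderings with their canonical spelling."""
    for canonical, variants in CUSTOM_DICT.items():
        for variant in variants:
            # Case-insensitive substring replacement; a production
            # version would also respect word boundaries.
            transcript = re.sub(re.escape(variant), canonical,
                                transcript, flags=re.IGNORECASE)
    return transcript

print(apply_dictionary("We deploy with cooper netties and log notes in minute keep."))
# -> "We deploy with Kubernetes and log notes in MinuteKeep."
```

The same pass works regardless of the surrounding language, which is why dictionary entries help most for terms that cross linguistic contexts.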

Record in a quiet environment with good microphone placement. Noise tolerance degrades faster for non-English languages than for English, because models have seen less variety of non-English audio during training. A clean recording makes a larger difference in accuracy for non-English languages than most users expect.

Use the summary as your working document. Transcripts are raw output — every filler word, correction, and cross-talk moment is captured. For multilingual meetings especially, the AI summary is more useful because it interprets meaning rather than copying words. Verify decisions in the raw transcript; rely on the summary for communication.

Understand the full pipeline. For a deeper explanation of how transcription and summarization work together, see the guide to how AI meeting transcription works under the hood.


Frequently Asked Questions

Can MinuteKeep transcribe a meeting where participants speak different languages?

Yes, with a caveat. MinuteKeep can transcribe audio in any of its nine supported languages, but you must select one primary language before recording. If participants switch between languages during the meeting, the accuracy of the non-selected language's segments will be lower. For meetings where language switching is significant, choose the language that covers the majority of the content.

Does the summary language have to match the transcription language?

No. This is one of MinuteKeep's most useful features for multilingual teams. You can transcribe in one language and receive the summary in a different language. The settings for transcription language and summary output language are independent.

What happens if I select the wrong language before recording?

The transcript will have elevated error rates — the model applies the wrong language's phonetic rules to the audio. You cannot change the transcription language after recording. If you realize mid-meeting that you selected incorrectly, finish the meeting and use the summary to identify where errors are likely concentrated.

Does the custom dictionary work across languages?

Yes. Dictionary entries apply to the transcription output regardless of the source language. If you add an English product name that appears in Japanese meetings, the dictionary will catch and correct the phonetic approximation the model produces when it hears that name in Japanese audio. This is particularly useful for brand names, technical terms, and proper nouns that appear in multiple language contexts.

Is code-switching going to be supported automatically in the future?

The technology is improving. Research published in 2025 shows end-to-end multilingual architectures reducing word error rates at language boundaries by up to 55% compared to earlier approaches. But production-ready automatic code-switching remains an active research area. Expect incremental improvement rather than a sudden capability jump.


Key Takeaways

  • About 40% of the global workforce regularly attends meetings in a non-native language. Most transcription tools were not designed with these users in mind.
  • Code-switching is the hardest problem for current AI transcription. Word error rate at language boundaries can spike 30–50 percentage points; even multilingual models like Whisper are not immune.
  • Whisper and GPT-4o-based successors are trained on multilingual data, but performance is uneven. High-resource languages perform well; low-resource languages and regional dialects can be significantly worse.
  • Notta offers 58 languages and automatic detection. Otter.ai supports five languages and is not suitable for teams outside that set.
  • MinuteKeep supports 9 languages, requires language selection before recording, and does not auto-detect mid-meeting switching.
  • MinuteKeep's cross-language summary feature — transcribe in one language, summarize in another — removes a translation step for teams that meet and report in different languages.
  • Custom dictionary, clean audio, and deliberate language selection are the three most reliable accuracy improvements regardless of tool.

Try MinuteKeep Free

30 minutes of free recording. No subscription required.

Download on the App Store