Live captions in a meeting are a different product than 'I'll read it after.' We pick batch on purpose; here's the reasoning.

Live transcription is for: hearing-accessibility, real-time captions in a meeting, courtroom live records, broadcast subtitles. Batch transcription is for: voice notes, recorded interviews, lectures you'll review, dictation that becomes prose. Trying to do both well in one product usually means doing one badly.
Live transcription requires streaming the audio to the engine as it's captured. That means: server-side processing of audio that hasn't finished being recorded yet, partial-result accumulators that have to be reconciled when the speaker pauses, and accuracy that's structurally lower than batch because the model can't 'look ahead' the way batch decoders can. Vosk supports live transcription, and we tried it. At the noise levels real users actually record at (phones, public spaces, kitchens), the accuracy hit wasn't worth the immediacy.
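To make the 'partial-result accumulator' concrete, here is a minimal Python sketch. The class and method names are hypothetical and it isn't tied to any engine's real API; it assumes only that a streaming engine emits partial hypotheses it may later revise, followed by a final result once the speaker pauses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartialAccumulator:
    """Hypothetical live-transcript state: committed text plus a provisional tail."""
    committed: List[str] = field(default_factory=list)
    provisional: str = ""

    def on_partial(self, text: str) -> None:
        # Each partial hypothesis replaces the previous one, because the
        # engine may revise earlier words as more audio arrives.
        self.provisional = text

    def on_final(self, text: str) -> None:
        # At a pause the engine settles on a final result for the utterance.
        # Reconcile: commit the final text and discard the provisional tail.
        if text:
            self.committed.append(text)
        self.provisional = ""

    def display(self) -> str:
        # What a caption view would show right now.
        tail = [self.provisional] if self.provisional else []
        return " ".join(self.committed + tail)


acc = PartialAccumulator()
acc.on_partial("wreck a nice")
acc.on_partial("wreck a nice beach")
acc.on_final("recognize speech")   # the final result overrides the partials
acc.on_partial("in the")
print(acc.display())
```

Even this toy version has state that can go wrong: text on screen changes after the user has read it, and a dropped connection between a partial and its final leaves a dangling tail. Batch has none of that state.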
Choosing batch buys us four things. Cheaper hosting (no streaming inference, just discrete jobs). Higher accuracy (the decoder can look at the whole utterance). Lower complexity in the front end (no partial-result accumulator, no reconnection logic on a flaky connection). A simpler privacy story (the audio file is the unit of work; you can audit, delete, or refuse to upload a single file).
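The 'audio file is the unit of work' point can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of our backend: `BatchStore`, `Job`, and the hash-derived job id are all hypothetical names and choices.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    """One uploaded audio file and its (eventual) transcript."""
    job_id: str
    audio: bytes
    transcript: str = ""


class BatchStore:
    """Hypothetical in-memory store where every action targets a single file."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}
        self.audit_log: List[str] = []

    def submit(self, audio: bytes) -> str:
        # Assumption: derive the id from the file contents, so one file
        # always maps to one job.
        job_id = hashlib.sha256(audio).hexdigest()[:12]
        self.jobs[job_id] = Job(job_id, audio)
        self.audit_log.append(f"submitted {job_id}")
        return job_id

    def delete(self, job_id: str) -> bool:
        # Deleting a job removes the audio and the transcript together.
        removed = self.jobs.pop(job_id, None) is not None
        if removed:
            self.audit_log.append(f"deleted {job_id}")
        return removed


store = BatchStore()
jid = store.submit(b"fake-audio-bytes")
store.delete(jid)
print(store.audit_log)
```

Because each entry in the log refers to one whole file, 'what did you hold and when did you delete it' has a simple answer. A streaming session has no equally clean boundary.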
Two events would flip our position. One: native browser support for streaming inference (WebGPU + WebAssembly putting Vosk Large directly in the page) - that'd let us run live transcription without a server-side streaming step at all. Two: a strong customer signal from an accessibility-driven use case where 'transcript appears after the recording stops' is genuinely too slow. Until then, batch is the right call.
Free plan, no credit card. We host in Germany. You can export and delete everything self-serve.