Live transcription or batch — which one are you actually trying to build a habit around?

Live captions in a meeting are a different product from 'I'll read it after.' We picked batch on purpose; here's the reasoning.

Finn Glas, Co-Founder + Engineering

June 6, 2026

Two completely different jobs

Live transcription is for hearing accessibility, real-time captions in a meeting, live courtroom records, and broadcast subtitles. Batch transcription is for voice notes, recorded interviews, lectures you'll review, and dictation that becomes prose. Trying to do both well in one product usually means doing one badly.

What you give up when you go live

Live transcription requires streaming the audio to the engine as it's captured. That means server-side processing of audio that hasn't finished being recorded yet, partial-result accumulators that have to be reconciled when the speaker pauses, and accuracy that's structurally lower than batch because the model can't 'look ahead' the way batch decoders can. Vosk supports live transcription and we tried it, but the accuracy hit at the noise levels real users actually record at (phones, public spaces, kitchens) wasn't worth it.
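
To make the "partial-result accumulator" concrete, here is a minimal sketch of the bookkeeping a live front end ends up carrying. It is an illustration, not our production code. It assumes messages shaped like the JSON that Vosk's recognizer emits (in-progress text under a "partial" key, a finished utterance under a "text" key); the class and method names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptAccumulator:
    """Hypothetical helper: merges streaming partials into a stable transcript."""

    finalized: List[str] = field(default_factory=list)  # utterances that will not change
    partial: str = ""  # current guess for the utterance in progress

    def feed(self, message: str) -> None:
        event = json.loads(message)
        if "text" in event:
            # The speaker paused: the final text replaces whatever partial
            # guess was on screen, which may differ from it.
            if event["text"]:
                self.finalized.append(event["text"])
            self.partial = ""
        elif "partial" in event:
            # Each partial overwrites the previous one; partials are not appended.
            self.partial = event["partial"]

    def display(self) -> str:
        parts = self.finalized + ([self.partial] if self.partial else [])
        return " ".join(parts)


acc = TranscriptAccumulator()
for msg in (
    '{"partial": "hello"}',
    '{"partial": "hello word"}',
    '{"text": "hello world"}',  # the final result corrects the earlier guess
    '{"partial": "how"}',
):
    acc.feed(msg)

print(acc.display())  # hello world how
```

Even in this toy form, the text on screen can change after the user has already read it, and a real client also has to handle dropped connections and replayed messages. Batch needs none of this.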

What batch buys you in return

Cheaper hosting: no streaming inference, just discrete jobs. Higher accuracy: the decoder can look at the whole utterance. Lower complexity in the front end: no partial-result accumulator, and no reconnection logic for a flaky connection. A simpler privacy story: the audio file is the unit of work, so you can audit, delete, or refuse to upload a single file.
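
The "audio file is the unit of work" idea can be sketched as a small in-memory job store. This is a hedged illustration under stated assumptions, not a description of our backend: the names `BatchQueue` and `Job`, the size limit, and the hash-derived job id are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    filename: str
    size_bytes: int
    status: str = "queued"


class BatchQueue:
    """Hypothetical store where each uploaded file is exactly one job."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._jobs: Dict[str, Job] = {}
        self._audio: Dict[str, bytes] = {}

    def submit(self, filename: str, audio: bytes) -> Job:
        # Refuse: a single file can be rejected before any processing happens.
        if len(audio) > self.max_bytes:
            raise ValueError(f"{filename} exceeds {self.max_bytes} bytes")
        # Assumption for this sketch: the id is derived from the file content.
        job_id = hashlib.sha256(audio).hexdigest()[:12]
        self._jobs[job_id] = Job(job_id, filename, len(audio))
        self._audio[job_id] = audio
        return self._jobs[job_id]

    def delete(self, job_id: str) -> bool:
        # Delete: removing one job removes its audio and its record together.
        self._audio.pop(job_id, None)
        return self._jobs.pop(job_id, None) is not None

    def audit(self) -> List[Job]:
        # Audit: everything the store holds is an enumerable list of files.
        return list(self._jobs.values())


queue = BatchQueue(max_bytes=10)
job = queue.submit("note.wav", b"12345")
print(job.status, len(queue.audit()))
```

There is no open stream to reason about here. At any moment the system holds a finite list of files, and each one can be inspected or removed on its own.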

When we'd revisit

Two events would flip our position. One: native browser support for streaming inference (WebGPU + WebAssembly putting Vosk Large directly in the page), which would let us run live transcription without a server-side streaming step at all. Two: a strong customer signal from an accessibility-driven use case where 'transcript appears after the recording stops' is genuinely too slow. Until then, batch is the right call.

Finn is one of the Co-Founders. He owns the engineering side, the infrastructure, and most of the late-night fixes that ship before anyone notices.

finn.glas at aicuflow dot com