Why 6% word error rate is the wrong number for finance transcripts

Standard ASR benchmarks miss the tokens that actually move an investment thesis.

INFLXD Research · 4 min read
Why finance transcripts fail where it matters most

Hedge fund analysts pay USD 1,000 to USD 1,500 per expert call. The transcript usually lands 4 to 12 hours later, carrying a 5 to 12 percent word error rate from a standard ASR engine. On the surface, 6 percent sounds tolerable. On a finance call, it isn't.

The errors don't distribute evenly. They cluster in the segments an analyst was paying to hear: figures, tickers, company names, and the acronyms that anchor a thesis.

What the benchmarks actually measure

Deepgram, AssemblyAI, and OpenAI publish WER against public speech corpora such as LibriSpeech (read audiobook prose) and VoxPopuli (European Parliament proceedings). Both are clean, scripted, and English-dominant. Neither contains a single earnings call, expert call, or sell-side analyst Q&A.

A model that posts 5 percent WER on LibriSpeech can post 9 to 12 percent on a one-hour expert call where a former semiconductor exec is talking through TSMC's N3 ramp, HBM allocation, and segment ASPs. The model wasn't trained on the words that carry the meaning.
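
For reference, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. Below is a minimal Python sketch of that calculation. It assumes whitespace tokenization and lowercasing; each vendor applies its own normalization rules, which are not reproduced here. The function name and sample sentences are ours, for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from a real call.
reference = "you know tsmc ebitda grew 200 bps i think"
drops_filler = "tsmc ebitda grew 200 bps"
mangles_terms = "you know dsmc ebit grew 200 bps i think"

print(word_error_rate(reference, drops_filler))
print(word_error_rate(reference, mangles_terms))
```

In this toy case the transcript that only drops filler scores 4/9, a worse WER than the 2/9 scored by the one that mangles TSMC and EBITDA. The metric weighs every word equally, so it cannot tell the two apart in the way an analyst would.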

Where the errors land

On a 60-minute call at 150 words per minute, you get roughly 9,000 spoken tokens. A 6 percent WER means about 540 of them are wrong or missing. Most are filler, articles, or hedges ("sort of," "you know," "I think"). Losing those is fine.
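
The error budget for a call of any length can be checked with a few lines of Python. This is a sketch under the same assumptions as the paragraph above (a constant speaking rate and a single flat WER); the function name is ours.

```python
def expected_errors(minutes: int, words_per_minute: int, wer: float) -> int:
    """Expected count of wrong or missing words for a call of the given length."""
    spoken_tokens = minutes * words_per_minute
    return round(spoken_tokens * wer)


# The 5 to 12 percent range quoted at the top of the article,
# plus the 6 percent headline figure.
for wer in (0.05, 0.06, 0.12):
    errors = expected_errors(minutes=60, words_per_minute=150, wer=wer)
    print(f"WER {wer:.0%}: {errors} errors in {60 * 150} spoken tokens")
```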

The problem is the other 50 to 80. Those are the numbers, tickers, and acronyms. Examples we see repeatedly in raw ASR output on finance audio:

  • TSMC transcribed as "DSMC," as "tsmc" in lowercase, or as "T-S-M-C" split across tokens.
  • EBITDA rendered as "e-bit-da" or "a bit dah," sometimes silently swapped for EBIT.
  • Basis points ("200 bps") flattened to "200 BPs" or "two hundred bips."
  • Tickers ($NVDA, $AMD) dropped entirely or written as "NVIDIA" with no ticker context.
  • Segment numbers ("data center grew 154 percent year over year") with the wrong digit, or YoY transcribed as "yoy."

A research analyst building a model from the transcript catches some of these on a re-read. They don't catch all of them. The ones they miss end up in the IC memo.

The compliance dimension

Expert networks (Guidepoint, GLG, Third Bridge, AlphaSense and Tegus) run compliance review on every transcript before delivery. The reviewer's job is to flag MNPI, not to fix "DSMC." Quality of the underlying transcription is the vendor's problem. If the ASR layer drops a number or mangles a ticker, the analyst gets a clean-looking, compliance-stamped document that is quietly wrong on the data points they paid for.

This is the asymmetry: the transcript looks finished. The errors are invisible until someone tries to use the document as a source.

Why finance WER is a separate metric

The right comparison isn't "how does our model do on LibriSpeech." It's "how does our model do on a 60-minute call where the speaker says EBITDA 14 times, TSMC 9 times, and quotes three different segment growth rates."

We haven't seen any of the major ASR vendors publish that number. The vendors that quote a single WER are quoting the number that flatters them. Buyers in expert networks, hedge fund research desks, and financial data providers should be asking for a domain-specific WER, measured on a finance corpus, with separate scores on numbers, tickers, and named entities.
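
One way a category-level score could be computed is sketched below in Python. It is an illustration, not any vendor's method. It uses a bag-of-tokens recall (what share of the reference's numbers, tickers, and acronyms survive verbatim in the ASR output) rather than a full alignment. The glossary, the regular expressions, and the function names are all hypothetical placeholders; a real evaluation would need a curated finance lexicon and a human-verified reference transcript.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical glossary for illustration only.
ACRONYMS = {"EBITDA", "EBIT", "TSMC", "HBM", "ASP", "YoY"}


def tokenize(text: str) -> List[str]:
    """Split on whitespace and trim surrounding punctuation (keeps $ and %)."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.split())
    return [t for t in tokens if t]


def categorize(token: str) -> Optional[str]:
    """Assign a token to a critical category, or None for ordinary words."""
    if re.fullmatch(r"\$[A-Z]{1,5}", token):
        return "ticker"
    if re.fullmatch(r"\d[\d,.]*%?", token):
        return "number"
    if token in ACRONYMS:  # case-sensitive: "tsmc" does not count as "TSMC"
        return "acronym"
    return None


def critical_token_recall(reference: str, hypothesis: str) -> Dict[str, float]:
    """Per category, the share of reference tokens found verbatim in the hypothesis."""
    hyp_counts = Counter(tokenize(hypothesis))
    ref_by_category: Dict[str, Counter] = defaultdict(Counter)
    for token in tokenize(reference):
        category = categorize(token)
        if category is not None:
            ref_by_category[category][token] += 1

    scores: Dict[str, float] = {}
    for category, ref_counts in ref_by_category.items():
        total = sum(ref_counts.values())
        # A token is credited at most as often as it appears in the hypothesis.
        found = sum(min(n, hyp_counts[tok]) for tok, n in ref_counts.items())
        scores[category] = found / total
    return scores


# Illustrative sentences, not taken from a real call.
reference = "TSMC EBITDA grew 154% and $NVDA margins rose 200 bps"
hypothesis = "DSMC EBIT grew 154% and NVIDIA margins rose 200 bps"
print(critical_token_recall(reference, hypothesis))
```

On this toy pair, acronym and ticker recall are both 0.0 while number recall is 1.0, even though seven of the ten words match. Reporting these per-category scores next to the headline WER is the kind of breakdown a buyer could request.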

Until that becomes standard, the published 5 to 6 percent figures should be read as a floor, not a ceiling, on what an analyst will actually encounter.

What to ask an expert next

  1. On a typical 60-minute expert call you've done, how often have you seen a number or ticker mangled in the delivered transcript?
  2. Does your firm ever cross-check the transcript against the audio, and at what point in the workflow?
  3. Have you ever caught an MNPI-adjacent error that came from a transcription mistake rather than something the expert actually said?
  4. What would a vendor have to publish for you to switch transcription providers?
  5. Do your PMs read transcripts directly, or do analysts summarize first? Where in that chain does a transcription error compound?

Disclosure: Drafted with AI assistance and reviewed by INFLXD editors against the newsroom's editorial rubric. Source links above are the primary factual basis for every claim.

Position B disclosure: INFLXD has commercial relationships with one or more of the companies named in this article. See our editorial disclosures.
