Subject: Speaker diarization returns 1 speaker on multi-speaker board meeting recordings (speech_model: "best")

Hi AssemblyAI team,

We're transcribing Redwood City School District board meetings for a public transparency project (rcsd.info). We're using the Node.js SDK with speech_model: "best" and speaker_labels: true. Most meetings diarize correctly (7-15 speakers detected), but a specific set of recordings consistently returns a single utterance with 1 speaker, even though the audio clearly contains many distinct speakers.

REPRO

SDK call:

  const transcript = await client.transcripts.transcribe({
    audio: audioPath,
    speech_model: 'best',
    speaker_labels: true,
    word_boost: ['Redwood City School District', 'RCSD', ...],
  });

Affected recordings (all publicly available on YouTube):

  https://www.youtube.com/watch?v=inuwqFycW2Q  (2023-04-26, 3h 30m, 34760 words, 1 speaker returned)
  https://www.youtube.com/watch?v=-kyZ0HP3eKE  (2023-04-19, 3h 23m, 32570 words, 1 speaker returned)
  https://www.youtube.com/watch?v=LmFOastk8Js  (2023-02-15, 3h 19m, 32229 words, 1 speaker returned)
  https://www.youtube.com/watch?v=sVxkVRPrIvI  (2023-03-08, 2h 31m, 27019 words, 1 speaker returned)
  https://www.youtube.com/watch?v=uEJsKPqWg_0  (2023-11-02, 1h 58m, 18596 words, 1 speaker returned)

EXPECTED BEHAVIOR

The API should return multiple utterances with distinct speaker labels (A, B, C, etc.). These are board meetings with a president calling roll, a superintendent presenting, trustees discussing, members of the public speaking, etc. The audio quality is decent — single room with a good mic setup. Listening to the recordings, speakers are clearly distinguishable.

ACTUAL BEHAVIOR

The API returns a single utterance containing the entire meeting transcript, attributed to speaker "A". The transcript text itself is accurate — word recognition is fine. It's only the diarization that fails.
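
In case it helps with triage, here is a minimal sketch of how we could flag these cases on our side. The helper names (countSpeakers, looksCollapsed) and the 1000-word threshold are our own illustration, not part of the SDK. It assumes the transcript object exposes an utterances array (each entry carrying a speaker label) and a words array, which matches the responses we see.

```javascript
// Hypothetical helper: number of distinct speaker labels in a transcript.
// Treats a missing or null utterances array as zero speakers.
function countSpeakers(transcript) {
  const utterances = transcript.utterances ?? [];
  return new Set(utterances.map((u) => u.speaker)).size;
}

// Hypothetical check: a long transcript with at most one speaker label is
// suspicious for a board meeting. minWords is an arbitrary threshold we
// picked for illustration; short clips are ignored.
function looksCollapsed(transcript, minWords = 1000) {
  const wordCount = (transcript.words ?? []).length;
  return wordCount >= minWords && countSpeakers(transcript) <= 1;
}
```

A transcript for which looksCollapsed returns true would be a candidate for re-submission or manual review.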

WHAT WORKS

139 other meetings from the same YouTube channel (same room, same mic setup), transcribed with identical parameters, diarize correctly with 7-15 speakers. The failing recordings are concentrated in early-to-mid 2023.

WHAT WE TRIED

- Re-running with --force (re-download + re-transcribe): same result, 1 speaker
- Adding speakers_expected: 10: THIS FIXES IT. The 2023-11-02 recording went from 1 speaker to 10 speakers on identical audio. The only change was adding the speakers_expected hint.

So diarization works correctly when speakers_expected is provided but fails silently (returns 1 speaker, no error) when it is omitted, and only for these specific recordings. The 139 other meetings from the same channel diarize fine without the hint.
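
For reference, a rough sketch of the fallback we are considering as a workaround. Both function names are hypothetical, the default of 10 is simply the value that worked for the 2023-11-02 recording, and word_boost is left out for brevity. The client argument is assumed to be the AssemblyAI SDK client used in the call above.

```javascript
// Hypothetical helper: builds the request parameters, adding the
// speakers_expected hint only when a positive integer is supplied.
function buildTranscribeParams(audioPath, speakersExpected) {
  const params = {
    audio: audioPath,
    speech_model: 'best',
    speaker_labels: true,
  };
  if (Number.isInteger(speakersExpected) && speakersExpected > 0) {
    params.speakers_expected = speakersExpected;
  }
  return params;
}

// Hypothetical wrapper: transcribe without a hint first, then retry once
// with speakers_expected if at most one speaker label came back.
async function transcribeWithFallback(client, audioPath, fallbackSpeakers = 10) {
  const first = await client.transcripts.transcribe(
    buildTranscribeParams(audioPath)
  );
  const speakers = new Set((first.utterances ?? []).map((u) => u.speaker));
  if (speakers.size > 1) {
    return first;
  }
  return client.transcripts.transcribe(
    buildTranscribeParams(audioPath, fallbackSpeakers)
  );
}
```

This doubles the transcription cost for the affected recordings, so we would prefer not to need it.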

ENVIRONMENT

- assemblyai npm package v4.27.0
- Node.js
- Audio downloaded via yt-dlp from YouTube (opus/48kHz, converted to temp file)

Happy to provide transcript IDs, audio files, or any other debugging info.

Thanks,
David Weekly
david@weekly.org