Diarization Mode

Diarization determines how the system identifies and handles multiple speakers.

Action Phrase offers two diarization modes:

  • SpeakerManager
  • SortFormer

They solve different problems, so it is worth understanding the tradeoffs before choosing one.

SpeakerManager

SpeakerManager is designed for identifying a single known speaker.

Key Characteristics

  • Supports one enrolled voice profile
  • Recognizes that specific person’s voice
  • Best when only one person is speaking at a time

Voice Enrollment

SpeakerManager allows you to:

  • Record and store one voice profile.
  • Improve recognition reliability for that person.
  • Reduce accidental triggers from unrelated voices.
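To make the idea of an enrolled profile concrete, the sketch below shows one common way voice matching of this kind can work: each utterance is reduced to a numeric embedding, and a new utterance is accepted only if it is close enough to the stored profile. This is an illustration only. The function names, the toy 3-dimensional embeddings, and the 0.75 threshold are all assumptions, not Action Phrase's actual implementation or API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def enroll(samples):
    """Average several embeddings into one stored profile (hypothetical)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def is_enrolled_speaker(profile, embedding, threshold=0.75):
    """Accept the utterance only if it is close to the stored profile.

    The threshold is an assumed value chosen for this example.
    """
    return cosine_similarity(profile, embedding) >= threshold

# Toy embeddings; real voice embeddings have far more dimensions.
profile = enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
print(is_enrolled_speaker(profile, [0.95, 0.05, 0.05]))  # True
print(is_enrolled_speaker(profile, [0.0, 1.0, 0.0]))     # False
```

A higher threshold rejects more unrelated voices but may also reject the enrolled speaker in noisy conditions.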

Important Limitation

SpeakerManager performs best when only one person is speaking at a time. It is not designed for overlapping speech: if several people speak simultaneously, recognition accuracy for the enrolled voice may decrease.


SortFormer

SortFormer is a transformer-based diarization model designed for multi-speaker environments.

Key Characteristics

  • Handles overlapping speech more effectively than SpeakerManager
  • Better at isolating the calibrated voice while others are speaking
  • Does not support stored voice profiles
  • Includes optional Silence Detection

Calibrated Voice Recognition

SortFormer can still identify and prioritize the calibrated voice when other voices are present, often more reliably than SpeakerManager in overlapping scenarios.

However:

  • You cannot store persistent voice profiles; enrollment-style voice storage is not supported.
  • It relies on real-time modeling rather than saved identity profiles.
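The tradeoffs between the two modes can be summarized as a small decision helper. The sketch below is hypothetical: the enum, the function name, and its parameters are invented for illustration and do not correspond to any Action Phrase setting or API. It only encodes the rules described above.

```python
from enum import Enum

class DiarizationMode(Enum):
    SPEAKER_MANAGER = "SpeakerManager"
    SORTFORMER = "SortFormer"

def suggest_mode(overlapping_speech_expected, needs_stored_profile):
    """Suggest a mode from the tradeoffs described in this section.

    Both parameters are assumptions made for this example.
    """
    if needs_stored_profile:
        # Only SpeakerManager supports a stored voice profile.
        return DiarizationMode.SPEAKER_MANAGER
    if overlapping_speech_expected:
        # SortFormer handles overlapping speech more effectively.
        return DiarizationMode.SORTFORMER
    # One speaker at a time: SpeakerManager's intended use case.
    return DiarizationMode.SPEAKER_MANAGER

print(suggest_mode(True, False).value)   # SortFormer
print(suggest_mode(False, True).value)   # SpeakerManager
```

If a stored profile is required and overlapping speech is also expected, the helper still returns SpeakerManager, with the accuracy caveat noted under Important Limitation.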

Silence Detection

When enabled, Silence Detection:

  • Identifies pauses between speech segments
  • Helps finalize transcript segments more cleanly
  • Improves responsiveness during natural breaks
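As a rough illustration of how pause detection can finalize segments, the sketch below splits a sequence of per-frame audio energy values into speech segments, closing a segment once enough consecutive quiet frames are seen. It is a generic energy-threshold approach, not a description of how SortFormer's Silence Detection is implemented. The function name, threshold, and frame counts are assumptions.

```python
def split_on_silence(energies, threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs for speech segments.

    `end` is exclusive. A segment is finalized once at least
    `min_silence_frames` consecutive frames fall below `threshold`.
    All parameter values here are assumed for illustration.
    """
    segments = []
    start = None      # index of the first frame of the open segment
    silent_run = 0    # consecutive quiet frames inside the open segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first quiet frame.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-segment; trim any trailing quiet frames.
        segments.append((start, len(energies) - silent_run))
    return segments

frames = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.7, 0.8, 0.05]
print(split_on_silence(frames))  # [(1, 3), (6, 8)]
```

A smaller `min_silence_frames` finalizes segments sooner, which feels more responsive, but it risks splitting a sentence at a brief hesitation.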
