
London-based Trint, a startup co-founded by Emmy-winning journalist Jeff Kofman, is tackling a pain point I know all too well: the time it takes to transcribe an interview (or any audio) accurately.
To solve this particular problem, the company is employing machine learning and speech-to-text technology to automate transcription and, perhaps crucially, presenting the result in a user interface that recognises that automation typically gets the job only partly done.
Specifically, Trint integrates a web-based audio/video player and text editor, with the automated transcription synced to the audio player’s playhead. It’s a deceptively simple idea, but one that makes a big difference when checking (and editing) a transcription for accuracy.
“We glue the text output of automated speech-to-text to the original source audio. And that means that you can follow it like karaoke,” explains Kofman via an audio file transcribed by Trint. “And because it’s an editor, if you have a word like Muammar Gaddafi, a name like that that’s not correctly transcribed, then you can fix it. And in seconds you’ve got the moment you need and…
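To make the "karaoke" idea concrete, here is a minimal sketch of how a transcript might be tied to a playhead. It is not Trint's actual implementation, which isn't described here. It assumes the speech-to-text step emits a start and end time for every word; the names `Word`, `word_at` and `correct_word`, and the sample timings, are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # what the speech-to-text engine heard
    start: float  # seconds from the start of the audio
    end: float


def word_at(words, playhead):
    """Return the index of the word being spoken at `playhead`, or None during silence."""
    starts = [w.start for w in words]
    # Last word whose start time is <= playhead.
    i = bisect_right(starts, playhead) - 1
    if i >= 0 and playhead < words[i].end:
        return i
    return None


def correct_word(words, index, new_text):
    """Fix a mis-transcribed word while keeping its link to the audio."""
    words[index].text = new_text


# Hypothetical engine output with a misspelled name.
transcript = [
    Word("then", 0.0, 0.3),
    Word("Muammar", 0.4, 0.9),
    Word("Gadaffy", 0.9, 1.5),
    Word("said", 1.7, 2.0),
]

index = word_at(transcript, 1.0)          # playhead is at 1.0 seconds
correct_word(transcript, index, "Gaddafi")
```

Because the edit changes only the text and leaves the timings alone, the corrected word still highlights at the right moment, and clicking it could still seek the player to the same spot in the audio.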