Cleaning Up Speech Recognition with GPT
If you prefer an existing webapp over an elisp function, https://huggingface.co/spaces/ndurner/oai_chat. Choose Whisper as the model, upload your 25 MB chunks, hit Send, choose GPT-4 Turbo, ask it to clean up, hit Send. Then, hit the Download button (hidden away on the very bottom).
Helpful fact: Whisper works on 16 KHz sampling behind the scenes, so can make your recording smaller by downsampling to 22 KHz, mono. AAC is supported, and commenters to the web say Whisper is pretty robust so it doesn‘t have do be hi-fi - just so that you can make a split at the beginning of the QA session perhaps, if you can‘t fit it into one 25 MB chunk right away.
I've tried using LLMs to restore and clean up raw unpunctuated transcripts, but they tend to hallucinate new words. And chunking is an issue since long transcripts need to be split according to the LLM input context size. But where do we split if we don't have the punctuation yet?
In Scribe I chose instead to restore punctuation marks using a token-classifier (here a DistilBert running in your browser) https://www.appblit.com/scribe
Laurent
I take a lot of long voice notes thinking out loud during my walks. I use this prompt when I give the Whisper output to GPT-4. I've found it to be pretty reliable:
> Punctuate the following transcript of a voice note I took. Insert periods, commas, and paragraph breaks where appropriate. Remove filler words such as 'right?' 'you know', and 'uh'. But do retain my original wording! Do not paraphrase my sentences beyond recognition: this is not a rewriting task! The transcript now follows:
Use whisper (or whisperx)
"I’ll probably write an elisp function to send the output to GPT and insert the results into the buffer"
I suggest taking a look at my LLM command-line tool. It's great for cobbling together these kinds of things, because you can use whatever shell integration your environment has to pipe things to it.
Should be easy to call that from elisp, and you can then install plugins to have it talk to other models like Claude or Llama 3: https://llm.datasette.io/en/stable/plugins/directory.html