AI RESEARCH

Advanced Prompt Engineering for Automatic Speech Recognition [P]

r/MachineLearning

I've been working on the listening part of my full-duplex speech model MichiAI, and I realized that ASR prompting could be very useful. Deepgram allows word boosting, but that doesn't work well in real-world applications. Another missing piece is feeding the whole conversation history to the ASR model as context, which could be very useful for voice agents. TL;DR: during testing I realized the model can be fine-tuned to accept text prompts such as "Expect a license plate (3 letters, 3 numbers). For example ABC123. <|start|>" or "Expect a person's name."
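To make the prompt format concrete, here is a minimal sketch of how such a prompt string could be assembled before it is handed to the ASR model. Everything in it is an assumption for illustration: the helper name `build_asr_prompt`, its parameters, the "Conversation so far:" prefix, and the idea that `<|start|>` closes every prompt are not taken from MichiAI itself. Only the "Expect ..." and "For example ..." wording and the `<|start|>` marker come from the examples above.

```python
from typing import Optional, Sequence

# Marker copied from the post; treating it as "end of prompt, start of
# transcription" is an assumption, not a documented MichiAI convention.
START_TOKEN = "<|start|>"


def build_asr_prompt(
    expectation: str,
    example: Optional[str] = None,
    history: Sequence[str] = (),
    max_history_turns: int = 4,
) -> str:
    """Assemble a text prompt for a prompt-tuned ASR model (hypothetical format)."""
    parts = []

    # Keep only the most recent turns so the prompt stays short.
    recent = list(history)[-max_history_turns:] if max_history_turns > 0 else []
    if recent:
        parts.append("Conversation so far: " + " | ".join(recent))

    # Describe what the next utterance is expected to contain.
    parts.append(f"Expect {expectation}.")
    if example:
        parts.append(f"For example {example}.")

    parts.append(START_TOKEN)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_asr_prompt("a license plate (3 letters, 3 numbers)", example="ABC123"))
    print(build_asr_prompt("a person's name", history=["Agent: What is your name?"]))
```

A voice agent could call this once per turn, passing the last few transcribed turns as `history` and an expectation derived from whatever question the agent just asked.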