Phonetically-Augmented Discriminative Rescoring for Voice Search Error Correction

End-to-end (E2E) Automatic Speech Recognition (ASR) models are trained using paired audio-text samples that are expensive to obtain, since high-quality ground-truth data requires human annotators....

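This paper proposes a phonetic correction method that addresses the weakness of E2E ASR models in recognizing new movie titles. The method generates alternative candidates through a phonetic search and combines them with the ASR model's recognition results, improving recognition accuracy and reducing the error rate by 4.4% to 7.6%.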
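The two stages summarized above (phonetic search for alternatives, then combination with the ASR hypotheses) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `LEXICON`, the phoneme edit distance used for retrieval, and the hand-set weights `floor`, `catalog_bonus`, and `dist_penalty` are hypothetical, and the simple linear score merely stands in for the paper's discriminative rescorer, whose details are not given here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation lexicon (ARPAbet-style symbols), for illustration only.
LEXICON: Dict[str, List[str]] = {
    "dune": ["D", "UW", "N"],
    "doom": ["D", "UW", "M"],
    "tune": ["T", "UW", "N"],
    "june": ["JH", "UW", "N"],
}


def to_phonemes(text: str) -> List[str]:
    """Map text to a phoneme sequence; unknown words fall back to their letters."""
    phones: List[str] = []
    for word in text.lower().split():
        phones.extend(LEXICON.get(word, list(word)))
    return phones


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def phonetic_search(hypothesis: str, catalog: List[str], k: int = 2) -> List[str]:
    """Return the k catalog titles phonetically closest to the hypothesis."""
    hyp_phones = to_phonemes(hypothesis)
    ranked = sorted(catalog, key=lambda t: edit_distance(hyp_phones, to_phonemes(t)))
    return ranked[:k]


def rescore(
    nbest: List[Tuple[str, float]],  # (text, ASR log-score), best hypothesis first
    catalog: List[str],
    k: int = 2,
    floor: float = -3.0,           # assumed score for candidates the ASR never produced
    catalog_bonus: float = 4.0,    # assumed reward for matching a known title
    dist_penalty: float = 1.0,     # assumed cost per phoneme edit from the top hypothesis
) -> List[Tuple[str, float]]:
    """Merge ASR hypotheses with phonetic alternatives and re-rank them."""
    top_text = nbest[0][0]
    top_phones = to_phonemes(top_text)
    candidates = dict(nbest)
    for alt in phonetic_search(top_text, catalog, k):
        candidates.setdefault(alt, floor)
    known = set(catalog)
    scored = []
    for text, asr_score in candidates.items():
        distance = edit_distance(top_phones, to_phonemes(text))
        score = asr_score + catalog_bonus * (text in known) - dist_penalty * distance
        scored.append((text, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The ASR misrecognizes a new title as "doom"; the catalog contains "dune".
    print(rescore([("doom", -1.0), ("tune", -2.0)], ["dune", "june"]))
```

In this toy run, the catalog title "dune" is absent from the ASR n-best list but is retrieved by the phonetic search and ranked first after rescoring. In a real system the weights would be learned rather than set by hand.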

E2E models, recognition accuracy, speech recognition, error rate, phonetic correction