Whisper-Flamingo: 集成视觉特征于 Whisper 中用于音频 - 视觉语音识别和翻译
📝
内容提要
Audio-Visual Speech Recognition (AVSR) uses Whisper-Flamingo, a model that integrates visual features, to improve speech recognition and translation performance in noisy conditions for multiple languages.
🏷️
标签
➡️