AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Summary
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech...