Improving Chain-of-Thought Reasoning in Vision Language Models
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes often rely on datasets...
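Chain-of-thought reasoning is crucial in vision language models, but existing training methods rely on short annotations, which leads to poor generalization of reasoning. This paper proposes a two-stage post-training strategy: first, GPT-4o is used to augment the short-answer annotations; then the short answers serve as rewards for reinforcement learning that optimizes the model's reasoning. Experiments show that the approach significantly improves both reasoning ability and the generalization of answer prediction.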
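To make "short answers as rewards" concrete, here is a minimal sketch of an outcome reward, assuming each sampled reasoning chain ends with a line of the form `Answer: <short answer>` (a hypothetical convention, not stated in the source). The chain earns 1.0 if its extracted final answer matches the annotated short answer after light normalization, and 0.0 otherwise. The helper `preference_pairs` illustrates one possible way an RL or preference-optimization stage could consume this reward: it pairs correct and incorrect chains as (chosen, rejected). All function names and the matching rule are illustrative assumptions, not the authors' implementation.

```python
import re
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical convention: a reasoning chain ends with a line "Answer: <short answer>".
ANSWER_PATTERN = re.compile(r"answer\s*:\s*(.+)", re.IGNORECASE)


def normalize(text: str) -> str:
    """Lowercase and strip whitespace and trailing periods for lenient matching."""
    return text.strip().rstrip(".").strip().lower()


def extract_answer(chain: str) -> Optional[str]:
    """Return the normalized final answer of a chain, or None if no answer line exists."""
    matches = ANSWER_PATTERN.findall(chain)
    if not matches:
        return None
    return normalize(matches[-1])  # use the last "Answer:" line as the final answer


def outcome_reward(chain: str, short_answer: str) -> float:
    """Binary outcome reward: 1.0 if the final answer matches the short annotation."""
    return 1.0 if extract_answer(chain) == normalize(short_answer) else 0.0


def preference_pairs(chains: List[str], short_answer: str) -> List[Tuple[str, str]]:
    """Pair every correct chain with every incorrect chain as (chosen, rejected)."""
    correct = [c for c in chains if outcome_reward(c, short_answer) == 1.0]
    incorrect = [c for c in chains if outcome_reward(c, short_answer) == 0.0]
    return list(product(correct, incorrect))


if __name__ == "__main__":
    sampled = [
        "The sky region is mostly blue.\nAnswer: Blue",
        "The sky looks gray.\nAnswer: gray",
        "Hmm, I think it is blue.\nAnswer: blue.",
    ]
    for chosen, rejected in preference_pairs(sampled, "blue"):
        print(repr(chosen), ">", repr(rejected))
```

With three sampled chains for a question annotated "blue", the two correct chains are each paired against the one incorrect chain. Exact match after normalization is the simplest choice; real short-answer datasets may need more lenient comparison (for example, numeric tolerance or option-letter mapping).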
