小红花 Digest - 小红花 Tech Leaders Club

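This study proposes a new method that uses implicit rewards to obtain preferences from an English model and transfers them to other languages through iterative training, effectively improving multilingual model performance and reducing the need for multilingual preference data.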

An Efficient Implicit Cross-Language Reward Mechanism for Multilingual Preference Alignment

BriefGPT - AI Paper Express ·

Introducing UNA: A Unified Alignment Framework Integrating the Advantages of RLHF, DPO, and KTO

机器之心 ·