BriefGPT - AI Paper Digest

Bilingual Corpus Mining and Multi-Stage Fine-Tuning for Improving Machine Translation of Lecture Transcripts

💡 Original in Chinese, about 200 characters, roughly a 1-minute read.

📝

Summary

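The study proposes a new unsupervised method that uses monolingual data to obtain cross-lingual sentence embeddings, generates a synthetic parallel corpus from them, and uses that corpus to fine-tune a pretrained cross-lingual masked language model, yielding multilingual sentence representations. The results show that the method improves on the baseline model by up to 22 F1 points, and that a single synthetic bilingual corpus also improves results for other language pairs.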

🎯

Key Points

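The study proposes a new unsupervised method that obtains cross-lingual sentence embeddings from monolingual data.
The method generates a synthetic parallel corpus and uses it to fine-tune a pretrained cross-lingual masked language model.
The quality of the resulting representations is evaluated on two parallel corpus mining tasks.
The method improves on the baseline XLM model by up to 22 F1 points.
A single synthetic bilingual corpus also improves results for other language pairs.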
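To make the evaluation setup concrete, here is a minimal sketch of parallel corpus mining scored with F1. It assumes sentence embeddings are already available: the toy 3-dimensional vectors stand in for encoder outputs (for example from a fine-tuned XLM), and the sentence IDs, the gold pairs, and the 0.8 threshold are all hypothetical. The matching rule is a generic mutual-nearest-neighbour cosine baseline, not the paper's own scoring method.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]
Pair = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def nearest(query: Vector, candidates: Dict[str, Vector]) -> str:
    """ID of the candidate with the highest cosine similarity to query."""
    return max(candidates, key=lambda k: cosine(query, candidates[k]))


def mine_pairs(src: Dict[str, Vector], tgt: Dict[str, Vector],
               threshold: float = 0.8) -> Set[Pair]:
    """Keep (source, target) pairs that are mutual nearest neighbours
    and whose similarity reaches the (assumed) threshold."""
    pairs: Set[Pair] = set()
    for s_id, s_vec in src.items():
        t_id = nearest(s_vec, tgt)
        if nearest(tgt[t_id], src) == s_id and cosine(s_vec, tgt[t_id]) >= threshold:
            pairs.add((s_id, t_id))
    return pairs


def f1(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """F1 of mined pairs against the gold alignment."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy embeddings and gold alignment.
src = {"en1": [1.0, 0.0, 0.0], "en2": [0.0, 1.0, 0.0], "en3": [0.0, 0.0, 1.0]}
tgt = {"de1": [0.9, 0.1, 0.0], "de2": [0.1, 0.9, 0.0], "de3": [0.5, 0.5, 0.0]}
gold = {("en1", "de1"), ("en2", "de2"), ("en3", "de3")}

mined = mine_pairs(src, tgt)
score = f1(mined, gold)
print(mined, score)  # two of three gold pairs found, F1 ≈ 0.8
```

Here `en3` has no similar target, so it is not mined: precision is 1.0, recall is 2/3, and F1 is about 0.8. Better cross-lingual embeddings would move true translations closer together, which is the kind of gain an F1 improvement on a mining task reflects.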

🏷️

Tags

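Multilingual sentence representations · Parallel corpus · Fine-tuning · Unsupervised methods · Machine translation · Corpus · Cross-lingual sentence embeddings · Cross-lingual masked language model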
