Efficient and Effective Vocabulary Expansion for Multilingual Large Language Models

Summary

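KMMLU is a new Korean benchmark of 35,030 expert-level multiple-choice questions. Evaluation shows that current Korean LLMs perform poorly: the best publicly available model reaches only 50.54% accuracy. KMMLU offers a suitable tool for tracking the progress of Korean LLMs. The dataset is publicly available on the Hugging Face Hub and has been integrated into EleutherAI's Language Model Evaluation Harness.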

Key Points

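KMMLU is a new Korean benchmark of 35,030 expert-level multiple-choice questions spanning subjects from the humanities to STEM.
Its questions are drawn from original Korean exams, so they capture the linguistic and cultural aspects of Korean.
Twenty-six publicly available and proprietary LLMs were tested, and the results show significant room for improvement.
The best publicly available model scores 50.54% on KMMLU, below the average human performance of 62.6%.
Current LLMs tailored to Korean, such as Polyglot-Ko, perform poorly.
Even the most capable proprietary LLMs, GPT-4 and HyperCLOVA X, reach only 59.95% and 53.40% respectively.
KMMLU provides a tool for tracking the progress of Korean LLMs. The dataset is publicly available on the Hugging Face Hub and has been integrated into EleutherAI's Language Model Evaluation Harness.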
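
The accuracy figures above are percentages of multiple-choice questions answered correctly. A minimal sketch of that scoring follows, assuming single-letter answer labels and exact-match grading. The function name and the toy answer lists are hypothetical and are not taken from KMMLU or from the Evaluation Harness; only the 62.6% human average comes from the text above.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that exactly match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: four questions, one wrong prediction.
gold = ["A", "C", "B", "D"]
model_output = ["A", "C", "D", "D"]

score = multiple_choice_accuracy(model_output, gold)

# Human average reported in the key points above.
HUMAN_AVERAGE = 62.6
gap = score - HUMAN_AVERAGE

print(f"accuracy: {score:.2f}%")
print(f"gap to human average: {gap:+.2f} points")
```

With the reported numbers, the same subtraction puts the best public model (50.54%) about 12 points below the human average.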
