BriefGPT - AI Paper Digest

Are We Done with MMLU?

Summary

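We identify and analyze errors in the popular Massive Multitask Language Understanding (MMLU) benchmark and find numerous ground-truth errors that obscure the true capabilities of LLMs. To address this, we introduce a comprehensive framework for identifying dataset errors using a novel error taxonomy, and use it to create MMLU-Redux, a subset of 3,000 manually re-annotated questions across 30 MMLU subjects. Through...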

