BriefGPT - AI Paper Express

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

💡 Original in English, about 100 words, roughly a 1-minute read.

📝

Summary

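This paper introduces MedHallu, a benchmark for detecting hallucinations in large language models on medical question answering. The benchmark contains 10,000 question-answer pairs derived from PubMedQA. The study finds that existing models fall short at hallucination detection, and that incorporating domain knowledge and adding a "not sure" option significantly improves detection precision.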
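
To illustrate why a "not sure" option can raise precision, here is a minimal sketch of scoring a hallucination detector that is allowed to abstain. It is not the paper's evaluation code: the function name `precision_with_abstention`, the use of `None` to mean "not sure", and the toy data are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def precision_with_abstention(
    labels: Sequence[bool],
    predictions: Sequence[Optional[bool]],
) -> Tuple[Optional[float], int]:
    """Score a detector that may answer "not sure".

    labels: True if the answer really is hallucinated.
    predictions: True (flagged as hallucinated), False (judged faithful),
        or None (hypothetical encoding of "not sure").
    Returns (precision over flagged items, number of abstentions).
    """
    true_pos = false_pos = abstained = 0
    for is_hallucinated, pred in zip(labels, predictions):
        if pred is None:
            abstained += 1  # abstentions are not counted as flags
        elif pred:
            if is_hallucinated:
                true_pos += 1
            else:
                false_pos += 1
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else None
    return precision, abstained


# Toy data (made up): the second answer is faithful but hard to judge.
labels = [True, False, True, False]
forced = [True, True, True, False]      # detector must pick a side
optional = [True, None, True, False]    # detector may say "not sure"

forced_result = precision_with_abstention(labels, forced)
optional_result = precision_with_abstention(labels, optional)
```

In this toy case, abstaining on the uncertain item removes a false positive, so precision over the flagged answers goes up at the cost of leaving one item undecided.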
