BriefGPT - AI Paper Digest

Automatic Evaluation of Healthcare Large Language Models Beyond Question-Answering

💡 Original paper in English, about 100 words, roughly a 1-minute read.

📝 Summary

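This study examines how valid current evaluations of large language models are in the healthcare domain. It proposes a multi-dimensional evaluation suite that reveals how open-ended and closed-ended evaluations relate to each other and where their blind spots lie. The authors also release a new healthcare benchmark, CareQA, and introduce a relaxed perplexity metric to overcome the limitations of existing evaluation methods.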
