BriefGPT - AI Paper Digest

FarsEval-PKBETS: A New Diverse Benchmark for Evaluating Persian Large Language Models

Original in English, about 100 words; roughly a 1-minute read.

Summary

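This study introduces the FarsEval-PKBETS benchmark, a set of 4,000 diverse questions designed to evaluate the performance of Persian large language models. Test results show that existing models average below 50% accuracy, indicating significant capability gaps on complex Persian-language tasks.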

