BriefGPT - AI Paper Digest

Forgetting Transformer: Softmax Attention with a Forget Gate

Summary

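This work addresses the weak performance of the standard Transformer in long-context language modeling. It proposes a new "Forgetting Attention" mechanism, which down-weights the unnormalized attention scores in a data-dependent way, and uses it to build the Forgetting Transformer (FoX). The study finds that FoX outperforms the standard Transformer on long-context tasks and also significantly improves performance on short-context downstream tasks. In addition, FoX requires no positional embeddings and is compatible with the FlashAttention algorithm.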
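The down-weighting idea can be illustrated with a minimal, naive O(n²) sketch. It rests on assumptions not stated in this summary: each time step t has a scalar forget gate f_t = sigmoid(f_logits[t]), and the unnormalized score from query i to key j (j ≤ i) is multiplied by the product of the gates f_{j+1} … f_i. In log space, that means adding log f_{j+1} + … + log f_i to the scaled dot product before the softmax. The names `forgetting_attention`, `log_sigmoid` and `f_logits` are hypothetical, and this is neither the authors' implementation nor a FlashAttention kernel.

```python
import math
from typing import List

Vector = List[float]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def forgetting_attention(
    q: List[Vector], k: List[Vector], v: List[Vector], f_logits: List[float]
) -> List[Vector]:
    """Causal softmax attention with a scalar forget gate per step (hypothetical sketch).

    Assumed gate: f_t = sigmoid(f_logits[t]). The score from query i to key j
    (j <= i) is q_i . k_j / sqrt(d) + sum_{l=j+1}^{i} log f_l.
    """
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Prefix sums c[t] = log f_0 + ... + log f_t, so that
    # sum_{l=j+1}^{i} log f_l = c[i] - c[j].
    c: List[float] = []
    running = 0.0
    for x in f_logits:
        running += log_sigmoid(x)
        c.append(running)

    out: List[Vector] = []
    for i in range(n):
        # Causal mask: query i only sees keys 0..i.
        # The forget bias c[i] - c[j] is <= 0, so older keys are down-weighted.
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j])) + (c[i] - c[j])
            for j in range(i + 1)
        ]
        # Standard max-subtracted softmax over the biased scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights)) / z
            for d in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = [[0.0], [0.0], [0.0]]
    k = [[1.0], [2.0], [3.0]]
    v = [[1.0], [3.0], [5.0]]
    # Gates near 1: behaves like plain causal attention (here a running mean of v).
    print(forgetting_attention(q, k, v, [50.0] * 3))
    # Gates near 0: each query attends almost only to its own position.
    print(forgetting_attention(q, k, v, [-50.0] * 3))
```

In this sketch the gate enters only as an additive bias on the logits, computed from prefix sums of log f_t. That is one plausible reading of why such a mechanism can fit a FlashAttention-style tiled softmax. Because the bias depends only on the gates between positions j and i, it also carries ordering information without positional embeddings. With all gates close to 1, the sketch reduces to ordinary causal softmax attention.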