BriefGPT - AI Paper Digest

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models


📝

Summary

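This paper proposes two weight-only quantization schemes, per-IC quantization and AdaDim, to address the memory bottleneck that large language models face in small-batch inference settings. AdaDim delivers significant improvements on language modeling benchmarks for both base and instruction-tuned LLMs.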

🎯

Key Points

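Large language models (LLMs) face a memory bottleneck in small-batch inference settings.
Weight-only quantization schemes are proposed to relieve this memory bottleneck.
Sub-4-bit quantization remains challenging because of activation outliers.
Per-IC quantization creates quantization groups within each input channel (IC) and proves effective.
AdaDim is a versatile quantization framework that adapts to varied weight sensitivity patterns.
AdaDim yields significant improvements on language modeling benchmarks for both base and instruction-tuned LLMs, with gains of up to +4.7% on MMLU and up to +10% on HumanEval.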
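
To make the grouping direction concrete, here is a minimal pure-Python sketch, not the paper's implementation. It assumes a weight matrix stored as `W[out][in]`, asymmetric round-to-nearest quantization, and squared output error on a handful of calibration inputs as the selection criterion. All function names, the toy matrix, and the calibration vector are hypothetical. Per-IC grouping is contrasted with the more common per-output-channel (per-OC) grouping, and `choose_dimension` illustrates the general idea of adapting the grouping dimension per layer.

```python
def quantize_group(values, n_bits):
    """Asymmetric round-to-nearest quantize-then-dequantize of one group."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2 ** n_bits - 1) or 1.0  # fall back to 1.0 for constant groups
    return [lo + round((v - lo) / scale) * scale for v in values]

def transpose(M):
    return [list(col) for col in zip(*M)]

def quantize_per_oc(W, group_size, n_bits):
    """Per-OC: each group is a run of weights along the input dimension of one row."""
    out = []
    for row in W:
        q = []
        for start in range(0, len(row), group_size):
            q += quantize_group(row[start:start + group_size], n_bits)
        out.append(q)
    return out

def quantize_per_ic(W, group_size, n_bits):
    """Per-IC: each group is a run of weights inside one input channel (column)."""
    return transpose(quantize_per_oc(transpose(W), group_size, n_bits))

def output_error(W, Wq, X):
    """Sum of squared differences between W @ x and Wq @ x over calibration inputs."""
    err = 0.0
    for x in X:
        for row, qrow in zip(W, Wq):
            diff = sum((w - q) * xi for w, q, xi in zip(row, qrow, x))
            err += diff * diff
    return err

def choose_dimension(W, X, group_size=4, n_bits=3):
    """Assumed selection rule: keep whichever grouping gives lower output error."""
    candidates = {
        "per-OC": quantize_per_oc(W, group_size, n_bits),
        "per-IC": quantize_per_ic(W, group_size, n_bits),
    }
    errors = {name: output_error(W, Wq, X) for name, Wq in candidates.items()}
    best = min(errors, key=errors.get)
    return best, candidates[best], errors

# Toy layer (hypothetical values): input channel 2 holds large weights and
# receives an activation outlier in the calibration input.
W = [
    [0.1, -0.2, 8.0, 0.3],
    [-0.3, 0.2, 7.5, 0.1],
    [0.2, 0.1, 7.0, -0.1],
    [-0.1, 0.3, 8.5, 0.2],
]
X = [[1.0, -1.0, 20.0, 0.5]]

best, Wq, errors = choose_dimension(W, X)
print(best, errors)
```

With per-OC grouping every row's group contains one weight from the large-magnitude channel, so each group's quantization range is stretched by it. With per-IC grouping those weights share groups of their own. Which direction wins depends on the layer, which is why the sketch compares both instead of fixing one.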

🏷️

Tags

AdaDim, LLMs, memory bottleneck, large language models, language models, quantization schemes
