FlashAttention Notes
Original in English, about 300 words, roughly a 1-minute read. Background: There are two common kinds of bound that limit training speed in deep learning. Memory-bound: time spent on memory access is the bottleneck. Computation-bound: time spent on...
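These notes on FlashAttention describe the two main limits on training speed in deep learning: memory access and computation. FlashAttention processes Q, K, and V in blocks, so the large intermediate softmax matrix never has to be stored in full, which improves memory efficiency. The approach speeds up model training, improves quality on long-sequence tasks, and outperforms existing methods in both speed and memory efficiency.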
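The blockwise idea can be illustrated with a small pure-Python sketch. This is not the FlashAttention kernel itself, which is a fused GPU implementation; it only shows the arithmetic that lets softmax be computed one block of K and V at a time by keeping a running maximum and a running sum (often called online softmax). The function names, the `block_size` parameter, and the choice to handle one query row at a time are assumptions made for illustration.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds the full row of scores for every query."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Illustrative sketch: visits K and V in blocks, never holding all scores."""
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf      # running maximum of the scores seen so far
        l = 0.0            # running sum of exp(score - m)
        acc = [0.0] * dv   # running unnormalized weighted sum of V rows
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                      for k in k_blk]
            m_new = max(m, max(scores))
            # Rescale the old statistics to the new maximum.
            # On the first block m is -inf, so the scale is exp(-inf) = 0.
            scale = math.exp(m - m_new)
            l *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, v_blk):
                w = math.exp(s - m_new)
                l += w
                acc = [a + w * vj for a, vj in zip(acc, v)]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

In this sketch the extra state per query is just `m`, `l`, and `acc`, regardless of sequence length, while the naive version holds one score per key. Both functions should agree up to floating-point rounding for any positive `block_size`.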