plus studio ·

ViT Replaces UNet in DDPM (DiT)

💡 Original in Chinese, about 800 characters, roughly a 2-minute read.

📝

Summary

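This article looks at replacing the UNet in DDPM with a ViT, which yields the Diffusion Transformer (DiT). The authors train DiT models at four sizes and study three design axes: patch size, Transformer block architecture, and model scale. The model operates on a sequence of patches, conditions on the denoising timestep and the class label, and finally outputs a noise prediction and a covariance prediction.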
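To make "operates on a sequence of patches" concrete, here is a minimal NumPy sketch of the patchify step. It assumes a single input of shape (C, H, W) = (4, 32, 32), the latent size the DiT paper uses for 256×256 images, and the helper name `patchify` is hypothetical. It shows how the patch size p sets the sequence length T = (H/p)·(W/p): halving p quadruples the number of tokens. The learned linear patch embedding and positional embeddings that follow in the real model are omitted.

```python
import numpy as np

def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """Split a (C, H, W) array into a (T, p*p*C) token sequence.

    Hypothetical helper for illustration; T = (H // p) * (W // p).
    """
    c, h, w = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    # (C, H, W) -> (C, H/p, p, W/p, p) -> (H/p, W/p, p, p, C): group pixels by patch
    x = x.reshape(c, h // p, p, w // p, p).transpose(1, 3, 2, 4, 0)
    # Flatten each p x p x C patch into one token vector, patches in row-major order
    return x.reshape((h // p) * (w // p), p * p * c)

latent = np.random.default_rng(0).standard_normal((4, 32, 32))
for p in (2, 4, 8):
    print(p, patchify(latent, p).shape)
# prints: 2 (256, 16) / 4 (64, 64) / 8 (16, 256)
```

Smaller patches give a longer sequence and therefore more compute per forward pass, which is why patch size is one of the axes the article lists.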

🎯

Key Points

This article looks at replacing the UNet in DDPM with a ViT and presents the Diffusion Transformer (DiT) model.
The authors train DiT models at four sizes (DiT-S, DiT-B, DiT-L, and DiT-XL), each combined with patch sizes p = 8, 4, and 2.
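The design space covers patch size, Transformer block architecture, and model size.
The model's first layer operates on a sequence of patches, treating the image as a sequence of patch tokens.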
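After the patch sequence is formed, the denoising timestep and the class label are appended to it as two extra tokens (in-context conditioning); these two tokens are removed after the last DiT block.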
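The final output is a noise prediction and a diagonal covariance prediction, each with the same shape as the model input.
A standard linear decoder decodes each token into a p×p×2C tensor (C being the number of input channels), and the decoded tokens are rearranged back into the original spatial layout.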
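The conditioning-and-decoding steps above can be sketched end to end in NumPy. This is a toy illustration under stated assumptions, not the paper's code: the embeddings and decoder weight are random, the stack of DiT blocks is replaced by an identity placeholder, the final layer norm has no learned parameters, and the names `unpatchify`, `dit_head`, and the hidden size `D` are hypothetical. It appends a timestep token and a class token, drops them after the blocks, linearly decodes each remaining token into p×p×2C values, rearranges them to the spatial layout, and splits the channels into a noise prediction and a diagonal covariance prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize each token over its feature dimension (no learned scale/shift)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def unpatchify(tokens: np.ndarray, p: int, c: int, h: int, w: int) -> np.ndarray:
    """Rearrange (T, p*p*c) tokens back into a (c, h, w) array."""
    x = tokens.reshape(h // p, w // p, p, p, c)  # (H/p, W/p, p, p, C)
    x = x.transpose(4, 0, 2, 1, 3)               # (C, H/p, p, W/p, p)
    return x.reshape(c, h, w)

def dit_head(patch_tokens, t_emb, y_emb, blocks, w_dec, p, c, h, w):
    # In-context conditioning: append timestep and class tokens to the sequence
    seq = np.concatenate([patch_tokens, t_emb[None, :], y_emb[None, :]], axis=0)
    seq = blocks(seq)
    # Drop the two conditioning tokens after the last block
    seq = seq[:-2]
    # Final layer norm, then a linear decoder: each token -> p*p*2C values
    out = layer_norm(seq) @ w_dec
    out = unpatchify(out, p, 2 * c, h, w)        # (2C, H, W)
    # First C channels: noise prediction; last C channels: diagonal covariance
    return out[:c], out[c:]

def identity_blocks(seq: np.ndarray) -> np.ndarray:
    # Placeholder standing in for the stack of DiT blocks
    return seq

C, H, W, P, D = 4, 32, 32, 2, 64                 # D: toy hidden size (assumption)
T = (H // P) * (W // P)
patch_tokens = rng.standard_normal((T, D))
t_emb = rng.standard_normal(D)                   # stands in for the timestep embedding
y_emb = rng.standard_normal(D)                   # stands in for the class-label embedding
w_dec = rng.standard_normal((D, P * P * 2 * C)) * 0.02

noise_pred, cov_pred = dit_head(patch_tokens, t_emb, y_emb, identity_blocks, w_dec, P, C, H, W)
print(noise_pred.shape, cov_pred.shape)          # (4, 32, 32) (4, 32, 32)
```

Because the decoder emits 2C channels per pixel, a single linear layer produces both outputs at once; splitting along the channel axis is what makes each prediction match the input shape.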

🏷️

Tags

DDPM DiT model Diffusion Transformer ViT denoising
