小红花 Digest - 小红花 Tech Leaders Club

Migrating a 750B MoE Model from a Self-Built RoCE Cluster to AWS EFA: Validating the Communication Architecture for Prefill-Decode Separated Inference

Amazon AWS Official Blog ·

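This article examines the engineering side of LLM inference and stresses that training and inference have different requirements. Inference has two phases, Prefill and Decode: the former is concerned with compute throughput, the latter with latency. The KV Cache significantly improves inference efficiency by reducing computational complexity. The article also covers the advantages of Continuous Batching and Prefill/Decode separation, with a focus on GPU memory management and performance optimization under high concurrency.

[LLM Infrastructure Engineering] 11: Inference Engine Basics

土法炼钢兴趣小组's Blog ·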

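This article discusses why PD-separated deployment is needed for LLM applications. It analyzes the differing resource demands of the Prefill and Decode phases and recommends deploying them on separate devices to improve performance. It also introduces vLLM's connectors and the deployment procedure, stressing the importance of cache sharing and load balancing.

Deploying a PD-Separated Application with vLLM

陈少文's Blog ·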

DolphinGemma, a large language model developed by Google, is helping scientists study how dolphins communicate — and hopefully find out what they're saying, too.

DolphinGemma: How Google AI is helping decode dolphin communication

Google DeepMind Blog ·

CASE and DECODE in SQL

DEV Community ·

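LLM inference is split into a Prefill phase and a Decode phase: Prefill is compute-intensive, while Decode generates tokens. The evaluation metrics are TTFT (time to first token) and TPOT (time per output token), with the requirement that 90% of requests have TTFT ≤ 0.4 s and TPOT ≤ 0.04 s. PD separation improves both metrics by limiting the batch size in the Prefill phase and increasing it in the Decode phase.

What Is PD Separation

陈少文's Blog ·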
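
The 90% requirement on TTFT and TPOT in the summary above can be read as a simple attainment check. The following is a minimal sketch, not taken from the linked article: it assumes a request counts as passing only when it meets both limits, and the names `RequestLatency`, `slo_attainment`, and `meets_slo` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestLatency:
    ttft_s: float  # time to first token, in seconds
    tpot_s: float  # time per output token, in seconds


def slo_attainment(
    requests: Sequence[RequestLatency],
    ttft_limit_s: float = 0.4,   # limit quoted in the summary
    tpot_limit_s: float = 0.04,  # limit quoted in the summary
) -> float:
    """Return the fraction of requests that meet both latency limits.

    Assumption: a request passes only if TTFT and TPOT are both within
    their limits. Checking each metric's 90th percentile separately
    would be a different (looser) reading of the same requirement.
    """
    if not requests:
        return 0.0
    passed = sum(
        1
        for r in requests
        if r.ttft_s <= ttft_limit_s and r.tpot_s <= tpot_limit_s
    )
    return passed / len(requests)


def meets_slo(requests: Sequence[RequestLatency], target: float = 0.9) -> bool:
    """True when at least `target` of the requests pass both limits."""
    return slo_attainment(requests) >= target
```

Under this reading, a batch in which one request out of ten has a slow first token still passes, while two slow requests out of ten do not.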

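This study introduces DECODE, an end-to-end model built mainly on frequency-domain sequence modeling, for detecting EMRI signals. DECODE can efficiently process one year of multi-channel TDI data, achieving a 96.3% true positive rate at a 1% false positive rate for signal-to-noise ratios between 50 and 120. DECODE demonstrates the potential of space-based gravitational wave data analysis.

DECODE: A Dilated Convolutional Neural Network for Detecting Extreme-Mass-Ratio Gravitational Waves

BriefGPT - AI Paper Digest ·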