vLLM Router: A High-Performance, Prefill/Decode-Aware Load Balancer for Large-Scale Serving

Summary

Efficiently managing request distribution across a fleet of model replicas is a critical requirement for large-scale, production vLLM deployments. Standard load balancers often fall short as they...