vLLM Router: A High-Performance and Prefill/Decode Aware Load Balancer for Large-scale Serving
Summary
Efficiently managing request distribution across a fleet of model replicas is a critical requirement for large-scale, production vLLM deployments. Standard load balancers often fall short as they...