LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Summary
This paper was accepted at the Efficient Systems for Foundation Models Workshop at ICML 2024. The inference of transformer-based large language models consists of two sequential stages: 1) a...