plus studio

Deploying llama-cpp-python on a GPU (applies to llama.cpp in general)

💡 Original in Chinese, about 1,600 characters, roughly a 4-minute read.

📝

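Summary

This article walks through deploying llama-cpp-python on Ubuntu 20.04 with Python 3.8.10 and CUDA 11.6. It first confirms that CUDA is installed, then installs the package with the cuBLAS acceleration backend enabled. At run time, the n_threads and n_gpu_layers parameters control how the work is split between CPU threads and the GPU. A multi-GPU test shows that two Tesla T4 cards can run inference on a 70B model in about half a minute. The article also lists common errors and their fixes.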

🎯

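Key points

Deploy llama-cpp-python on Ubuntu 20.04 with Python 3.8.10 and CUDA 11.6.
Confirm that CUDA is installed by checking the CUDA version with nvcc -V.
Install with the cuBLAS acceleration backend: run export LLAMA_CUBLAS=1, then CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python.
At run time, set n_threads and n_gpu_layers to make use of the GPU: n_threads is the number of threads to use, and n_gpu_layers is the number of model layers computed on the GPU.
In the multi-GPU test, two Tesla T4 cards ran inference on a 70B model in about half a minute.
Common errors include the CUDA compiler not being found, a CUDA version that is too old, and an undefined GPU name. Resolve each one according to its error message.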

❓

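Further Q&A

How do I deploy llama-cpp-python on Ubuntu 20.04?

The deployment described here uses Python 3.8.10 and CUDA 11.6. First confirm that CUDA is installed, then install llama-cpp-python with the cuBLAS acceleration backend enabled through the build flags given in the installation answer.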

How do I check whether CUDA is installed?

Run nvcc -V. If it prints the CUDA compiler information, CUDA is installed.
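The check can also be scripted. The following is a minimal sketch, not part of the original article: the helper names `parse_nvcc_release` and `installed_cuda_release` are made up for illustration, and the parser assumes the output contains a phrase of the form `release 11.6`, which is how nvcc usually reports its version.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc -V` output, or None if absent."""
    # Assumption: the version appears as "release <major>.<minor>".
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` if nvcc is on PATH and return (major, minor), else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        # Corresponds to the "CUDA compiler not found" situation.
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)
```

Calling `installed_cuda_release()` returns `None` when nvcc is missing from `PATH`. Otherwise the returned tuple can be compared against the version the build needs, for example `(11, 6)` for the setup described here.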

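Which parameters do I need to set when running llama-cpp-python?

Set n_threads and n_gpu_layers. n_threads is the number of threads to use, and n_gpu_layers is the number of model layers computed on the GPU.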
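As a rough illustration of where these two parameters go, the sketch below passes them to the `Llama` constructor from `llama_cpp`. The helper names, the model path, and the layer count in the usage note are placeholders rather than values from the article, and the right `n_gpu_layers` depends on the model and the available GPU memory.

```python
import os


def llama_kwargs(model_path, n_gpu_layers, n_threads=None):
    """Collect the constructor arguments discussed above into one dict."""
    if n_threads is None:
        # Assumption: default to the number of CPU cores the OS reports.
        n_threads = os.cpu_count() or 1
    return {
        "model_path": model_path,
        "n_threads": n_threads,        # threads to use
        "n_gpu_layers": n_gpu_layers,  # model layers computed on the GPU
    }


def load_model(model_path, n_gpu_layers, n_threads=None):
    """Create a Llama instance; requires llama-cpp-python and a model file."""
    from llama_cpp import Llama  # imported lazily so the helper above works alone

    return Llama(**llama_kwargs(model_path, n_gpu_layers, n_threads))
```

A call such as `load_model("./models/model.gguf", n_gpu_layers=35, n_threads=8)` would then load the model with 35 layers on the GPU. Both numbers and the path are hypothetical.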

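What should I watch for when using multiple GPUs?

It is enough to confirm that torch.cuda.is_available() and torch.cuda.device_count() return the expected values. In the test, two Tesla T4 cards ran inference on a 70B model quickly.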
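A small sketch of that sanity check follows. The function name `cuda_device_summary` is hypothetical, and treating a missing PyTorch installation as "no GPU" is an assumption made so the snippet runs anywhere.

```python
def cuda_device_summary():
    """Return (available, device_count) as reported by PyTorch."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch installed, report no usable GPU.
        return False, 0
    available = bool(torch.cuda.is_available())
    count = int(torch.cuda.device_count()) if available else 0
    return available, count


if __name__ == "__main__":
    available, count = cuda_device_summary()
    print(f"CUDA available: {available}, visible GPUs: {count}")
```

For the two-card setup described above, the expected result would be `True` with a device count of 2.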

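What are the common CUDA errors?

Common errors include the CUDA compiler not being found, a CUDA version that is too old, and an undefined GPU name. Resolve each one according to its error message.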

How do I install with the cuBLAS acceleration backend?

Run export LLAMA_CUBLAS=1, then CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python.
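The same installation can be driven from Python, which makes it explicit that the three variables are environment settings for the pip process rather than pip arguments. This is a sketch with hypothetical helper names. The flag names are the ones quoted in this article, and other llama-cpp-python releases may expect differently named CMake options, so check the README of the version being installed.

```python
import os
import subprocess
import sys


def cublas_install_plan():
    """Return (env, argv) for building llama-cpp-python with cuBLAS enabled."""
    env = dict(os.environ)
    env.update({
        "LLAMA_CUBLAS": "1",
        "CMAKE_ARGS": "-DLLAMA_CUBLAS=on",  # flag name as given in the article
        "FORCE_CMAKE": "1",
    })
    # Use the current interpreter's pip so the package lands in this environment.
    argv = [sys.executable, "-m", "pip", "install", "llama-cpp-python"]
    return env, argv


def install_with_cublas():
    """Run the installation and return pip's exit code."""
    env, argv = cublas_install_plan()
    return subprocess.run(argv, env=env, check=False).returncode
```

`install_with_cublas()` returns 0 on success. A non-zero code usually means the build failed, so the pip output is the place to look for the error categories listed above.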
