小红花·文摘

Building a RAG Pipeline with llama.cpp in Python

MachineLearningMastery.com ·

Unable to load shared library 'llama.dll': not found (llama-cpp-python)

DEV Community ·

Using Ollama? You're actually running llama.cpp! Learn to use it directly for better performance!

DEV Community ·

Jan v0.5.15: More control over llama.cpp settings, advanced hardware control, and more

DEV Community ·

A quick speed test of SGLang vs. Llama.cpp

DEV Community ·

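This study presents Bitnet.cpp, an optimized inference system that addresses the efficiency of ternary large language models in edge inference. The system uses a new mixed-precision matrix multiplication library to achieve efficient, lossless inference that is 6.25 times faster than full precision, advancing the field.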

Bitnet.cpp: Efficient Edge Inference for Ternary Large Language Models

BriefGPT - AI 论文速递 ·

Implementing DeepSeek-R1 tool calling with OpenWebUI and Llama.cpp to build local AI workflows

DEV Community ·

§ C++11 NOTE: this is a web “mirror” of Anthony Calandra’s modern-cpp-features shared under MIT License (see at bottom). The only reason I do a copy is I hate reading markdowns from github....

modern cpp features - C++11

shrik3 ·

§ C++14 NOTE: this is a web “mirror” of Anthony Calandra’s modern-cpp-features shared under MIT License (see at bottom). The only reason I do a copy is I hate reading markdowns from github....

modern cpp features - C++14

shrik3 ·

§ C++17 NOTE: this is a web “mirror” of Anthony Calandra’s modern-cpp-features shared under MIT License (see at bottom). The only reason I do a copy is I hate reading markdowns from github....

modern cpp features - C++17

shrik3 ·

§ C++20 NOTE: this is a web “mirror” of Anthony Calandra’s modern-cpp-features shared under MIT License (see at bottom). The only reason I do a copy is I hate reading markdowns from github....

modern cpp features - C++20

shrik3 ·

§ Modern C++ Features (Anthony Calandra), overview C++20/17/14/11 NOTE: this is a web “mirror” of Anthony Calandra’s modern-cpp-features shared under MIT License (see at bottom). The only...

modern cpp features - overview

shrik3 ·

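This study proposes an optimized Sdcpp inference framework that addresses the high latency and memory problems of conventional Stable Diffusion. It accelerates 2D convolution with the Winograd algorithm, speeding up inference by up to 4.79 times.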

Open-source acceleration of Stable-Diffusion.cpp

BriefGPT - AI 论文速递 ·

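When writing C++ code, you occasionally run into two classes that need to reference each other. If their .h files include each other, the result is a “has not been declared” error.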

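NVIDIA GPUs offer Windows users a shared GPU memory feature that lets system memory serve as virtual VRAM. This can help when the GPU's dedicated video memory runs out, but it costs performance. The author tested how spilling GPU memory into RAM affects LLM inference speed and found that filling the PC's RAM as far as possible and relying on shared GPU memory makes little sense. The author also tested different offload settings and found that a 50% GPU / 50% CPU split filled the VRAM almost completely without spilling over. In the results, the 50/50 GPU/CPU split had the highest tokens per second and the fastest time to first token, while 100% GPU offload led to higher system memory usage. The author concludes that using shared VRAM is not worthwhile.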

Multimodal embeddings in Llama.cpp and GGUF

Building AI agents with llama.cpp

CPP: Header files that include each other

A first look at multithreading and concurrency in modern cpp

llama.cpp: CPU vs. GPU, shared VRAM, and inference speed

[Rust Daily] 2024-07-25: mistral.rs is faster than llama.cpp on most CUDA GPUs

BTMC: Back to Modern Cpp