LLM Fine-Tuning Experience

From personal experience: watch how the loss/reward and the test dataset performance change, and make sure they follow the same trend. Otherwise, reward hacking or an invalid loss function may be at play; adjust the learning rate...
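The trend check can be sketched in a few lines of plain Python. This is only a minimal illustration, not a standard recipe: the helper names `net_change` and `trends_agree`, the window size, and the sample numbers are all hypothetical. It compares the average of the first few logged points with the average of the last few, and reports whether the training signal and the test metric improved in the same direction.

```python
def net_change(values, window=3):
    """Mean of the last `window` points minus mean of the first `window` points."""
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window points")
    head = sum(values[:window]) / window
    tail = sum(values[-window:]) / window
    return tail - head


def trends_agree(train_values, eval_values, train_higher_is_better,
                 eval_higher_is_better=True, window=3):
    """Return True if the training signal and the eval metric both improve or both degrade."""
    train_gain = net_change(train_values, window)
    if not train_higher_is_better:  # e.g. a loss: going down counts as improvement
        train_gain = -train_gain
    eval_gain = net_change(eval_values, window)
    if not eval_higher_is_better:
        eval_gain = -eval_gain
    return (train_gain > 0) == (eval_gain > 0)


# Hypothetical logs taken at the same checkpoints.
reward = [0.10, 0.20, 0.35, 0.50, 0.70, 0.85]        # keeps rising
bad_eval_acc = [0.40, 0.42, 0.43, 0.41, 0.38, 0.35]  # falls: possible reward hacking
loss = [2.0, 1.6, 1.3, 1.1, 0.9, 0.8]                # keeps falling
good_eval_acc = [0.40, 0.45, 0.50, 0.55, 0.58, 0.60]

hacking_suspected = not trends_agree(reward, bad_eval_acc, train_higher_is_better=True)
healthy_run = trends_agree(loss, good_eval_acc, train_higher_is_better=False)
```

A windowed average is used rather than a point-to-point comparison so that a single noisy checkpoint does not flip the verdict; the window size is a judgment call.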

After reading about "manual autograd" on the Unsloth blog, I tried parsing the model and found more places that could be optimized. torchview is a great tool.

llm, torchview, optimization, manual autograd, model parsing