BriefGPT - AI Paper Digest

Design Choices for Long Visual Language Models: GIRAFFE

💡 Original in English, about 100 words, roughly a 1-minute read.

📝

Summary

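This study proposes the Giraffe model, which addresses the insufficient context length of vision-language models when processing multiple images and high-resolution video. It extends the context length to 128K and significantly improves performance.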

🎯

Key Points

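The study proposes the Giraffe model to address the insufficient context length of vision-language models when processing multiple images and high-resolution video.
Giraffe extends the context length to 128K, with a significant improvement in performance.
The study establishes the ETVLM data recipe and proposes an improved M-RoPE++ method along with hybrid-resolution training.
Giraffe performs strongly on long-context benchmarks, reaching state-of-the-art results among open-source vision-language models.
Giraffe is competitive with the commercial model GPT-4V.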

🏷️

Tags

Giraffe model, models, context length, multi-image, vision-language models, high-resolution video
