Google Releases the PaliGemma 2 Vision-Language Model Family
Original article in English, about 600 words, roughly a 2-minute read. Published: .
Google DeepMind released PaliGemma 2, a family of vision-language models (VLM). PaliGemma 2 is available in three different sizes and three input image resolutions and achieves state-of-the-art...
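Google DeepMind has launched PaliGemma 2, a family of vision-language models offered in three model sizes and three input image resolutions. Each model pairs the SigLIP-So400m image encoder with the Gemma 2 LLM, and across several benchmarks PaliGemma 2 outperforms existing frontier models. It can generate detailed image captions, supports a wide range of tasks, and shows no significant drop in output quality when run on a CPU.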
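The summary does not say how the input resolution interacts with the image encoder. The Python sketch below illustrates it under two assumptions not stated above: that SigLIP-So400m splits images into 14-pixel patches, and that the three resolutions are 224, 448 and 896 px. It counts the image tokens the encoder would produce and places them ahead of the text prompt as the language model's input prefix. The `build_prefix` helper and the `<img>` placeholder are hypothetical illustrations, not the model's actual API.

```python
# Minimal sketch, not the official implementation.
# ASSUMPTIONS: 14-px patches (SigLIP-So400m/14) and resolutions of 224/448/896 px;
# build_prefix and the "<img>" placeholder are hypothetical.

PATCH_SIZE = 14  # assumed encoder patch size, in pixels


def num_image_tokens(resolution: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch embeddings for a square image `resolution` px on a side."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


def build_prefix(resolution: int, prompt_tokens: list) -> list:
    """Hypothetical input layout: all image tokens first, then the text prompt."""
    image_part = ["<img>"] * num_image_tokens(resolution)
    return image_part + list(prompt_tokens)


if __name__ == "__main__":
    for res in (224, 448, 896):
        print(res, num_image_tokens(res))
    # 224 256
    # 448 1024
    # 896 4096
```

Because the token count grows with the square of the side length, each doubling of resolution quadruples the image portion of the sequence the LLM must process. This is one reason the higher-resolution variants cost more to run.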