BriefGPT - AI Paper Digest

Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models


📝

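Summary

The paper introduces neural information retrieval resources for 11 Indian languages, including a dataset created with machine translation and a collection of neural information retrieval models. Experiments show that these resources yield significant performance improvements across multiple Indian languages.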

🎯

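Key Points

The paper introduces neural information retrieval resources for 11 Indian languages.
The languages covered are Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
The authors built the INDIC-MARCO dataset using machine translation, along with the Indic-ColBERT collection of neural information retrieval models.
IndicIRSuite is the first attempt to build large-scale neural information retrieval resources for a large number of Indian languages.
Experiments show that Indic-ColBERT improves the MRR@10 score by 47.47% on average across all 11 languages except Odia.
The NDCG@10 score improves by 12.26% on average over the Bengali and Hindi baselines.
The MRR@100 score improves by 20% over the Bengali baseline.
IndicIRSuite is available at the specified URL.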
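For readers unfamiliar with the metrics cited above, the following is a minimal sketch of how MRR@k and NDCG@k are commonly computed. It is not taken from the paper or from IndicIRSuite: the function names, the query and passage IDs, and the choice of linear gain for NDCG (some evaluations use 2^rel - 1 instead) are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage in the top k.

    `rankings` maps a (hypothetical) query ID to its ranked passage IDs;
    `relevant` maps a query ID to the set of relevant passage IDs.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        gold = relevant.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in gold:
                total += 1.0 / rank  # only the first hit counts
                break
    return total / len(rankings)


def ndcg_at_k(ranked: List[str], gains: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query, using linear gain (an assumption of this sketch)."""
    dcg = sum(gains.get(pid, 0.0) / math.log2(i + 1)
              for i, pid in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data, invented for illustration only.
rankings = {"q1": ["p3", "p1", "p7"], "q2": ["p2", "p9"]}
relevant = {"q1": {"p1"}, "q2": {"p5"}}
toy_mrr = mrr_at_k(rankings, relevant, k=10)   # q1 hits at rank 2, q2 misses
toy_ndcg = ndcg_at_k(["a", "b", "c"], {"b": 1.0}, k=10)
```

In this toy run, the first query finds its relevant passage at rank 2 and the second finds none, so the mean reciprocal rank is (1/2 + 0) / 2. MRR@100 is the same calculation with k set to 100.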

🏷️

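Tags

India, Indic languages, performance improvement, datasets, machine translation, neural information retrieval, language models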
