小红花·文摘 (Xiaohonghua Digest) - 小红花技术领袖俱乐部 (Xiaohonghua Tech Leaders Club)

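This study examines how large language models (LLMs) handle human annotation disagreement when detecting offensive language. It finds that LLM confidence on samples with annotation disagreement correlates with the level of human agreement, and that this disagreement affects model decisions, offering guidance for improving offensive language detection.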

Unveiling the Capabilities of Large Language Models in Detecting Offensive Language and Annotation Disagreement

BriefGPT - AI Paper Express

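This study proposes a novel content moderation framework that uses multitask learning and conformal prediction to treat annotation disagreement as an important signal, improving both model performance and moderation efficiency.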

A Collaborative Content Moderation Framework for Toxicity Detection Based on Conformalized Estimates of Annotation Disagreement

BriefGPT - AI Paper Express