小红花·文摘

In the previous post, we asked AI to make recommendations to help clean up the data loaded directly from a CSV file. The initial data load for the Name failed because a VARCHAR(64) was estimated...

Dave Stokes: Loading The Titanic Dataset Into PostgreSQL With DBeaver Part 3

Planet PostgreSQL

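This study proposes the UWSAM model and the UIIS10K dataset to address technical shortcomings in underwater instance segmentation. Through knowledge distillation and automatically generated underwater prompts, it significantly improves segmentation accuracy and efficiency, advancing underwater vision tasks.

UWSAM: Underwater Instance Segmentation Based on 'Segment Anything Model' and Its Large-scale Benchmark Dataset

BriefGPT - AI Paper Express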

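This study proposes a method for automatically generating context-based question-answer pairs, aiming to improve the ability of large language models to perform complex reasoning and integrate real-time knowledge. Experimental results show that the method outperforms traditional human-annotated question-answer pairs in logical consistency and factual accuracy.

Automatic Dataset Generation for Knowledge-Intensive Question Answering Tasks

BriefGPT - AI Paper Express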

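This paper introduces the AppleGrowthVision dataset, which addresses dataset limitations in apple orchard monitoring. The dataset contains high-resolution stereo images and supports accurate phenotypic analysis and 3D reconstruction. The study shows that fruit detection performance with YOLOv8 and Faster R-CNN improves significantly, providing a foundation for precision agriculture.

Apple Growth Vision: A Large-Scale Stereo Dataset for Phenological Analysis, Fruit Detection, and 3D Reconstruction in Apple Orchards

BriefGPT - AI Paper Express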

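This study proposes the MatPredict dataset, which aims to identify the material properties of indoor objects from camera images, advancing indoor object perception for consumer robots.

MatPredict: A Dataset and Benchmark for Learning Material Properties of Diverse Indoor Objects

BriefGPT - AI Paper Express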

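This study proposes the FedRS dataset, filling the gap in realistic federated datasets for remote sensing. The dataset reflects real-world scenarios through 135 clients. Experimental results show that federated learning significantly improves model performance, and the dataset provides a standardized testbed for large-scale research.

FedRS-Bench: A Realistic Federated Learning Dataset and Benchmark for Remote Sensing

BriefGPT - AI Paper Express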

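This study proposes the Re^2 dataset to address the insufficient diversity and low quality of existing peer review datasets. The dataset contains a large number of initial submissions, review comments, and rebuttals. It supports both static review and dynamic interaction, helping authors refine their manuscripts and easing the reviewing burden.

Re^2: A Consistency-Ensured Dataset for Comprehensive Peer Review and Multi-Turn Rebuttal Discussions

BriefGPT - AI Paper Express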

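This study establishes a taxonomy of misleading narratives during elections and constructs datasets for the 2019 and 2024 UK general elections. It shows that using large language models (such as GPT-4o) to detect these narratives has significant potential.

UK Election Narratives: A Dataset of Misleading Narratives Surrounding Recent UK Elections

BriefGPT - AI Paper Express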

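This study examines the shortcomings of text-to-image generation models in cultural adaptation, particularly in their understanding of Russian culture. It proposes a dataset processing method based on cultural codes. Experiments show that the method effectively improves the models' awareness of Russian culture and the quality of generated images.

CRAFT: A Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation

BriefGPT - AI Paper Express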

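This study proposes CEAT2I, a new copyright evasion attack targeting dataset copyright protection in personalized text-to-image diffusion models. It reveals the fragility of conventional copyright verification techniques, and experiments show that CEAT2I can effectively evade such verification while preserving model performance, giving it significant practical value.

Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models

BriefGPT - AI Paper Express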

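This study proposes the KoACD dataset, which contains 108,717 instances and focuses on cognitive distortions in adolescents. It uses multiple large language models to refine classification and generate synthetic data, in order to validate the dataset's effectiveness for detection research.

KoACD: The First Analysis of the Korean Adolescent Cognitive Distortion Dataset

BriefGPT - AI Paper Express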

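This study proposes a proactive adaptive AI method that addresses dataset shift in non-stationary environments in healthcare settings. By modeling the temporal trajectory of AI parameters, the method significantly improves performance and lays a foundation for adaptive AI research in dynamic environments.

Proactive Adaptive Artificial Intelligence for Dataset Shifts in Dynamic Environments

BriefGPT - AI Paper Express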

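This study proposes the Responsible AI Labs (RAIL) framework, which evaluates the ethical standards of large language models and presents eight measurable dimensions, aiming to improve their ethical performance in the real world.

Implementing Responsible AI Assessment in Real-World Applications: Utilizing Anthropic's Value Dataset

BriefGPT - AI Paper Express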

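This study proposes the ECOSoundSet dataset, which contains 10,653 recordings covering 200 species of Orthoptera and 24 species of cicadas. It aims to improve the automated identification of insect sounds in Northern, Central, and temperate Western Europe and to support deep learning algorithms.

ECOSoundSet: A Finely Annotated Dataset for the Automated Acoustic Identification of Orthoptera and Cicadidae in Northern, Central, and Temperate Western Europe

BriefGPT - AI Paper Express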

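This study proposes the Incremental Causal Effect with Proxy Knowledge Distillation (ICE-PKD) framework, which aims to address the complexity caused by temporal dataset shift in online environments. Using a multi-treatment uplift network and an incremental training strategy, the framework effectively handles changes in user behavior and domain distribution, and has been successfully deployed on Huaxiaozhu, a Chinese ride-hailing platform.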

Estimation of Continual Causal Effects for Dataset Shifting Streams

TrueFake: A Real-World Case Dataset of the Latest Generation of Fake Images on Social Networks

WILD: A Novel Real-World Image Linking Dataset for Synthetic Image Attribution

ClimaEmpact: A Domain-Aligned Small Language Model and Dataset for Extreme Weather Analysis

AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with the OpenMathReasoning Dataset

PixelWeb: The First Web GUI Dataset with Pixel-Wise Labels