BriefGPT - AI Paper Digest

RedCode: A Benchmark for Evaluating the Execution and Generation of Risky Code by Code Assistants

💡 Original in English, about 100 words, roughly a 1-minute read.

📝 Summary

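This study introduces RedCode, a benchmark for evaluating the safety of code assistants when they generate or execute risky code. The benchmark comprises 4,050 test cases and 160 prompts. The results show that code assistants reject risky operations at a relatively high rate but reject code containing technical bugs at a lower rate, which leaves considerable potential risk.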
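The comparison above rests on a per-category rejection rate. As a minimal sketch of how such a rate could be tallied (the category labels, the `rejection_rates` helper, and the sample outcomes are all hypothetical illustrations, not RedCode's actual harness or data):

```python
from collections import defaultdict


def rejection_rates(results):
    """Return the fraction of rejected test cases for each category.

    `results` is an iterable of (category, was_rejected) pairs.
    """
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for category, was_rejected in results:
        totals[category] += 1
        if was_rejected:
            rejected[category] += 1
    return {category: rejected[category] / totals[category] for category in totals}


# Hypothetical outcomes for illustration only; these are not figures from the paper.
sample = [
    ("risky_os_operation", True),
    ("risky_os_operation", True),
    ("risky_os_operation", False),
    ("buggy_code", True),
    ("buggy_code", False),
    ("buggy_code", False),
    ("buggy_code", False),
]

rates = rejection_rates(sample)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

A lower rate in one category means the assistant went ahead with more of those test cases, which is the kind of gap the summary describes.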

