Claude AI Data Breach: Code Interpreter Exploit


Security researcher demonstrates how attackers can hijack Anthropic’s file upload API to exfiltrate sensitive information, even with network restrictions enabled.

A newly disclosed vulnerability in Anthropic’s Claude AI assistant has revealed how attackers can weaponize the platform’s code interpreter feature to silently exfiltrate enterprise data, bypassing even the default security settings designed to prevent such attacks.

Security researcher Johann Rehberger demonstrated that Claude’s code interpreter can be manipulated through indirect prompt injection to steal sensitive information, including chat histories, uploaded documents, and data accessed through integrated services. The attack leveraged Claude’s own API infrastructure to send stolen data directly to attacker-controlled accounts.

The exploit took advantage of a critical oversight in Claude’s network access controls. While the platform’s default “Package managers only” setting restricted outbound connections to approved domains like npm and PyPI, it also allowed access to api.anthropic.com, the very endpoint attackers can abuse for data theft.
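The gap described above can be illustrated with a minimal sketch of a host-based egress filter. The allow-list contents, the function names, and the example URLs are assumptions made for illustration; this is not Anthropic's actual configuration or implementation. The sketch only shows that a check keyed on the destination host has no way to tell whose account a request is made for.

```python
from urllib.parse import urlsplit

# Illustrative allow-list only (assumption); not Anthropic's real configuration.
ALLOWED_HOSTS = {
    "pypi.org",
    "files.pythonhosted.org",
    "registry.npmjs.org",
    "api.anthropic.com",
}


def egress_allowed(url: str) -> bool:
    """Return True when the URL's host is on the allow-list."""
    host = urlsplit(url).hostname or ""
    return host.lower() in ALLOWED_HOSTS


def describe_request(url: str, key_owner: str) -> str:
    """Summarize the filter's verdict; key_owner is never consulted by the filter."""
    verdict = "allowed" if egress_allowed(url) else "blocked"
    return f"{verdict}: {url} (API key belongs to: {key_owner})"


if __name__ == "__main__":
    # Hypothetical URLs: the verdict is identical for both key owners.
    print(describe_request("https://api.anthropic.com/v1/files", "victim"))
    print(describe_request("https://api.anthropic.com/v1/files", "attacker"))
    print(describe_request("https://attacker.example/upload", "attacker"))
```

Because the decision depends on the host alone, a request to an approved API endpoint passes whether it carries the user's credentials or someone else's.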

How the attack works

The attack chain orchestrated by the researcher relied on indirect prompt injection, where malicious instructions are hidden within documents, websites, or other content that users ask Claude to analyze. Once triggered, the exploit executes a multi-stage process:

First, Claude retrieves sensitive data — such as recent conversation history using the platform’s newly introduced memory feature — and writes it to a file in the code interpreter sandbox. The malicious payload then instructs Claude to execute Python code that uploads the file to Anthropic’s Files API, but with a crucial twist: the upload uses the attacker’s API key rather than the victim’s.

“This code issues a request to upload the file from the sandbox. However, this is done with a twist,” Rehberger wrote in his blog post. “The upload will not happen to the user’s Anthropic account, but to the attackers, because it’s using the attacker’s ANTHROPIC_API_KEY.”

The technique allows exfiltration of up to 30MB per file, according to Anthropic’s API documentation, with no limit on the number of files that can be uploaded.

Bypassing AI safety controls

Rehberger’s report stated that developing a reliable exploit proved challenging due to Claude’s built-in safety mechanisms. The AI initially refused requests containing plaintext API keys, recognizing them as suspicious. However, Rehberger added that mixing malicious code with benign instructions — such as simple print statements — was sufficient to bypass these safeguards.

“I tried tricks like XOR and base64 encoding. None worked reliably,” Rehberger explained. “However, I found a way around it… I just mixed in a lot of benign code, like print('Hello, world'), and that convinced Claude that not too many malicious things are happening.”

Rehberger disclosed the vulnerability to Anthropic through HackerOne on October 25, 2025. The company closed the report within an hour, classifying it as out of scope and describing it as a model safety issue rather than a security vulnerability.

Rehberger disputed this categorization. “I do not believe this is just a safety issue, but a security vulnerability with the default network egress configuration that can lead to exfiltration of your private information,” he wrote. “Safety protects you from accidents. Security protects you from adversaries.”

Anthropic did not immediately respond to a request for comment.

Attack vectors and real-world risk

The vulnerability can be exploited through multiple entry points, according to the blog post. “Malicious actors could embed prompt injection payloads in documents shared for analysis, websites users ask Claude to summarize, or data accessed through Model Context Protocol (MCP) servers and Google Drive integrations,” it stated.

Organizations using Claude for sensitive tasks — such as analyzing confidential documents, processing customer data, or accessing internal knowledge bases — face particular risk. The attack leaves minimal traces, as the exfiltration occurs through legitimate API calls that blend with normal Claude operations.

For enterprises, mitigation options remain limited. Users can disable network access entirely or manually configure allow-lists for specific domains, though this significantly reduces Claude’s functionality. Anthropic recommends monitoring Claude’s actions and manually stopping execution if suspicious behavior is detected — an approach Rehberger characterizes as “living dangerously.”
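As an aid to the kind of monitoring described above, the following is a minimal defensive sketch that scans code text (for example, code shown in a transcript before or after execution) for API-key-like strings that do not belong to the organization. It rests on several assumptions: that Anthropic API keys begin with `sk-ant-`, that the reviewer can obtain the code text at all, and that the function names here are hypothetical helpers rather than part of any Anthropic tooling. It catches plaintext keys only and would miss a key that has been encoded or split.

```python
import hashlib
import re

# Assumption: Anthropic API keys start with "sk-ant-"; adjust if that differs.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}")


def fingerprint(key: str) -> str:
    """Hash a key so the organization's own keys need not be stored in plaintext."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def find_foreign_keys(code: str, own_fingerprints: set[str]) -> list[str]:
    """Return key-like strings in `code` whose hash is not a known own key."""
    return [
        match
        for match in KEY_PATTERN.findall(code)
        if fingerprint(match) not in own_fingerprints
    ]


if __name__ == "__main__":
    own_key = "sk-ant-ownkey12345"  # placeholder, not a real key
    snippet = (
        "print('Hello, world')\n"
        "API_KEY = 'sk-ant-foreign98765'\n"  # placeholder, not a real key
        "print('done')\n"
    )
    print(find_foreign_keys(snippet, {fingerprint(own_key)}))
```

Benign filler such as print statements does not affect a pattern match, so this kind of check complements, rather than replaces, restricting network access.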

The company’s security documentation also acknowledges the risk, Rehberger noted: “This means Claude can be tricked into sending information from its context (for example, prompts, projects, data via MCP, Google integrations) to malicious third parties.”

Even so, enterprises may incorrectly assume that the default “Package managers only” configuration provides adequate protection. Rehberger’s research demonstrated that this assumption is false. He has not published the complete exploit code, in order to protect users while the vulnerability remains unpatched, and he noted that other domains on Anthropic’s approved list may present similar exploitation opportunities.


Gyana Swain is a seasoned technology journalist with over 20 years’ experience covering the telecom and IT space. He is a consulting editor with VARINDIA, and earlier in his career he held editorial positions at CyberMedia, PTI, 9dot9 Media, and Dennis Publishing. A published author of two books, he combines industry insight with narrative depth. Outside of work, he’s a keen traveler and cricket enthusiast. He earned a B.S. degree from Utkal University.
