🤿 PROMPT FACTORY DEEP DIVE: Do AI Coding Agents train on your data?

How to check if your model provider is using your code to train its models when you use a coding agent.

Jun 30, 2025

The Prompt Factory occasionally does a deep dive into specific topics, in addition to our regular weekly updates. Here we dive into the privacy considerations around coding agents.

Coding agents are a very new breed of AI for developers. Tools like Claude Code, Gemini CLI and Codex CLI promise big productivity gains by autonomously writing code whilst a developer does something else. It sounds like science fiction, but it works, and the productivity benefits can be large, especially for rapidly creating working prototypes to explore new ideas.

However, it's important to be clear about the privacy implications of using these tools. A big question is: “Do they store my code and use it to train their models?”

The answer varies between suppliers and depends on the specific way that you pay (or don’t pay) for the service. Given the privacy and IP implications, it’s important to know the answer for the way you actually use these services. This document helps you check your own situation.

Claude Code

Anthropic’s terms for Claude Code are here and they start with a simple statement:

“…Anthropic does not train generative models using code or prompts that are sent to Claude Code.”

This blanket statement makes Claude’s position very clear.

You can connect Claude Code to an Anthropic API key, or to any of the Claude plans, including the $17 Pro plan and both the $100 and $200 variants of the Max plan. My experience is that Pro works for moderate use, but that heavy users will probably want a Max plan. Either way, a plan caps your monthly costs, so Claude Code is easy to budget for. API keys provide a handy way to try out Claude Code if you’re not a current Claude subscriber, but for serious users they will likely get expensive quickly.

OpenAI Codex CLI

OpenAI’s terms are here. The key statement on training is:

“By default, OpenAI does not use any inputs or outputs from our products for business users, including ChatGPT Team, ChatGPT Enterprise, and the API, to improve our models. However, API organization owners can choose to opt-in to share API data with OpenAI. This setting is not available to certain organizations, including Enterprise and customers with Zero Data Retention enabled.”

You can either connect Codex CLI to an API key, or to your ChatGPT plan. Details of how to connect to a ChatGPT plan are here.

The way I read this is that if you connect to an API key, your data won’t be used for training unless your organization owner has opted in to sharing it. Similarly, if you connect to a ChatGPT Team or Enterprise account, your data won’t be used for training by default.

However, if you connect to a ChatGPT Plus account, it seems like your code may well be used to train models, because Plus is not among the business products that the statement covers. This is unfortunate, because ChatGPT Plus is by far the most common plan and connecting to a plan is by far the most cost-efficient way to use Codex CLI.

If you connect to an API key, then you’ll be paying on a pay-as-you-go (PAYG) basis without any monthly cap on your costs. It might make more sense to upgrade your ChatGPT plan if you want higher usage limits or want to avoid donating your code to OpenAI for model training purposes.

Google Gemini CLI

Gemini CLI has a number of different ways to authenticate, and the one you choose determines the privacy situation.

Gemini is unique in that it’s offered with a generous free tier that’ll likely be attractive to many developers:

“To use Gemini CLI free-of-charge, simply login with a personal Google account to get a free Gemini Code Assist license. That free license gets you access to Gemini 2.5 Pro and its massive 1 million token context window. To ensure you rarely, if ever, hit a limit during this preview, we offer the industry’s largest allowance: 60 model requests per minute and 1,000 requests per day at no charge.”

As Google comments, most people are unlikely to exceed those usage limits, so a lot of people will stop there and celebrate that they’ve got something for free. Well, they have... but when something is free, there’s normally a different way that you end up paying. In this case, it’s that Google will store your code and use it to train its models.

Here’s the relevant comment from Google that confirms this:

“When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your prompts, answers, and related code are collected and may be used to improve Google's products, which includes model training... ”

If you’re a professional developer, you can use a paid Google AI Studio or Vertex AI key for usage-based billing, or get a Gemini Code Assist Standard or Enterprise license. When you do that, no data is used for training. But you must make sure you are actually authenticating with a professional account (Workspace, a paid AI Studio key, or a licensed Code Assist plan).

If you authenticate with a personal Google account, Gemini Code Assist for individuals, or unpaid services (including the very generous Gemini CLI free tier), your data may be used to train Google’s models.

Summary

Claude Code works with Pro and Max plans or with API keys. Anthropic never uses your code for training purposes, regardless of authentication mechanism.
OpenAI Codex CLI may use your code for model training purposes if you connect it to a ChatGPT Plus plan, but by default not if you connect it to an API key or to a ChatGPT Team or Enterprise plan.
Gemini CLI has a very generous free tier that might tempt many, but it’s only free if you don’t mind Google training its models on your code. You can opt out by using a paid API key on a professional account (not a personal Google account).
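
For anyone who wants to keep these findings next to their tooling, here is a minimal sketch of the same summary as a lookup table. It only encodes my reading of the terms quoted above as of the date of this post, not anything the providers publish in machine-readable form. The short labels such as "chatgpt-plus" and "paid-ai-studio-key" are hypothetical names made up for this example, and any combination the post does not cover is deliberately reported as unknown rather than guessed.

```python
from enum import Enum


class Training(Enum):
    NO = "not used for training by default"
    MAYBE = "may be used for training"
    UNKNOWN = "not covered here; check the provider's terms"


# (tool, how you authenticate) -> training status, as read in this post.
# The keys are hypothetical labels for illustration, not provider identifiers.
POLICY = {
    ("claude-code", "api-key"): Training.NO,
    ("claude-code", "pro"): Training.NO,
    ("claude-code", "max"): Training.NO,
    # API data is excluded by default; an organization owner can opt in.
    ("codex-cli", "api-key"): Training.NO,
    ("codex-cli", "chatgpt-team"): Training.NO,
    ("codex-cli", "chatgpt-enterprise"): Training.NO,
    ("codex-cli", "chatgpt-plus"): Training.MAYBE,
    ("gemini-cli", "personal-google-account"): Training.MAYBE,
    ("gemini-cli", "paid-ai-studio-key"): Training.NO,
    ("gemini-cli", "vertex-ai-key"): Training.NO,
    ("gemini-cli", "code-assist-standard"): Training.NO,
    ("gemini-cli", "code-assist-enterprise"): Training.NO,
}


def training_status(tool: str, auth: str) -> Training:
    """Look up a combination; anything not listed comes back as UNKNOWN."""
    key = (tool.strip().lower(), auth.strip().lower())
    return POLICY.get(key, Training.UNKNOWN)


if __name__ == "__main__":
    # Print the whole table so it can be reviewed at a glance.
    for (tool, auth), status in POLICY.items():
        print(f"{tool:12} {auth:26} {status.value}")
```

Terms change, so treat a table like this as a reminder to re-read the provider's current terms rather than as a source of truth.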

For all three of these coding agents it’s possible to burn a large number of API tokens if you’re not using a subscription plan, and therefore costs can mount rapidly. If you’re a heavy user, consider a solution that caps your cost exposure.

Whatever you do, make sure you understand the privacy implications of your choice. Is your code going to be used to train future models, and are you OK with that? If you’re just doing personal work or prototyping, it might not matter. But those working on banking systems may take a different view. If it seems like you’ve got an amazing deal (e.g. the Gemini CLI free tier), there’s probably a good reason for that. There is no free lunch in this world!

🚨 Shameless plug alert

Barnacle Labs builds AI solutions for ambitious organisations tackling complex challenges. We're the team behind the National Cancer Institute's NanCI AI-powered app—we help you identify the right opportunities and ship solutions that deliver results.

Reply to this email with your biggest AI challenge if you'd like to talk!

Anya Trofimova

Jul 19, 2025

A fascinating and timely deep-dive—thank you! The nuance around how different providers handle training data is often overlooked in the rush to adopt the latest tool. One thing I'd be curious to hear more about: do you see any movement toward a unified standard or 'privacy grade' across the industry?

Jul 20, 2025

Thank you and a good question! Nothing so far, to my knowledge. I think we're still in the frantic competition phase and it'll take some time for things to settle down. Despite the talk about regulation, I have little hope that this will be effective - governments seem to struggle to get their heads around what might be needed or effective.
