Current AI Data Privacy Landscape

A comparison of data privacy practices across major AI providers—OpenAI, Anthropic, Google, and DeepSeek—covering training data usage, retention, HIPAA compliance, and user controls.

Data privacy concerns consistently come up in my conversations with companies weighing AI adoption. I've compiled this review of OpenAI, Anthropic, Google, and DeepSeek to clarify the current landscape and facilitate these discussions. The bottom line: regardless of how stringent your requirements are, viable options exist.

This chart compares the privacy practices of leading large language model (LLM) providers—OpenAI, Anthropic, Google, and DeepSeek—across their consumer, paid, enterprise, and self-hosted offerings. Key concerns addressed include whether user data is used for model training by default, availability of opt-outs, data retention timelines, HIPAA eligibility, and jurisdictional control over user data.

For privacy-conscious users and organizations, enterprise-tier services from OpenAI, Anthropic, and Google Cloud (Vertex AI) provide the most robust protections: no training use by default, short retention windows, and regulatory compliance options such as HIPAA BAAs. In contrast, DeepSeek's hosted offering poses significant privacy risks, including no opt-out, unspecified data retention, and data storage in China. For the highest level of privacy, self-hosting open-source models remains the safest route, assuming the user has the infrastructure to support it.

Privacy Practices Across LLM Providers (Mar 2025)

| Provider | Tier | Data Used for Training (Default) | Opt-Out Available | Retention Duration | HIPAA Eligible | Notes |
|---|---|---|---|---|---|---|
| OpenAI | ChatGPT Free/Plus | Yes | Yes | Indefinite unless opted out; 30 days in Temp Chat | No | Opt out in settings; Temp Chat deletes after 30 days |
| OpenAI | API / Enterprise / Azure | No | Not needed | 30 days (shorter if requested) | Yes | Azure-hosted data never sent to OpenAI; BAAs available |
| Anthropic | Claude.ai Free/Pro | No | Not needed | 30 days | No | Data flagged by Trust & Safety may be kept up to 2 yrs; feedback up to 10 yrs |
| Anthropic | API / Enterprise | No | Not needed | 30 days (zero retention optional) | Yes | Zero-retention configuration for enterprise with BAA |
| Google | Bard (consumer Gemini) | Yes | Yes | Indefinite with history on; ~72 hrs with history off | No | Opt out via "Gemini Apps Activity" toggle |
| Google | Vertex AI / Cloud API | No | Not needed | Short/transient; enterprise-defined | Yes | Vertex AI covered under HIPAA BAA; strict enterprise data controls |
| DeepSeek | Chat Service (Cloud) | Yes | No | Unspecified; stored in China | No | All data used for training; high surveillance/privacy risk |
| Self-hosted | Open-source model | No (you control) | Not applicable | You control | You control | Only safe privacy option; requires internal infrastructure |

Note: HIPAA Eligible refers to whether the provider offers a Business Associate Agreement (BAA)—a contract required under U.S. law for handling protected health information (PHI). A BAA commits the provider to HIPAA-compliant data safeguards and breach protocols.

PHI: Protected Health Information—any individually identifiable health data (e.g., names, dates, diagnoses, test results) governed by HIPAA in healthcare contexts.
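Even when a provider offers a BAA, it is prudent to strip obvious identifiers before a prompt ever leaves your systems. The sketch below is a minimal, illustrative redaction pass; the regex patterns and `redact` function are my own assumptions, not any provider's API, and a handful of regexes is no substitute for a vetted de-identification tool (names and free-text identifiers, for instance, need real NER-based tooling).

```python
import re

# Illustrative patterns only -- real PHI detection requires a vetted
# de-identification pipeline, not a handful of regexes.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace obvious identifiers with placeholder tags before the
    text is sent to any hosted LLM. Names are NOT caught here."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

A call like `redact("SSN 123-45-6789, seen 01/02/2024")` yields `"SSN [SSN], seen [DATE]"`, keeping the clinical content while dropping the identifiers.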

Key Recommendations:

  1. Use enterprise/API tiers for sensitive data.
  2. For consumer use, disable history/training where available.
  3. Avoid DeepSeek cloud service for any sensitive or confidential content.
  4. For PHI, only use services offering BAAs (e.g., OpenAI API, Anthropic API, Vertex AI).
  5. Consider open-source, self-hosted models for highest data control.
  6. Educate users on privacy settings and policies before allowing tool use.
  7. Monitor provider policy changes regularly.
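For recommendation 5, most self-hosting servers (vLLM, Ollama, llama.cpp's server) expose an OpenAI-compatible chat-completions endpoint, so prompts stay inside your network. A minimal sketch, assuming a local server on Ollama's default port 11434 and a model named `llama3` (both assumptions; adjust for your deployment):

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "llama3") -> dict:
    # Standard OpenAI-style chat-completions request body, accepted by
    # most self-hosted servers (vLLM, Ollama, llama.cpp server).
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_local_model(prompt: str, base_url: str = "http://localhost:11434/v1") -> str:
    """POST to a locally hosted endpoint; the prompt never leaves your network."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request format matches the hosted APIs, code written against a cloud provider can usually be pointed at a self-hosted endpoint by changing only the base URL.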
