Current AI Data Privacy Landscape

A comparison of data privacy practices across major AI providers—OpenAI, Anthropic, Google, and DeepSeek—covering training data usage, retention, HIPAA compliance, and user controls.

Data privacy concerns consistently come up in my conversations with companies weighing AI adoption. I've compiled this review of OpenAI, Anthropic, Google, and DeepSeek to clarify the current landscape and facilitate these discussions. The bottom line: regardless of how stringent your requirements are, viable options exist.

This chart compares the privacy practices of leading large language model (LLM) providers—OpenAI, Anthropic, Google, and DeepSeek—across their consumer, paid, enterprise, and self-hosted offerings. Key concerns addressed include whether user data is used for model training by default, availability of opt-outs, data retention timelines, HIPAA eligibility, and jurisdictional control over user data.

For privacy-conscious users and organizations, enterprise-tier services from OpenAI, Anthropic, and Google Cloud (Vertex AI) provide the most robust protections: no training use by default, short retention windows, and regulatory compliance options such as HIPAA BAAs. In contrast, DeepSeek's hosted offering poses significant privacy risks, including no opt-out, unspecified data retention, and data storage in China. For the highest level of privacy, self-hosting open-source models remains the safest route, assuming the user has the infrastructure to support it.

Privacy Practices Across LLM Providers (Mar 2025)

| Provider | Tier | Data Used for Training (Default) | Opt-Out Available | Retention Duration | HIPAA Eligible | Notes |
|---|---|---|---|---|---|---|
| OpenAI | ChatGPT Free/Plus | Yes | Yes | Indefinite unless opted out; 30 days in Temp Chat | No | Opt out in settings; Temp Chat deletes after 30 days |
| OpenAI | API / Enterprise / Azure | No | Not needed | 30 days (shorter if requested) | Yes | Azure-hosted data never sent to OpenAI; BAAs available |
| Anthropic | Claude.ai Free/Pro | No | Not needed | 30 days | No | Data flagged by Trust & Safety may be kept up to 2 yrs; feedback up to 10 yrs |
| Anthropic | API / Enterprise | No | Not needed | 30 days (zero retention optional) | Yes | Zero-retention configuration for enterprise with BAA |
| Google | Bard (consumer Gemini) | Yes | Yes | Indefinite with history on; ~72 hrs with history off | No | Opt out via "Gemini Apps Activity" toggle |
| Google | Vertex AI / Cloud API | No | Not needed | Short/transient; enterprise-defined | Yes | Vertex AI covered under HIPAA BAA; strict enterprise data controls |
| DeepSeek | Chat Service (Cloud) | Yes | No | Unspecified; stored in China | No | All data used for training; high surveillance/privacy risk |
| Self-hosted | Open-source model | No (you control) | Not applicable | You control | You control | Only safe privacy option; requires internal infrastructure |

Note: HIPAA Eligible refers to whether the provider offers a Business Associate Agreement (BAA)—a contract required under U.S. law for handling protected health information (PHI). A BAA commits the provider to HIPAA-compliant data safeguards and breach protocols.

PHI: Protected Health Information—any individually identifiable health data (e.g., names, dates, diagnoses, test results) governed by HIPAA in healthcare contexts.
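Even when a provider offers a BAA, it is prudent to strip obvious identifiers before a prompt ever leaves your systems. The sketch below is a minimal, illustrative redaction pass; the regex patterns and `redact` function are my own assumptions, not any provider's API, and a handful of regexes is no substitute for a vetted de-identification tool (names and free-text identifiers, for instance, need real NER-based tooling).

```python
import re

# Illustrative patterns only -- real PHI detection requires a vetted
# de-identification pipeline, not a handful of regexes.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace obvious identifiers with placeholder tags before the
    text is sent to any hosted LLM. Names are NOT caught here."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

A call like `redact("SSN 123-45-6789, seen 01/02/2024")` yields `"SSN [SSN], seen [DATE]"`, keeping the clinical content while dropping the identifiers.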

Key Recommendations:

  1. Use enterprise/API tiers for sensitive data.
  2. For consumer use, disable history/training where available.
  3. Avoid DeepSeek cloud service for any sensitive or confidential content.
  4. For PHI, only use services offering BAAs (e.g., OpenAI API, Anthropic API, Vertex AI).
  5. Consider open-source, self-hosted models for highest data control.
  6. Educate users on privacy settings and policies before allowing tool use.
  7. Monitor provider policy changes regularly.
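For recommendation 5, most self-hosting servers (vLLM, Ollama, llama.cpp's server) expose an OpenAI-compatible chat-completions endpoint, so prompts stay inside your network. A minimal sketch, assuming a local server on Ollama's default port 11434 and a model named `llama3` (both assumptions; adjust for your deployment):

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "llama3") -> dict:
    # Standard OpenAI-style chat-completions request body, accepted by
    # most self-hosted servers (vLLM, Ollama, llama.cpp server).
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_local_model(prompt: str, base_url: str = "http://localhost:11434/v1") -> str:
    """POST to a locally hosted endpoint; the prompt never leaves your network."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request format matches the hosted APIs, code written against a cloud provider can usually be pointed at a self-hosted endpoint by changing only the base URL.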
