Which Generative AI Is Most Privacy-Respecting?

Does Your Chatbot Spy on You?

ObscureIQ | 2025 Edition

Artificial intelligence is now part of daily life. Drafting documents. Summarizing emails. Writing code. Shaping decisions. Across every industry. But the more we use these systems, the more people ask the same questions:
  • Are Large Language Models (LLMs) safe to use?
  • Does Generative AI use my questions and answers to train its models?
  • Can ChatGPT see my private data? Does it know who I am?
  • Do AI companies keep a record of my conversations?
  • How do chatbots know about things I asked them in the past?
  • Which AI tool protects privacy best?
These aren’t just technical questions. They’re trust questions. Each prompt you enter reveals fragments of identity, context, or strategy. If that data is stored, reused, or linked to an account, it can become part of the model’s long memory. And part of someone else’s dataset. At ObscureIQ, we assess AI safety through a privacy lens. Our analysis focuses on two core dimensions:
  • Where your data travels.
  • Whether the system is built to forget.
This report ranks today’s leading generative AI platforms (from local open-source models to mainstream consumer bots) by how well they protect user data, limit retention, and minimize third-party exposure. It’s not about who answers best. It’s about who remembers least.

🔶 Self-Hosted or On-Device Models

Examples: GPT-4All, Mistral, LLaMA. Privacy Posture: 🟩 Strongest (5/5). Running a model locally keeps everything under your control. No third-party logging. No training reuse. You decide what persists and what disappears. The trade-off is hardware cost and setup complexity, but the privacy return is total containment. Note: the user becomes responsible for endpoint security (OS patches, disk encryption).
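To make "total containment" concrete, here is a minimal sketch of querying a model served on your own machine through an OpenAI-compatible local endpoint (the style exposed by llama.cpp-type servers). The port, model name, and prompt are illustrative assumptions; the point is that the request never leaves localhost, so there is nothing for a provider to log, retain, or train on.

```python
# Minimal sketch: query a locally hosted model through an OpenAI-compatible
# endpoint running on this machine. Assumptions: the local server listens on
# http://localhost:8080 and exposes /v1/chat/completions; "local-model" is a
# placeholder for whatever weights you loaded yourself.
import json
import urllib.request

LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # localhost only

payload = {
    "model": "local-model",  # placeholder: the model file you run locally
    "messages": [{"role": "user", "content": "Summarize this contract clause..."}],
    "temperature": 0.2,
}

req = urllib.request.Request(
    LOCAL_ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())["choices"][0]["message"]["content"]

print(answer)  # generated entirely on your own hardware
```

Because the endpoint is bound to your own machine, retention is whatever you configure on your own disk; there is no provider policy to audit.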

🔶 Proton Lumo

Privacy Posture: 🟩 Strong (4.7/5). Operates under Swiss privacy law (FDPA, with GDPR alignment) and employs zero-access encryption, meaning Proton cannot read conversation content. It does not train on user prompts and keeps minimal metadata only: timestamp, device type, anonymized session ID. Independent audits remain limited, but Proton’s record in encrypted communications supports a high trust baseline.

🔶 Brave Leo AI

Privacy Posture: 🟩 Strong (4.7/5). Built into the Brave browser with a privacy-first architecture. Session data is retained temporarily for continuity but is not linked to user IDs or accounts, and no data is used to train models. Operates under UK/EU GDPR jurisdiction with full transparency and user control via simple toggles. Leo processes requests through privacy-preserving infrastructure, keeping queries anonymous and disconnected from browsing identity.

🔶 Mistral Chat (Cloud)

Privacy Posture: 🟩 Strong (4.4/5). A GDPR-bound service with clear opt-out and deletion policies. Prompts are stored temporarily (≈ 30 days) for context continuity on the consumer “le Chat” interface; Zero Data Retention (ZDR) applies only to API and enterprise tiers. Transparent documentation and non-training defaults make Mistral the most privacy-aligned cloud option short of self-hosting.

🔶 Claude (Anthropic)

Privacy Posture: 🟨 Moderate (3.5/5). Enterprise and API data are excluded from training. The consumer interface logs by default; retention can be disabled per chat. Even with logging off, Anthropic retains prompts for roughly 30 days for abuse monitoring (the “Unbreakable Safety Loop”). Transparent, but not private by default.

🔶 ChatGPT (OpenAI)

Privacy Posture: 🟨 Moderate (3.5/5). Users can disable training (Settings → Data Controls → “Improve the model for everyone” → Off) and chat history, but defaults still favor retention. Team, Enterprise, and API tiers receive contractual zero-training guarantees, though API calls remain logged for security review. Privacy requires manual configuration and regular purging.

🔶 Perplexity AI

Privacy Posture: 🟨 Moderate (3.1/5). Perplexity functions as a hybrid search and chat engine. It logs by default; Private Mode disables local storage. Even in Private Mode, prompts transit third-party model providers (OpenAI, Anthropic), so data is only as private as the weakest partner. Enterprise contracts guarantee zero training; consumer trust remains split across multiple jurisdictions.

🔶 Siri (Apple)

Privacy Posture: 🟨 Mixed (2.5/5). Audio recordings now require opt-in, but text transcripts are still processed on Apple servers and retention periods are undisclosed. Private Cloud Compute extends on-device privacy protections to server-side processing, but it is not yet universally deployed.

🔶 Gemini (Google)

Privacy Posture: 🟥 Weaker (2/5). Prompts are linked to Google accounts and retained unless deleted. Even with “Activity Control” off, data persists roughly 72 hours for operations and up to three years if flagged for human review. Disabling training breaks features like the Docs and Gmail extensions, so privacy requires sacrificing functionality.

🔶 Copilot (Microsoft)

Privacy Posture: 🟥 Weaker (2/5). Privacy is tier-dependent. Enterprise and Government versions offer zero training and an optional Customer Lockbox for data-access control; consumer versions retain telemetry and prompt data for product improvement across Windows and Office.

🔶 xAI / Grok (via X / Twitter)

Privacy Posture: 🟥 Weakest (1.3/5). Prompts are tied to X accounts and stored within the platform’s ad and social-identity framework. There is no training opt-out and no clear retention policy. Deletion removes visibility but does not guarantee server erasure.

🔶 Opaque or Unverified Providers

Privacy Posture: 🟥 Weakest (1/5). With no public policies, assume full retention and reuse. Examples include unregulated résumé screeners, entertainment chatbots, and “AI companions” with no published governance.

How We Scored

Privacy posture was determined using six weighted criteria:
  • Prompt Retention (25%): How long data persists after use.
  • Training Opt-Out (20%): Whether user data is used to train models by default and whether users can opt out.
  • Jurisdiction (15%): Which legal framework governs the data.
  • Transparency (15%): Availability and clarity of privacy documentation.
  • User Control (15%): Ability to disable retention, delete data, or operate anonymously.
  • Default Behavior (10%): Whether privacy requires configuration or exists by design.
This framework aligns with current academic and regulatory work examining privacy risks in large language models. Studies by Miranda et al. (2024) and Liu et al. (2024) describe how LLMs can memorize user data, reproduce personal information, and leak context through model-inversion and membership-inference attacks. Our scoring therefore emphasizes not just policy transparency but data-handling reality: whether the architecture or business model itself is built to forget. Legal jurisdiction is treated as one input among technical and behavioral safeguards.
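For readers who want the arithmetic spelled out, the sketch below computes an overall posture score from the six per-criterion ratings used in the matrix later in this report. The function and dictionary names are ours, written purely for illustration; the weights are exactly those listed above.

```python
# Sketch of the weighted scoring described above. Each platform is rated 1-5
# per criterion; the overall score is the weighted average, rounded to one decimal.
WEIGHTS = {
    "prompt_retention": 25,
    "training_opt_out": 20,
    "jurisdiction":     15,
    "transparency":     15,
    "user_control":     15,
    "default_behavior": 10,
}

def posture_score(ratings: dict[str, int]) -> float:
    """Weighted 1-5 privacy posture score from six 1-5 criterion ratings."""
    weighted = sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)  # 100..500
    return round(weighted / 100, 1)

# Example: Proton Lumo's ratings from the matrix below.
lumo = {
    "prompt_retention": 4, "training_opt_out": 5, "jurisdiction": 5,
    "transparency": 5, "user_control": 5, "default_behavior": 4,
}
print(posture_score(lumo))  # 4.7, matching the matrix entry for Lumo
```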

♦️ Key Points to Understand

  • Local beats cloud. Containment equals control.
  • Subscription beats advertising. Revenue model defines privacy model.
  • Defaults matter. Opt-out ≠ protection.
  • Voice AI ≠ private AI. Server processing still applies.
  • Social integrations erase anonymity.
Recent policy analysis (King et al., 2025) found that most frontier AI providers retain user chats indefinitely by default and often bury training disclosures in secondary documents. Our findings align with that pattern: strong language in public-facing policies, weak guarantees in practice.

⚠️ Even opt-outs may not erase memory.

Independent tests show that fine-tuned LLMs can retain and regenerate sensitive data even after developers disable training on user inputs (Aditya et al., 2024). True privacy depends on design, not declarations.

Using ChatGPT, Perplexity, Claude & Brave Leo More Privately

If you rely on these mainstream tools, you can still limit exposure. Use the following baseline configuration:

ChatGPT (OpenAI)

  • Turn off training use: Settings → Data Controls → “Improve the model for everyone” → Off.
  • Disable chat history: Settings → “Chat History & Training” → Off (to prevent logging in default UI).
  • Use ChatGPT Team or Enterprise if available — these plans exclude all data from training by default.
  • Never feed credentials or PII into free or Plus accounts.

Perplexity AI

  • Always use Private Mode (tap lock icon in top-right of interface).
  • Stay signed out — anonymous sessions retain less data.
  • Block cross-site trackers in browser settings to prevent analytics linkage.
  • Clear query history manually if you log in for saved threads.

Claude (Anthropic)

  • Disable chat history: Settings → Privacy → Uncheck “Save conversations.”
  • Purge data regularly: Use the “Delete Conversations” option after each session.
  • Avoid sensitive identifiers: Never include real names, addresses, or client data in prompts.
  • Enterprise use: Prefer Claude via API or Anthropic Console for zero-training contracts (see the sketch below).
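As referenced in the last bullet, here is a minimal sketch of reaching Claude through the official Python SDK rather than the consumer chat interface. The model identifier and prompt are placeholders; any zero-training guarantee comes from your Anthropic agreement and console settings, not from the code itself.

```python
# Minimal sketch: calling Claude via the API instead of the consumer chat UI.
# Assumes the `anthropic` Python SDK is installed and ANTHROPIC_API_KEY is set.
# Confirm zero-training and retention terms in your Anthropic agreement rather
# than relying on defaults.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # substitute the model your plan covers
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this policy in three bullets."}],
)

print(message.content[0].text)  # text of the first response content block
```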

Brave Leo AI

  • Leo is private by default — no account required, no persistent logging.
  • Session data is transient and not linked to your Brave profile or browsing history.
  • Toggle Leo on/off in Brave Settings → Leo to control when it’s active.
  • Use Leo for general queries; avoid inputting highly sensitive information as with any cloud AI.
Baseline rule: Treat all cloud AI chats as potentially recorded. Use generic or synthetic identifiers and avoid discussing sensitive operations.
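One practical way to apply that baseline rule is to strip obvious identifiers before a prompt ever leaves your machine. The sketch below is a deliberately simple, regex-based scrubber with hypothetical names and patterns; it illustrates the habit rather than serving as a complete PII filter.

```python
# Sketch: replace obvious identifiers with generic placeholders before sending
# a prompt to any cloud AI. The regexes are intentionally simple and will miss
# plenty of PII; treat this as a starting point, not a complete filter.
import re

KNOWN_NAMES = ["Jane Doe", "Acme Corp"]  # hypothetical identifiers you never want to send

def scrub(prompt: str) -> str:
    scrubbed = prompt
    # Email addresses -> [EMAIL]
    scrubbed = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", scrubbed)
    # Phone-like digit runs -> [PHONE]
    scrubbed = re.sub(r"\+?\d[\d\s().-]{7,}\d", "[PHONE]", scrubbed)
    # Specific names or organizations you list yourself -> [NAME]
    for name in KNOWN_NAMES:
        scrubbed = scrubbed.replace(name, "[NAME]")
    return scrubbed

print(scrub("Email Jane Doe at jane@acme.com or call +1 (555) 010-7788 about the merger."))
# -> "Email [NAME] at [EMAIL] or call [PHONE] about the merger."
```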

🔷 ObscureIQ Top Picks

➤ Best Choice: Self-Hosted / On-Device models

You get complete control, zero external exposure.

➤ Best Cloud Option: Proton Lumo, Brave Leo AI, or Mistral Chat

You get transparent, GDPR-aligned LLMs with strong privacy defaults.

➤ Acceptable with Caution: Claude, ChatGPT, and Perplexity

These require configuration and are safe only with manual hardening.

➤ Weaker Defaults: Siri, Gemini, Copilot

They are functional, not private. They retain user data.

➤ High-Risk / Avoid: xAI / Grok and opaque providers

These are identity-linked and non-transparent.

The key question to ask about any AI system you use: what does it remember about you?

The broader literature is clear: privacy loss in generative AI is structural. Without explicit limits on retention and training reuse, every conversation becomes potential training data.

Choose systems engineered to forget.

Generative AI Privacy Matrix (2025)
Each criterion is scored 1–5 (higher is more private); Overall is the weighted average described in “How We Scored.”

Platform | Overall | Prompt Retention | Training Opt-Out | Jurisdiction | Transparency | User Control | Default Setting
Self-Hosted / On-Device (GPT-4All, Mistral, LLaMA) | 🟩 5/5 | 5 (None) | 5 (Full) | 5 (Local / User) | 5 (Full) | 5 (Full) | 5
Lumo (Proton) | 🟩 4.7/5 | 4 (Minimal metadata) | 5 (Default off) | 5 (Switzerland) | 5 (Clear) | 5 (Simple toggle) | 4
Brave Leo AI (Brave) | 🟩 4.7/5 | 4 (Minimal transient data) | 5 (Default opt-out) | 5 (UK / EU, GDPR) | 5 (Full) | 5 (Full toggle) | 4
Mistral Chat (Cloud) | 🟩 4.4/5 | 4 (Temp ≤ 30 days) | 5 (Default off) | 5 (France, GDPR) | 5 (Transparent) | 4 (Deletion policy) | 3
Claude (Anthropic) | 🟨 3.5/5 | 3 (30-day min. retention) | 4 (Available) | 3 (US / Global) | 4 (Clear terms) | 3 (Manual setting) | 4
ChatGPT (OpenAI) | 🟨 3.5/5 | 3 (30-day min. logs) | 4 (Available) | 3 (US / Global) | 4 (Dependent on tier) | 3 (Manual setting) | 4
Perplexity AI | 🟨 3.1/5 | 3 (Retained by default) | 3 (Partial, model-dependent) | 3 (US / Global) | 3 (Moderate) | 4 (Private Mode) | 2
Siri (Apple) | 🟨 2.5/5 | 3 (Transcripts retained) | 1 (None) | 3 (US / Global) | 3 (Limited audit) | 2 (No toggle) | 3
Gemini (Google) | 🟥 2.0/5 | 2 (72 h / 3 y retention) | 1 (None) | 3 (US / Global) | 3 (Complex) | 2 (Limited) | 1
Copilot (Microsoft) | 🟥 2.0/5 | 2 (Tier-based retention) | 1 (None for consumers) | 3 (US / Global) | 3 (Tier-dependent) | 2 (Limited) | 1
Meta.ai (Llama / Facebook Int.) | 🟥 1.9/5 | 2 (Persistent, account-linked) | 1 (No opt-out) | 3 (US / EU hybrid) | 2 (Complex, generalized) | 2 (Manual deletion only) | 1
xAI / Grok (via X / Twitter) | 🟥 1.3/5 | 1 (Persistent, tied to user) | 1 (None) | 2 (US / Global) | 1 (Minimal) | 1 (None) | 2
DeepSeek (China-based LLM) | 🟥 1.2/5 | 1 (Unknown, assume persistent) | 1 (None) | 1 (China) | 2 (Opaque) | 1 (None) | 1
Opaque / Unverified Providers | 🟥 1.0/5 | 1 (Unknown) | 1 (None) | 1 (Unknown) | 1 (None) | 1 (None) | 1

Notes

Self-hosted: User responsible for endpoint security (OS patches, disk encryption).

Lumo: Zero-access encryption. Proton cannot read conversation content.

Leo: Session data retained temporarily for continuity. Not linked to user IDs or accounts. Not used for training.

Mistral: ZDR available for API / Enterprise only.

Claude: Opt-out possible but 30-day retention.

ChatGPT: ≈ 30-day security logs (API calls retained for abuse monitoring).

Perplexity: Training Opt-Out is partial. Private Mode stops logging; data still passes to model partners. Hybrid architecture routes through OpenAI / Anthropic APIs.

Siri: Transcripts retained (server); audio opt-in only ('Improve Siri & Dictation').

Gemini: Formal training opt-out not available via Activity Control. Disabling Activity Control limits functionality (Gmail, Docs extensions).

Copilot: Enterprise plans can enable Customer Lockbox to restrict data export.

Meta.ai: Prompts and chats linked to Meta accounts; no dedicated opt-out. Training governed by Meta's general Data Policy.

xAI/Grok: Prompts tied to X account; users may delete threads but cannot disable training.

DeepSeek: China-based; no published data-handling disclosures or user controls; governed by PRC AI regulations.

Opaque / unverified providers: résumé screeners, AI companions, and marketing chatbots without privacy policies.

Last verified October 2025 | Next audit Q1 2026 | Copyright 2025, ObscureIQ. | Distribute freely with attribution.

📚 Further Reading on LLM Privacy

  • Miranda et al. (2024). Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions. arXiv. Examines how LLMs memorize and inadvertently reveal sensitive information, and explores defenses such as differential privacy and machine unlearning.
  • Liu et al. (2024). Generative AI Model Privacy: A Survey. Springer. A broad review of generative AI models (LLMs, diffusion models, GANs) that catalogues privacy attacks (membership inference, model inversion) and defense mechanisms.
  • Zhao & Song (2024). Privacy-Preserving Techniques in Generative AI and LLMs: Mechanisms, Applications, and Future Directions. MDPI / arXiv. Reviews technical solutions (federated learning, homomorphic encryption, differential privacy) tailored to LLMs and generative AI.
  • King et al. (2025). User Privacy and Large Language Models: An Analysis of Frontier Developers’ Privacy Policies. arXiv. Qualitative analysis of major LLM developers’ policies; finds that many use chat data for training by default, lack clarity, and retain data indefinitely.
  • European Data Protection Board (2025). AI Privacy Risks & Mitigations – Large Language Models (LLMs). EDPB. Regulatory analysis of privacy risk management for LLMs, with practical use cases and mitigation strategies.
  • OWASP (2025). LLM and GenAI Data Security Best Practices. OWASP. Industry-focused guidance on securing LLM data, practical controls, and risk mitigation for deploying generative AI systems.
  • Aditya et al. (2024). Evaluating Privacy Leakage and Memorization Attacks on Large Language Models (LLMs) in Generative AI Applications. Empirical study of how different models leak personally identifiable information (PII) through memorization, and how model size and architecture affect vulnerability.
  • Duan et al. (2024). Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models. Examines “latent memorization”: hidden data remnants that resurface in LLM outputs; shows that memorization scales with data repetition and presents methods to detect it.
  • Panda et al. (2025). Privacy Auditing of Large Language Models. Introduces improved audit tools for detecting privacy leakage in LLMs, revealing higher real-world exposure rates than prior tests.
