Does Your Chatbot Spy on You?
ObscureIQ | 2025 Edition
Artificial intelligence is now part of daily life: drafting documents, summarizing emails, writing code, and shaping decisions across every industry.
But the more we use these systems, the more people ask the same questions:
- Are Large Language Models (LLMs) safe to use?
- Does Generative AI use my questions and answers to train their models?
- Can ChatGPT see my private data? Does it know who I am?
- Do AI companies keep a record of my conversations?
- How do chatbots know about things I asked them in the past?
- Which AI tool protects privacy best?
The answers come down to two things:
- Where your data travels.
- Whether the system is built to forget.
🔶 Self-Hosted or On-Device Models
Examples: GPT-4All, Mistral, LLaMA
Privacy Posture: 🟩 Strongest (5/5)
Running a model locally keeps everything under your control.
No third-party logging. No training reuse. You decide what persists and what disappears.
The trade-off is hardware and setup complexity, but the privacy return is total containment.
Note: With self-hosting, the user becomes responsible for endpoint security (OS patches, disk encryption).
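To make the containment concrete, here is a minimal sketch of fully local inference, assuming the gpt4all Python bindings and a GGUF model file already present on disk; the model filename is illustrative.

```python
# Minimal local-inference sketch (assumes the gpt4all Python bindings,
# installed via `pip install gpt4all`). The model filename is illustrative;
# any locally stored GGUF model works.
from gpt4all import GPT4All

# allow_download=False keeps the session fully offline; the model file
# must already exist in the local model directory.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", allow_download=False)

with model.chat_session():
    reply = model.generate("Summarize the key risks in this contract clause: ...",
                           max_tokens=200)
    print(reply)  # prompt, output, and any logs never leave this machine
```

Once the model file is on disk, nothing in this flow touches a network, which is exactly the containment described above.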
🔶 Proton Lumo
Privacy Posture: 🟩 Strong (4.7/5)
Operates under Swiss privacy law (FDPA, with GDPR alignment) and employs zero-access encryption, meaning Proton cannot read conversation content. It does not train on user prompts and keeps minimal metadata only: timestamp, device type, and an anonymized session ID. Independent audits remain limited, but Proton's record in encrypted communications supports a high trust baseline.
🔶 Brave Leo AI
Privacy Posture: 🟩 Strong (4.7/5)
Built into the Brave browser with privacy-first architecture. Session data is retained temporarily for continuity but is not linked to user IDs or accounts, and none of it is used to train models. Operates under UK/EU GDPR jurisdiction with full transparency and user control via simple toggles. Leo processes requests through privacy-preserving infrastructure, keeping queries anonymous and disconnected from browsing identity.
🔶 Mistral Chat (Cloud)
Privacy Posture: 🟩 Strong (4.4/5)
GDPR-bound service with clear opt-out and deletion policies. Temporary storage of ≈ 30 days for context continuity on the consumer “le Chat” interface; Zero Data Retention (ZDR) applies only to API and enterprise tiers. Transparent documentation and non-training defaults make Mistral the most privacy-aligned cloud option short of self-hosting.
🔶 Claude (Anthropic)
Privacy Posture: 🟨 Moderate (3.5/5)
Enterprise / API data is excluded from training. The consumer interface logs by default; retention can be disabled per chat. Even with logging off, Anthropic retains prompts for ≈ 30 days for abuse monitoring (“Unbreakable Safety Loop”). Transparent, but not private by default.
🔶 ChatGPT (OpenAI)
Privacy Posture: 🟨 Moderate (3.5/5)
Users can disable training (Settings → Data Controls → “Improve the model for everyone” → Off) and chat history.
Defaults still favor retention.
Team / Enterprise / API tiers receive contractual zero-training guarantees, though API calls remain logged for security review.
Privacy requires manual configuration and regular purging.
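For workflows that can bypass the consumer UI entirely, routing prompts through the API inherits the no-training default for API traffic (security logging still applies). Below is a hedged sketch using the official openai Python client; the model name is a placeholder.

```python
# Hedged sketch: sending a prompt via the OpenAI API instead of the ChatGPT UI.
# API traffic is excluded from model training by default, though requests may
# still be retained briefly for abuse monitoring.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model your plan provides
    messages=[{"role": "user", "content": "Draft a two-sentence status update."}],
)
print(response.choices[0].message.content)
```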
🔶 Perplexity AI
Privacy Posture: 🟨 Moderate (3.1/5)
Perplexity functions as a hybrid search and chat engine. It logs by default; Private Mode disables local storage. Even in Private Mode, prompts transit third-party model providers (OpenAI / Anthropic), so data is only as private as the weakest partner. Enterprise contracts guarantee zero-training; consumer trust remains split across multiple jurisdictions.
🔶 Siri (Apple)
Privacy Posture: 🟨 Mixed (2.5/5)
Audio recordings now require opt-in; text transcripts are still processed on Apple servers. Retention periods are undisclosed. On-device processing and Private Cloud Compute are beginning to handle more requests, but neither is universally deployed.
🔶 Gemini (Google)
Privacy Posture: 🟥 Weaker (2/5)
Prompts are linked to Google accounts and retained unless deleted. Even with “Activity Control” off, data persists ≈ 72 hours for operations and up to 3 years if flagged for human review. Disabling training breaks features such as the Docs and Gmail extensions. Privacy requires sacrificing functionality.
🔶 Copilot (Microsoft)
Privacy Posture: 🟥 Weaker (2/5)
Tier-dependent privacy. Enterprise and Government versions offer zero-training and an optional “Customer Lockbox” for data-access control. Consumer versions retain telemetry and prompt data for product improvement across Windows and Office.
🔶 xAI / Grok (via X / Twitter)
Privacy Posture: 🟥 Weakest (1.3/5)
Prompts are tied to X accounts and stored within the ad and social-identity framework. There is no training opt-out or clear retention policy. Deletion removes visibility but does not guarantee server-side erasure.
🔶 Opaque or Unverified Providers
Privacy Posture: 🟥 Weakest (1/5)
The lack of public policies means you should assume full retention and reuse. Examples include unregulated résumé screeners, entertainment chatbots, and “AI companions” with no published governance.
How We Scored
Privacy posture was determined using six weighted criteria; a worked example of the scoring follows the list.
- Prompt Retention (25%): How long data persists after use.
- Training Opt-Out (20%): Whether user data is used to train models by default.
- Jurisdiction (15%): Which legal framework governs the data.
- Transparency (15%): Availability and clarity of privacy documentation.
- User Control (15%): Ability to disable retention, delete data, or operate anonymously.
- Default Behavior (10%): Whether privacy requires configuration or exists by design.
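As a worked example of the rubric, the snippet below recomputes Proton Lumo's overall rating from the sub-scores in the comparison table; the dictionary keys are labels used only for this illustration.

```python
# Recompute an overall privacy score from the six weighted criteria above.
WEIGHTS = {
    "prompt_retention": 0.25,
    "training_opt_out": 0.20,
    "jurisdiction": 0.15,
    "transparency": 0.15,
    "user_control": 0.15,
    "default_behavior": 0.10,
}

# Proton Lumo's sub-scores from the comparison table (1 = weakest, 5 = strongest).
lumo = {
    "prompt_retention": 4,
    "training_opt_out": 5,
    "jurisdiction": 5,
    "transparency": 5,
    "user_control": 5,
    "default_behavior": 4,
}

overall = sum(WEIGHTS[k] * lumo[k] for k in WEIGHTS)
print(f"{overall:.2f}")  # 4.65, reported as 4.7 in the comparison table
```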
♦️ Key Points to Understand
- Local beats cloud. Containment equals control.
- Subscription beats advertising. Revenue model defines privacy model.
- Defaults matter. Opt-out ≠ protection.
- Voice AI ≠ private AI. Server processing still applies.
- Social integrations erase anonymity.
⚠️ Even opt-outs may not erase memory.
Independent tests show that fine-tuned LLMs can retain and regenerate sensitive data even after developers disable training on user inputs (Aditya et al., 2024). True privacy depends on design, not declarations.
Using ChatGPT, Perplexity, Claude & Brave Leo More Privately
If you rely on these mainstream tools, you can still limit exposure. Use the following baseline configuration:
ChatGPT (OpenAI)
- Turn off training use: Settings → Data Controls → “Improve the model for everyone” → Off.
- Disable chat history: Settings → “Chat History & Training” → Off (to prevent logging in default UI).
- Use ChatGPT Team or Enterprise if available — these plans exclude all data from training by default.
- Never feed credentials or PII into free or Plus accounts.
Perplexity AI
- Always use Private Mode (tap lock icon in top-right of interface).
- Stay signed out — anonymous sessions retain less data.
- Block cross-site trackers in browser settings to prevent analytics linkage.
- Clear query history manually if you log in for saved threads.
Claude (Anthropic)
- Disable chat history: Settings → Privacy → Uncheck “Save conversations.”
- Purge data regularly: Use the “Delete Conversations” option after each session.
- Avoid sensitive identifiers: Never include real names, addresses, or client data in prompts.
- Enterprise use: Prefer Claude via API or Anthropic Console for zero-training contracts.
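For teams on commercial terms, here is a hedged sketch of calling Claude through the official anthropic Python client instead of the consumer chat; the model name and token limit are placeholders.

```python
# Hedged sketch: using the Anthropic API, which falls under commercial
# no-training terms, rather than the consumer chat interface.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; choose the model on your plan
    max_tokens=300,
    messages=[{"role": "user", "content": "Rewrite this paragraph for clarity: ..."}],
)
print(message.content[0].text)
```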
Brave Leo AI
- Leo is private by default — no account required, no persistent logging.
- Session data is transient and not linked to your Brave profile or browsing history.
- Toggle Leo on/off in Brave Settings → Leo to control when it’s active.
- Use Leo for general queries; as with any cloud AI, avoid entering highly sensitive information.
🔷 ObscureIQ Top Picks
➤ Best Choice: Self-Hosted / On-Device models
You get complete control and zero external exposure.
➤ Best Cloud Option: Proton Lumo, Brave Leo AI, or Mistral Chat
You get transparent, GDPR-aligned LLMs with strong privacy defaults.
➤ Acceptable with Caution: Claude, ChatGPT, and Perplexity
You must configure them yourself; they are safe only with manual hardening.
➤ Weaker Defaults: Siri, Gemini, Copilot
They are functional, not private. They retain user data.
➤ High-Risk / Avoid: xAI / Grok and opaque providers
These are identity-linked and non-transparent.
The key question to ask about any AI system you use: what does it remember about you?
The broader literature is clear: privacy loss in generative AI is structural. Without explicit limits on retention and training reuse, every conversation becomes potential training data.
Choose systems engineered to forget.
Each criterion column shows the 1–5 sub-score followed by a short note; the Default Behavior column shows the sub-score only.

| Platform | Overall | Prompt Retention | Training Opt-Out | Jurisdiction | Transparency | User Control | Default Behavior |
|---|---|---|---|---|---|---|---|
| Self-Hosted / On-Device (GPT-4All, Mistral, LLaMA) | 🟩 5 / 5 | 5 · None | 5 · Full | 5 · Local / User | 5 · Full | 5 · Full | 5 |
| Lumo (Proton) | 🟩 4.7 / 5 | 4 · Minimal metadata | 5 · Default off | 5 · Switzerland | 5 · Clear | 5 · Simple toggle | 4 |
| Brave Leo AI (Brave) | 🟩 4.7 / 5 | 4 · Minimal transient data | 5 · Default opt-out | 5 · UK / EU (GDPR) | 5 · Full | 5 · Full toggle | 4 |
| Mistral Chat (Cloud) | 🟩 4.4 / 5 | 4 · Temp ≤ 30 days | 5 · Default off | 5 · France (GDPR) | 5 · Transparent | 4 · Deletion policy | 3 |
| Claude (Anthropic) | 🟨 3.5 / 5 | 3 · 30-day min. retention | 4 · Available | 3 · US / Global | 4 · Clear terms | 3 · Manual setting | 4 |
| ChatGPT (OpenAI) | 🟨 3.5 / 5 | 3 · 30-day min. logs | 4 · Available | 3 · US / Global | 4 · Dependent on tier | 3 · Manual setting | 4 |
| Perplexity AI | 🟨 3.1 / 5 | 3 · Retained by default | 3 · Partial (model-dependent) | 3 · US / Global | 3 · Moderate | 4 · Private Mode | 2 |
| Siri (Apple) | 🟨 2.5 / 5 | 3 · Transcripts retained | 1 · None | 3 · US / Global | 3 · Limited audit | 2 · No toggle | 3 |
| Gemini (Google) | 🟥 2.0 / 5 | 2 · 72 h / 3 y retention | 1 · None | 3 · US / Global | 3 · Complex | 2 · Limited | 1 |
| Copilot (Microsoft) | 🟥 2.0 / 5 | 2 · Tier-based retention | 1 · None (consumer) | 3 · US / Global | 3 · Tier-dependent | 2 · Limited | 1 |
| Meta.ai (Llama / Facebook Int.) | 🟥 1.9 / 5 | 2 · Persistent, account-linked | 1 · No opt-out | 3 · US / EU hybrid | 2 · Complex, generalized | 2 · Manual deletion only | 1 |
| xAI / Grok (via X / Twitter) | 🟥 1.3 / 5 | 1 · Persistent, tied to user | 1 · None | 2 · US / Global | 1 · Minimal | 1 · None | 2 |
| DeepSeek (China-based LLM) | 🟥 1.2 / 5 | 1 · Unknown; assume persistent | 1 · None | 1 · China | 2 · Opaque | 1 · None | 1 |
| Opaque / Unverified Providers | 🟥 1.0 / 5 | 1 · Unknown | 1 · None | 1 · Unknown | 1 · None | 1 · None | 1 |
Notes
Self-hosted: User responsible for endpoint security (OS patches, disk encryption).
Lumo: Zero-access encryption. Proton cannot read conversation content.
Leo: Session data retained temporarily for continuity. Not linked to user IDs or accounts. Not used for training.
Mistral: ZDR available for API / Enterprise only.
Claude: Opt-out possible but 30-day retention.
ChatGPT: ≈ 30-day security logs (API calls retained for abuse monitoring).
Perplexity: Training Opt-Out is partial. Private Mode stops logging; data still passes to model partners. Hybrid architecture routes through OpenAI / Anthropic APIs.
Siri: Transcripts retained (server); audio opt-in only ('Improve Siri & Dictation').
Gemini: Formal training opt-out not available via Activity Control. Disabling Activity Control limits functionality (Gmail, Docs extensions).
Copilot: Enterprise plans can enable Customer Lockbox to restrict data export.
Meta.ai: Prompts and chats linked to Meta accounts; no dedicated opt-out. Training governed by Meta's general Data Policy.
xAI/Grok: Prompts tied to X account; users may delete threads but cannot disable training.
DeepSeek: China-based; no published data-handling disclosures or user controls; governed by PRC AI regulations.
Opaque / Unverified: résumé screeners, AI companions, and marketing chatbots without published privacy policies.
Last verified October 2025 | Next audit Q1 2026 | Copyright 2025, ObscureIQ. | Distribute freely with attribution.
📚 Further Reading on LLM Privacy
Miranda et al. (2024)
Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
This survey examines how LLMs memorize and inadvertently reveal sensitive information, and explores defenses like differential privacy and machine unlearning.
📄 arXiv · Read Paper →
Liu et al. (2024)
Generative AI Model Privacy: A Survey
A broad review covering generative-AI models (LLMs, diffusion models, GANs) that catalogues privacy attacks (membership inference, model inversion) and defense mechanisms.
📄 Springer · Read Paper →
Zhao & Song (2024)
Privacy-Preserving Techniques in Generative AI and LLMs: Mechanisms, Applications, and Future Directions
A review of technical solutions (federated learning, homomorphic encryption, differential privacy) tailored to LLMs and generative AI.
📄 MDPI / arXiv · Read Paper →
King et al. (2025)
User Privacy and Large Language Models: An Analysis of Frontier Developers’ Privacy Policies
Qualitative policy analysis of major LLM developers: finds that many use chat data for training by default, lack clarity, and retain data indefinitely.
📄 arXiv · Read Paper →
European Data Protection Board (EDPB) (2025)
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Regulatory-oriented analysis of privacy risk management for LLMs, with practical use-cases and mitigation strategies.
📄 EDPB · Read Report →
OWASP (2025)
LLM and GenAI Data Security Best Practices
Industry-focused guidance on securing LLM data, practical controls, and risk mitigation for deployment of generative-AI systems.
📄 OWASP · Read Guide →
Aditya et al. (2024)
Evaluating Privacy Leakage and Memorization Attacks on Large Language Models (LLMs) in Generative AI Applications
Empirical study analyzing how different models leak personally identifiable information (PII) via memorization, and how model size/architecture affect vulnerability.
📄 PDF · Read Paper →
Duan et al. (2024)
Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models
Examines “latent memorization” — hidden data remnants that resurface in LLM outputs. Shows memorization scales with data repetition and reveals methods to detect it.
📄 PDF · Read Paper →
Panda et al. (2025)
Privacy Auditing of Large Language Models
Introduces improved audit tools for detecting privacy leakage in LLMs, revealing higher real-world exposure rates than prior tests.
📄 PDF · Read Paper →
