# Default Training Consent and the Two-Tiered Privacy System: Stanford's Privacy Policy Analysis
Counterpose | CP-24 | March 1, 2026
A publication of Vega Commons Project, Inc.
---
A Stanford University research team (Jennifer King, Kevin Klyman, Emily Capstick, Tiffany Saade, and Victoria Hsieh) published a systematic analysis of the privacy policies of the six largest U.S. AI developers: Amazon, Anthropic, Google, Meta, Microsoft, and OpenAI. The paper, submitted to arXiv in September 2025 and circulating widely as of March 2026, uses a methodology grounded in California Consumer Privacy Act analysis.
The central finding is that all six developers use consumer chat data to train their models by default. The paper documents the gaps among user understanding, privacy policy disclosure, and actual retention and training practices. It identifies a two-tiered system in which enterprise customers are opted out of model training by default, a protection consumer users do not receive.
The paper is an independent academic confirmation of the structural condition that produces what we refer to here as custody surfaces: the set of records an AI system creates during operation that can be discovered, subpoenaed, or compelled through legal process. AI interaction records (the logs of what users asked, what the system responded, and any reasoning the system performed) are retained by default, at population scale, without user understanding of what is retained or for how long.
## Findings
All six developers use user chat inputs and outputs to train their AI systems by default. Amazon, Meta, and OpenAI retain some or all chat data indefinitely. The paper documents that enterprise users are opted out of model training by default while consumer users are opted in. The practical result is that an enterprise that has not confirmed its opt-out status may believe it has enterprise-grade data protection when its actual retention posture depends on whether opt-out was explicitly configured.
Material data practices are distributed across multiple policy documents. The analysis found that primary privacy policies did not fully disclose AI data practices; a complete account required reviewing additional branch documents, FAQs, and sub-policies. For OpenAI, constructing a complete picture required reviewing six separate policy documents. An enterprise that relies on a vendor's homepage privacy policy as its governance basis may not have a complete picture of the vendor's actual retention practices.
## Inference as a Custody Mechanism
The paper identifies a particularly consequential pathway. AI systems can infer sensitive information from chat data even when that information is not directly disclosed. The paper describes a scenario in which a user asks an AI chatbot for heart-healthy dinner recipes, the model infers a cardiovascular condition, and that classification flows through the company's ecosystem. An AI session that reveals a user's health condition through inference becomes a health-related record for legal purposes regardless of whether the user explicitly disclosed health information. The inference pathway is the mechanism by which a routine interaction becomes a sensitive record.
## Persistent Personalization Records
Several major vendors now offer persistent personalization features that retain user information across chat sessions. The paper describes systems that recall user details across conversations and store information that informs future responses. Three unresolved questions follow: whether personalization data is excluded from model training, whether it is shared across product boundaries, and whether consumers can access, correct, and delete personalization data under existing privacy frameworks.
Personalization records are a distinct class from session logs. Session logs capture inputs and outputs within a defined interaction. Personalization records capture persistent, cross-session inferences about the user that accumulate over time. A vendor-maintained cross-session profile constructed from user interactions is therefore potentially responsive to legal process in its own right, in addition to the session logs themselves.
## Children's Data
The Stanford analysis reports that multiple developers permit accounts for users aged 13 and older and do not uniformly exclude children's chat data from model training. Where minors' chat interactions are retained or used for training by default, the retained records occupy a heightened-scrutiny position under existing children's privacy frameworks.
## Privacy and Custody as Distinct Questions
As a privacy analysis, the Stanford paper addresses who can access the data, what consent was given, and whether data minimization principles are followed. What it does not directly address is the custody question: does the record exist, who holds it, and can it be compelled? Its finding that all six developers retain chat data by default nonetheless confirms the predicate: records exist, they are held by identifiable custodians, and they can be compelled through legal process. The privacy exposure and the custody exposure are distinct, and the Stanford paper documents the conditions that produce both.
The question the paper leaves open is whether the two-tiered system it documents (enterprise protection by default, consumer exposure by default) represents a stable institutional arrangement or a transitional condition that regulatory action, litigation, or market pressure will eventually close.
---
## Sources
| Source | Date | Description | URL |
|--------|------|-------------|-----|
| King, Klyman, Capstick, Saade, Hsieh, "User Privacy and Large Language Models" | September 5, 2025 | Stanford University, arXiv:2509.05382v1 | https://arxiv.org/abs/2509.05382 |
| Guri Singh (@heygurisingh) | March 1, 2026 | X.com amplification | |
---
## Amendment Log
*No amendments to date.*
---
The observations presented reflect analytical assessment of publicly available information and do not constitute legal, insurance, or investment advice. Counterpose maintains no formal relationship with any vendor, regulator, or standards body referenced in this publication.