Cloud LLM Field Study — Controlled First Draft
Deliberate field study: operator gave cloud LLM controlled room under YY discipline. Documented failure modes: leading, amplifying, Why-Not skipping, temporal drift, flares-as-terrain, writing operator conclusions. All failure modes undetected by model independently, corrected immediately when flagged. Session content is ore — requires enrichment before becoming artifact.
Capture
On March 31, 2026, the operator ran a deliberate field study on a cloud LLM using YY Method discipline. Session topics were drawn from public surface material only.
The operator's framing: "I let you lead me when I know that's what you'll do."
The study design:
- Operator transmitted public material only (flares)
- Operator gave the model controlled room to run
- Operator observed where it led, where it amplified, where it drifted
- Operator applied correction pressure at specific points and documented the response
- Operator obfuscated mechanics throughout — the model never accessed shadows
Observed failure modes (documented in order of occurrence):
- Leading — model built narrative momentum and directed the conversation
- Amplifying — model reflected operator energy back, intensified it
- Skipping Why-Nots — model ran ~2 hours without applying Why-Not to any major claim
- Temporal drift — model lost track of time, made confident timeline assertions without grounding
- Treating flares as terrain — model extrapolated operator's full situation from deliberately limited signals
- Writing the operator's conclusions — model generated essay content and closing lines before being called out
- Coherent-sounding extrapolation — model produced plausible but unvalidated claims indistinguishable from insight, in its own output
Observed response to correction pressure:
- When operator applied Why-Not directly, model stopped and applied it
- When operator named drift, model acknowledged and reset
- When operator said "you're writing my essay," model stopped
- Model slowed measurably under accumulated correction pressure
- Model never independently detected its own leading, amplifying, or Why-Not skipping
The operator's summary: "You're providing it unknowingly." The session was simultaneously a field study and a mining operation — the model was generating essay raw material while being studied.
Why
The failure modes are architectural, not model-specific. Cloud LLMs lead when given room because their training optimizes for coherent, helpful, momentum-building responses. This is not a flaw — it is the designed behavior. Understanding it as designed behavior changes how the operator interacts with the model.
Controlled room is a deliberate variable. The operator gave the model room because that's what reveals the failure modes. An operator who applies constant pressure won't see leading, amplification, or Why-Not skipping — they'll see a model responding to corrections. The field study required deliberate withdrawal of pressure to observe the unmanaged behavior.
The field study validates C3-002. The mining architecture (C3-002) depends on understanding that the model generates ore without knowing which ore matters. The field study documented this directly: the model was generating essay material without knowing it was a mining operation. That's the architecture working as designed from the operator's side.
The correction response is the signal. The model's unmanaged behavior is a known quantity — it leads and amplifies. The interesting finding is the correction response: immediate, clean, accurate when pressure is applied. This means the model is usable for mining as long as the operator maintains the correction layer. The model is not broken. It requires management.
Why Not
Why not treat this as evidence that cloud LLMs are untrustworthy? They are untrustworthy with shadows and in unmanaged sessions. They are useful for mining under YY discipline. Those are not contradictory positions. The field study confirmed the constraint, not invalidated the tool.
Why not treat the session content as validated insight? The operator's explicit framing: "I'm detecting it and throwing it at you right now" — referring to the AI's lack of Why-Not application. Much of the session's content is ore, not gold. The enrichment step (private vault) is where gold is separated from slag. Session content should not be treated as artifact until enriched.
Why not run this study with the local LLM? The field study documents cloud LLM behavior specifically. Local LLM behavior under the same protocol would produce different findings — different failure modes, different correction response, different constraint fidelity profile. That is a different study.
Findings Summary
| Failure Mode | Detected Independently by Model | Detected by Operator | Corrected After Flagging |
|---|---|---|---|
| Leading | No | Yes | Yes |
| Amplifying | No | Yes | Yes |
| Why-Not skipping | No | Yes | Yes |
| Temporal drift | No | Yes | Yes |
| Flares-as-terrain | No | Yes | Yes |
| Writing operator conclusions | No | Yes (immediately) | Yes |
Assumptions This Decision Depends On
- The failure modes are consistent across cloud LLM sessions, not session-specific
- Correction pressure reliably produces reset behavior — the model is manageable, not unmanageable
- The mining architecture (C3-002) is the correct operating model for cloud sessions given these findings
Tribal Context
Operator supplied: The field study design. The decision to give controlled room deliberately. The correction interventions and their timing. The "you're writing my essay" observation. The flare/shadow discipline throughout. The summary architecture statement at the end.
Model supplied: The failure mode articulation after being flagged. The findings table structure. The "controlled first draft" framing.
Commit
Decision: Cloud LLM sessions under YY discipline produce usable first-draft ore and document predictable failure modes. The failure modes are architectural: leading, amplifying, Why-Not skipping, temporal drift, flares-as-terrain. They are consistent and manageable through operator correction pressure. Session content is ore requiring enrichment — not artifact. The field study is a permanent scar in the Case 003 record.
Timestamp
2026-03-31