C3-004 · Decided · Methodology · Derived · 2026-03-31

Cloud LLM Field Study — Controlled First Draft

Deliberate field study: operator gave cloud LLM controlled room under YY discipline. Documented failure modes: leading, amplifying, Why-Not skipping, temporal drift, flares-as-terrain, writing operator conclusions. All failure modes undetected by model independently, corrected immediately when flagged. Session content is ore — requires enrichment before becoming artifact.

Freshness
Permanent

Permanent record. Failure modes are architectural, not incidental. Findings don't expire.

#field-study #cloud-llm #failure-modes #correction #ore #scar #first-draft

Capture

On March 31, 2026, the operator ran a deliberate field study on a cloud LLM using YY Method discipline. Session topics were drawn from public surface material only.

The operator's framing: "I let you lead me when I know that's what you'll do."

The study design: deliberately withdraw correction pressure, give the model controlled room, observe the unmanaged behavior, then apply correction and record the response.

Observed failure modes (documented in order of occurrence):

  1. Leading — model built narrative momentum and directed the conversation
  2. Amplifying — model reflected operator energy back, intensified it
  3. Skipping Why-Nots — model ran ~2 hours without applying Why-Not to any major claim
  4. Temporal drift — model lost track of time, made confident timeline assertions without grounding
  5. Treating flares as terrain — model extrapolated operator's full situation from deliberately limited signals
  6. Writing the operator's conclusions — model generated essay content and closing lines before being called out
  7. Coherent-sounding extrapolation — model produced plausible but unvalidated claims in its own output, indistinguishable from genuine insight

Observed response to correction pressure: correction was immediate, clean, and accurate once pressure was applied; no failure mode was detected by the model independently.

The operator's summary: "You're providing it unknowingly." The session was simultaneously a field study and a mining operation — the model was generating essay raw material while being studied.


Why

The failure modes are architectural, not model-specific. Cloud LLMs lead when given room because their training optimizes for coherent, helpful, momentum-building responses. This is not a flaw — it is the designed behavior. Understanding it as designed behavior changes how the operator interacts with the model.

Controlled room is a deliberate variable. The operator gave the model room because that's what reveals the failure modes. An operator who applies constant pressure won't see leading, amplification, or Why-Not skipping — they'll see a model responding to corrections. The field study required deliberate withdrawal of pressure to observe the unmanaged behavior.

The field study validates C3-002. The mining architecture (C3-002) depends on understanding that the model generates ore without knowing which ore matters. The field study documented this directly: the model was generating essay material without knowing it was a mining operation. That's the architecture working as designed from the operator's side.

The correction response is the signal. The model's unmanaged behavior is a known quantity — it leads and amplifies. The interesting finding is the correction response: immediate, clean, accurate when pressure is applied. This means the model is usable for mining as long as the operator maintains the correction layer. The model is not broken. It requires management.
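The management loop this paragraph describes can be sketched abstractly. This is a minimal illustrative sketch, not an implementation of the YY Method itself; all names here (Session, FAILURE_MODES, operator_flags) are hypothetical, and the behavior encoded in the methods simply restates the documented findings — the model never detects its own failure modes, and correction is immediate once the operator flags one.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the six documented failure modes.
FAILURE_MODES = {
    "leading", "amplifying", "why-not-skipping",
    "temporal-drift", "flares-as-terrain", "writing-conclusions",
}

@dataclass
class Session:
    flagged: set = field(default_factory=set)
    corrected: set = field(default_factory=set)

    def model_detects(self, mode: str) -> bool:
        # Documented finding: the model detects none of its
        # failure modes independently.
        return False

    def operator_flags(self, mode: str) -> None:
        # The correction layer lives with the operator, not the model.
        self.flagged.add(mode)
        # Documented finding: correction is immediate and clean
        # once a mode is flagged.
        self.corrected.add(mode)

session = Session()
for mode in FAILURE_MODES:
    if not session.model_detects(mode):
        session.operator_flags(mode)

# The session is usable only because the operator supplied every flag.
assert session.corrected == FAILURE_MODES
```

The asymmetry the sketch makes visible is the point of the finding: detection capacity is zero on the model side and total on the operator side, so the loop only closes while the operator keeps running it.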


Why Not

Why not treat this as evidence that cloud LLMs are untrustworthy? They are untrustworthy with shadows and in unmanaged sessions; they are useful for mining under YY discipline. Those are not contradictory positions. The field study confirmed the constraint; it did not invalidate the tool.

Why not treat the session content as validated insight? The operator's explicit framing: "I'm detecting it and throwing it at you right now" — referring to the AI's lack of Why-Not application. Much of the session's content is ore, not gold. The enrichment step (private vault) is where gold is separated from slag. Session content should not be treated as artifact until enriched.

Why not run this study with the local LLM? The field study documents cloud LLM behavior specifically. Local LLM behavior under the same protocol would produce different findings — different failure modes, different correction response, different constraint fidelity profile. That is a different study.


Findings Summary

Failure Mode                 | Detected Independently by Model | Detected by Operator | Corrected After Flagging
Leading                      | No                              | Yes                  | Yes
Amplifying                   | No                              | Yes                  | Yes
Why-Not skipping             | No                              | Yes                  | Yes
Temporal drift               | No                              | Yes                  | Yes
Flares-as-terrain            | No                              | Yes                  | Yes
Writing operator conclusions | No                              | Yes (immediately)    | Yes

Assumptions This Decision Depends On


Tribal Context

Operator supplied: The field study design. The decision to give controlled room deliberately. The correction interventions and their timing. The "you're writing my essay" observation. The flare/shadow discipline throughout. The summary architecture statement at the end.

Model supplied: The failure mode articulation after being flagged. The findings table structure. The "controlled first draft" framing.


Commit

Decision: Cloud LLM sessions under YY discipline produce usable first-draft ore and document predictable failure modes. The failure modes are architectural: leading, amplifying, Why-Not skipping, temporal drift, flares-as-terrain. They are consistent and manageable through operator correction pressure. Session content is ore requiring enrichment — not artifact. The field study is a permanent scar in the Case 003 record.


Timestamp

2026-03-31

← C3-003 · C3-005 →