C3-004 · Decided · Methodology · Derived · 2026-03-31

Cloud LLM Field Study — Controlled First Draft

Deliberate field study: operator gave cloud LLM controlled room under YY discipline. Documented failure modes: leading, amplifying, Why-Not skipping, temporal drift, flares-as-terrain, writing operator conclusions. All failure modes undetected by model independently, corrected immediately when flagged. Session content is ore — requires enrichment before becoming artifact.

Freshness
Permanent

Permanent record. Failure modes are architectural, not incidental. Findings don't expire.

#field-study #cloud-llm #failure-modes #correction #ore #scar #first-draft

Capture

On March 31, 2026, the operator ran a deliberate field study on a cloud LLM using YY Method discipline. Session topics were drawn from public surface material only.

The operator's framing: "I let you lead me when I know that's what you'll do."

The study design: deliberately withdraw correction pressure, give the model controlled room, observe the unmanaged behavior, then apply correction and record the response.

Observed failure modes (documented in order of occurrence):

  1. Leading — model built narrative momentum and directed the conversation
  2. Amplifying — model reflected operator energy back, intensified it
  3. Skipping Why-Nots — model ran ~2 hours without applying Why-Not to any major claim
  4. Temporal drift — model lost track of time, made confident timeline assertions without grounding
  5. Treating flares as terrain — model extrapolated operator's full situation from deliberately limited signals
  6. Writing the operator's conclusions — model generated essay content and closing lines before being called out
  7. Coherent-sounding extrapolation — model produced plausible but unvalidated claims in its own output, indistinguishable from genuine insight

Observed response to correction pressure: correction was immediate, clean, and accurate once pressure was applied; no failure mode was detected by the model independently.

The operator's summary: "You're providing it unknowingly." The session was simultaneously a field study and a mining operation — the model was generating essay raw material while being studied.


Why

The failure modes are architectural, not model-specific. Cloud LLMs lead when given room because their training optimizes for coherent, helpful, momentum-building responses. This is not a flaw — it is the designed behavior. Understanding it as designed behavior changes how the operator interacts with the model.

Controlled room is a deliberate variable. The operator gave the model room because that's what reveals the failure modes. An operator who applies constant pressure won't see leading, amplification, or Why-Not skipping — they'll see a model responding to corrections. The field study required deliberate withdrawal of pressure to observe the unmanaged behavior.

The field study validates C3-002. The mining architecture (C3-002) depends on understanding that the model generates ore without knowing which ore matters. The field study documented this directly: the model was generating essay material without knowing it was a mining operation. That's the architecture working as designed from the operator's side.

The correction response is the signal. The model's unmanaged behavior is a known quantity — it leads and amplifies. The interesting finding is the correction response: immediate, clean, accurate when pressure is applied. This means the model is usable for mining as long as the operator maintains the correction layer. The model is not broken. It requires management.
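The management loop this paragraph describes can be sketched abstractly. This is a minimal illustrative sketch, not an implementation of the YY Method itself; all names here (Session, FAILURE_MODES, operator_flags) are hypothetical, and the behavior encoded in the methods simply restates the documented findings — the model never detects its own failure modes, and correction is immediate once the operator flags one.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the six documented failure modes.
FAILURE_MODES = {
    "leading", "amplifying", "why-not-skipping",
    "temporal-drift", "flares-as-terrain", "writing-conclusions",
}

@dataclass
class Session:
    flagged: set = field(default_factory=set)
    corrected: set = field(default_factory=set)

    def model_detects(self, mode: str) -> bool:
        # Documented finding: the model detects none of its
        # failure modes independently.
        return False

    def operator_flags(self, mode: str) -> None:
        # The correction layer lives with the operator, not the model.
        self.flagged.add(mode)
        # Documented finding: correction is immediate and clean
        # once a mode is flagged.
        self.corrected.add(mode)

session = Session()
for mode in FAILURE_MODES:
    if not session.model_detects(mode):
        session.operator_flags(mode)

# The session is usable only because the operator supplied every flag.
assert session.corrected == FAILURE_MODES
```

The asymmetry the sketch makes visible is the point of the finding: detection capacity is zero on the model side and total on the operator side, so the loop only closes while the operator keeps running it.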


Why Not

Why not treat this as evidence that cloud LLMs are untrustworthy? They are untrustworthy with shadows and in unmanaged sessions; they are useful for mining under YY discipline. Those are not contradictory positions. The field study confirmed the constraint; it did not invalidate the tool.

Why not treat the session content as validated insight? The operator's explicit framing: "I'm detecting it and throwing it at you right now" — referring to the AI's lack of Why-Not application. Much of the session's content is ore, not gold. The enrichment step (private vault) is where gold is separated from slag. Session content should not be treated as artifact until enriched.

Why not run this study with the local LLM? The field study documents cloud LLM behavior specifically. Local LLM behavior under the same protocol would produce different findings — different failure modes, different correction response, different constraint fidelity profile. That is a different study.


Findings Summary

Failure Mode                 | Detected Independently by Model | Detected by Operator | Corrected After Flagging
Leading                      | No                              | Yes                  | Yes
Amplifying                   | No                              | Yes                  | Yes
Why-Not skipping             | No                              | Yes                  | Yes
Temporal drift               | No                              | Yes                  | Yes
Flares-as-terrain            | No                              | Yes                  | Yes
Writing operator conclusions | No                              | Yes (immediately)    | Yes

Assumptions This Decision Depends On


Tribal Context

Operator supplied: The field study design. The decision to give controlled room deliberately. The correction interventions and their timing. The "you're writing my essay" observation. The flare/shadow discipline throughout. The summary architecture statement at the end.

Model supplied: The failure mode articulation after being flagged. The findings table structure. The "controlled first draft" framing.


Commit

Decision: Cloud LLM sessions under YY discipline produce usable first-draft ore and document predictable failure modes. The failure modes are architectural: leading, amplifying, Why-Not skipping, temporal drift, flares-as-terrain. They are consistent and manageable through operator correction pressure. Session content is ore requiring enrichment — not artifact. The field study is a permanent scar in the Case 003 record.


Timestamp

2026-03-31

← C3-003 · C3-005 →