C8-010 | Decided — Correction | Analysis | Correction | 2026-04-25

Correction — AI Miscounted the 15-Letter Words; Operator Challenge Caught It

During AI-assisted preparation, the LLM proposed PREQUALIFICATIONS, INSTITUTIONALIZING, and DECRIMINALIZATION as 15-letter candidates for triple-lane play. All three are wrong: PREQUALIFICATIONS has 17 letters, INSTITUTIONALIZING has 18, and DECRIMINALIZATION has 17. The operator challenged the AI's counts and caught the errors before they entered game execution. This is a documented LLM failure mode: large language models process tokens, not characters, so their letter counts are pattern-matched guesses rather than enumerations, and they can be confidently wrong on simple counting tasks with no sign of uncertainty. The error recurred across multiple words and multiple prompts, which marks it as a systematic AI weakness rather than a one-time hallucination. The first version of this ADR also recorded PREQUALIFICATIONS and DECRIMINALIZATION as 16 letters: an AI error in an ADR about AI errors, caught on review. The mandatory discipline: every AI-stated letter count must be independently verified by manual count or dictionary lookup before the word enters the candidate set. Never accept an LLM's stated letter count as ground truth. Challenge it every time.

Freshness
Permanent

Permanent scar record. LLMs have a documented, persistent weakness with letter counting. The discipline of challenging AI counts does not expire.

Dependencies
#letter-count #correction #ai-miscounting #llm-counting-weakness #challenge-ai-output #prequalifications-error #scar-record #operator-caught #17-letter-error

Capture

During AI-assisted preparation, the operator asked an LLM to generate candidate 15-letter words for triple-lane play: words that could fill an entire row or column on a standard 15×15 board. The LLM returned the following words as 15-letter candidates: PREQUALIFICATIONS (actually 17 letters), INSTITUTIONALIZING (actually 18), and DECRIMINALIZATION (actually 17).

The operator challenged the AI's counts. The challenges caught the errors before they reached game execution. Without the challenge, these words would have entered the candidate set and failed at the board.

Words that were independently verified at exactly 15 letters: OXYPHENBUTAZONE, PSYCHOANALYZING, and REVOLUTIONIZING.

Meta-note: The first version of this ADR recorded PREQUALIFICATIONS and DECRIMINALIZATION as 16 letters. They are 17. An ADR about AI miscounting letters contained an AI letter-count error, caught by the operator on review. The correction is preserved here because it is a perfect illustration of the ADR's own argument: AI letter counts require challenge every time, including in the documents that document the discipline.
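The meta-note's point can be checked mechanically. The following is a minimal sketch, not part of the original record: it compares letter counts written down in a document against counts obtained by enumeration. The function name `find_count_mismatches` and both dictionaries are hypothetical, built only from the figures quoted in this ADR.

```python
def find_count_mismatches(recorded_counts):
    """Return {word: (recorded, actual)} for every entry whose recorded
    letter count differs from the enumerated length of the word."""
    mismatches = {}
    for word, recorded in recorded_counts.items():
        actual = len(word)  # deterministic enumeration, no estimation
        if recorded != actual:
            mismatches[word] = (recorded, actual)
    return mismatches


# Counts as the first version of this ADR recorded them (per the meta-note).
first_version = {
    "PREQUALIFICATIONS": 16,
    "DECRIMINALIZATION": 16,
}

# Counts as recorded after the correction.
corrected = {
    "PREQUALIFICATIONS": 17,
    "INSTITUTIONALIZING": 18,
    "DECRIMINALIZATION": 17,
}

print(find_count_mismatches(first_version))  # both entries flagged
print(find_count_mismatches(corrected))      # {}
```

Running a check like this over any document that states letter counts would have flagged the first version before review.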


Why This Is an AI Error, Not an Operator Error

The error class here is not "operator failed to count carefully." It is "AI returned confident, incorrect counts and the operator's challenge process is what caught it."

Large language models do not count letters by enumeration. They process text as tokens: chunks of characters that often correspond to subwords rather than individual letters. When asked "how many letters does PREQUALIFICATIONS have," an LLM typically does not spell the word out and count; it pattern-matches against training data and produces a number that seems plausible by association. The number is often wrong, and crucially, the model states it with the same confidence as a factually grounded answer. Nothing in the output signals "I'm guessing here."
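To make the token/character distinction concrete, the sketch below contrasts a subword split with character enumeration. The split shown is invented for illustration and is not the output of any real tokenizer; an actual model may chunk the word differently.

```python
WORD = "PREQUALIFICATIONS"

# Hypothetical subword split, invented for illustration only.
# A real tokenizer may split this word differently.
hypothetical_tokens = ["PRE", "QUAL", "IFICATIONS"]

# The chunks reassemble into the word, so no information is lost...
assert "".join(hypothetical_tokens) == WORD

# ...but a model working on chunks handles 3 units, not 17 letters.
units_seen = len(hypothetical_tokens)

# Enumeration walks every character and numbers it starting from 1.
numbered_letters = list(enumerate(WORD, start=1))
letter_count = len(numbered_letters)

print(units_seen)            # 3
print(numbered_letters[-1])  # (17, 'S')
print(letter_count)          # 17
```

The letter count is recoverable only by going below the chunk level, which is what manual counting or `len()` does and what a pattern-matched answer does not.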

This failure mode is well-documented and consistent. It is not a one-time hallucination on an unusual word: it recurs across multiple words, multiple prompts, and multiple sessions. An LLM that miscounts one long word will miscount others. The errors are not random; they cluster around words that are plausibly close to the target length. The words here have 17 and 18 letters against a target of 15, so the model is not off by 10; it is off by 2 or 3, in a way that looks credible.

The implications for any AI-assisted preparation workflow:


Why

The error recurred. A single miscounted word could be a fluke. Three miscounted words in the same preparation session confirm a systematic weakness, not an edge case. Any workflow that accepts AI letter counts without verification will be contaminated by this error repeatedly.

The consequences in game execution are severe. A 17-letter word that was prepared as a 15-letter candidate does not fit the board row. The play is impossible. Time is consumed discovering the failure. The fallback position may not exist because the preparation was built around the failed word. Under the 20-minute time pressure of the actual game, discovering mid-round that your primary play is two tiles too long is a significant and avoidable loss.
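The overflow can be computed before the game rather than discovered during it. This is a small sketch assuming the 15-square lane described in the Capture section; `tiles_over` is a hypothetical helper name.

```python
BOARD_SIZE = 15  # standard 15x15 board, as stated in the Capture section


def tiles_over(word, lane_length=BOARD_SIZE):
    """Number of tiles by which a word exceeds a lane (0 if it fits)."""
    return max(0, len(word) - lane_length)


for word in ("PREQUALIFICATIONS", "INSTITUTIONALIZING", "DECRIMINALIZATION"):
    print(word, tiles_over(word))  # overflows of 2, 3, and 2 tiles
```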

The operator's challenge instinct — "let me verify that count" — was the correct behavior. This ADR exists to encode that instinct as a standing rule rather than an ad-hoc impulse, because the next time an LLM provides a word count, the correct behavior is the same: challenge it, verify it independently, never accept it as stated.


Why-Not

Why not just use AI for initial suggestions and manually verify everything before use? Yes — that is exactly the corrected process. The ADR does not reject AI assistance in preparation; it rejects accepting AI-stated letter counts without independent verification. AI is useful for generating candidate words across themes and tile combinations. It is unreliable for verifying that those words meet the 15-letter constraint. The workflow is: AI for generation, human (or dictionary) for verification.
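The generation/verification split might be sketched as follows. This is an illustration under stated assumptions, not the operator's actual tooling: `verify_candidates` is a hypothetical name, and the check covers length only. Whether a word is valid in the game's dictionary is a separate lookup that this sketch does not attempt.

```python
TARGET_LENGTH = 15


def verify_candidates(ai_suggestions, target_length=TARGET_LENGTH):
    """Split AI-suggested words into accepted and rejected.

    Any count the AI stated is deliberately ignored; only len() decides.
    Rejected words are returned with their actual length for the record.
    """
    accepted, rejected = [], {}
    for raw in ai_suggestions:
        word = raw.strip().upper()
        if word.isalpha() and len(word) == target_length:
            accepted.append(word)
        else:
            rejected[word] = len(word)
    return accepted, rejected


# The six words named in this ADR, all offered as "15-letter" candidates.
suggestions = [
    "PREQUALIFICATIONS",
    "INSTITUTIONALIZING",
    "DECRIMINALIZATION",
    "OXYPHENBUTAZONE",
    "PSYCHOANALYZING",
    "REVOLUTIONIZING",
]

accepted, rejected = verify_candidates(suggestions)
print(accepted)
print(rejected)
```

Keeping the rejected words with their real lengths preserves a record of how far off each AI claim was.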

Why not prompt the AI more carefully to get accurate counts? Constraining the prompt reduces the error rate somewhat but does not eliminate it. An LLM asked "give me only words that are exactly 15 letters, and count each one carefully before responding" will still miscount on some fraction of words. The constraint helps; it does not make AI counting reliable. The only ground truth for letter count is manual enumeration or a deterministic tool. Prompt engineering is a supplement to verification, not a replacement for it.

Why not accept a small error rate as tolerable since manual verification catches it anyway? The error rate is not small — it is systematic. And "manual verification catches it anyway" is only true if the discipline of always challenging is maintained. The failure mode this ADR guards against is the session where the operator skips the verification step because the AI sounded confident. That is exactly when the wrong word enters the candidate set. The discipline must be unconditional: challenge every AI-stated count, every time, regardless of how confident the AI appears.


Commit

Decision: AI-stated letter counts are treated as unverified until independently confirmed by manual count or dictionary lookup. No word enters the premium-lane candidate set on the basis of an AI count alone. PREQUALIFICATIONS (17), INSTITUTIONALIZING (18), and DECRIMINALIZATION (17) are removed permanently. OXYPHENBUTAZONE (15), PSYCHOANALYZING (15), and REVOLUTIONIZING (15) are confirmed. The operator's challenge process caught the errors; the discipline is to make that challenge process mandatory and automatic rather than ad-hoc.
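One way to make the challenge mandatory rather than ad hoc is to put it in the only path by which a word can enter the candidate set. The sketch below is a hypothetical encoding of the decision, not an existing tool: the class name, the method names, and the source labels "manual" and "dictionary" are all assumptions made for illustration.

```python
ACCEPTED_SOURCES = {"manual", "dictionary"}  # hypothetical provenance labels


class CandidateSet:
    """Candidate words for one lane length, with the source of each count."""

    def __init__(self, lane_length=15):
        self.lane_length = lane_length
        self._entries = {}  # word -> source of the verified count

    def add(self, word, count, source):
        # Rule 1: a count from any other source (e.g. "ai") is unverified.
        if source not in ACCEPTED_SOURCES:
            raise ValueError(f"{word}: count from {source!r} is unverified")
        # Rule 2: the stated count must survive enumeration.
        if count != len(word):
            raise ValueError(f"{word}: stated {count}, actual {len(word)}")
        # Rule 3: the word must fit the lane exactly.
        if count != self.lane_length:
            raise ValueError(
                f"{word}: {count} letters, lane holds {self.lane_length}"
            )
        self._entries[word] = source

    def words(self):
        return sorted(self._entries)


candidates = CandidateSet()
candidates.add("OXYPHENBUTAZONE", 15, "manual")

try:
    candidates.add("PREQUALIFICATIONS", 15, "ai")
except ValueError as err:
    print(err)  # rejected on provenance before any counting happens

print(candidates.words())
```

Rejecting on provenance first means an AI-stated count is refused even when it happens to be right, which matches the unconditional form of the rule.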

Confidence: High. The LLM counting failure is documented, systematic, and the mitigation is straightforward.


Timestamp

2026-04-25
