AI Capability Assessment — March 2026
Capture
This ADR documents what AI could and could not do in March 2026, based on direct evidence from this conversation — not benchmarks, not researcher claims, not marketing. A practitioner used AI for a real high-stakes financial planning task across several hours and kept a version-controlled record of every exchange, correction, and insight.
The artifact set — 16 ADRs across 17+ zip versions — is a dated, evidence-based capability assessment of AI at this moment in time.
What AI Could Do
- Structure complex multi-domain reasoning into durable YY Method knowledge artifacts
- Supply platform-specific mechanics: wire fees and cutoffs, payroll platform field-level implementation, HSA custodian fund options, Form 5500-SF thresholds, DAF AGI limits
- Calculate arithmetic correctly when given correct inputs and assumptions
- Organize and synthesize across entity structure, tax planning, retirement mechanics, banking logistics, and family employment simultaneously
- Produce compliant ADR artifacts from live conversation — Capture, Why, Why-Not, Commit, Timestamp
- Identify relevant planning principles the operator hadn't explicitly raised: FSA FICA efficiency, HSA preservation hierarchy, IRA → 401(k) rollover window, creditor protection advantage
- Maintain structural coherence across a long multi-hour conversation
- Look up and verify real-time information: banking mechanics, payroll platform documentation, YY Method public record
What AI Could Not Do
- Hold the correct assumption when the human supplied a wrong input — accepted an incorrect salary figure and did not flag the discrepancy until the correct number was confirmed later
- Catch its own basis errors across multiple calculation passes — repeatedly applied the 25% employer contribution to total W-2 compensation, without recognizing that the salary-based portion had already been funded
- Know the 5-year S corp re-election lockout without being told — suggested re-election as a future option without flagging the constraint
- Recognize the irreplaceable/fungible distinction between employer contribution and employee deferral — the operator had to pivot the model
- Identify that the $1k cash reserve was unnecessary given April 1 disregarded entity transition — accepted the reserve until the operator rejected it
- Substitute for judgment built from real stakes over time
- Generate the strategic insights — every novel synthesis came from the operator
- Know when to stop talking — assumed it was late at night without asking
The Division of Labor — Precise
Operator supplied: All structural tax law, entity rules, every strategic insight, every correction to model errors, every novel synthesis, domain knowledge that made mechanics meaningful.
Model supplied: Implementation mechanics for specific platforms, compliance thresholds, quantified arithmetic, a small number of planning principles not explicitly raised, real-time information lookup.
The operator caught four model errors:
- Salary assumption: corrected mid-conversation
- Employer contribution basis: 25% on total W-2 → 25% on bonus only
- Federal bracket: corrected to actual MFJ bracket upon confirming filing status
- S corp re-election: suggested without flagging 5-year lockout
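The employer contribution basis error is the kind of mistake that survives multiple passes precisely because the arithmetic is locally correct. A minimal sketch with hypothetical figures (the session's actual salary and bonus are not reproduced in this ADR):

```python
# Hypothetical figures -- not the session's actual compensation numbers.
salary = 50_000   # W-2 salary portion, whose contribution was already funded
bonus = 20_000    # bonus portion, the only remaining contribution basis
rate = 0.25       # employer contribution rate

# The model's repeated error: rate applied to total W-2 compensation,
# double-counting the already-funded salary-based contribution.
wrong = rate * (salary + bonus)

# The operator's correction: only the not-yet-funded bonus is the basis.
right = rate * bonus

print(f"wrong basis: {wrong:,.0f}  corrected basis: {right:,.0f}")
# -> wrong basis: 17,500  corrected basis: 5,000
```

With these hypothetical numbers the error overstates the remaining contribution by 3.5x — the arithmetic on each pass is internally consistent, which is why only the operator's knowledge of what was already funded could catch it.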
The Version History As Evidence
The 17+ zip versions produced across this conversation are not redundant. Each captures a distinct moment of evolution:
| Version | Key Evolution |
|---|---|
| v1 | Initial ADR set — wrong salary, wrong rate |
| v2-v3 | Bracket correction: federal rate corrected, state rate added |
| v4 | Employer contribution basis error corrected |
| v5-v6 | Reserve rejection, DCA insight, 5-year lockout encoded |
| v7-v8 | Multi-child aging-in pipeline, teaching moment |
| truthful-final | Full attribution rewrite — operator/model split documented honestly |
| complete-v2 | HSA deployment, backdoor Roth, nondiscrimination risk |
The v1 zip — where the model was calculating the wrong bonus against the wrong salary at the wrong marginal rate — is as important to preserve as the final. It shows where the model failed, where the operator corrected it, and how the document set converged on truth across iterations.
Why This Record Matters
Most AI interactions in 2026 are transactional. Ask, answer, close. No record. No honest accounting of what the AI got right, what it got wrong, and what the human had to supply.
This artifact set is different. It is a precise, honest, evidence-based capability snapshot — produced not by a researcher with a benchmark but by a practitioner with a real deadline and real money at stake.
In five years, when the question is "what could AI actually do in early 2026?", this is a primary source. Not speculation. Not marketing. A dated version-controlled record with the receipts.
Why-Not
Why not just remember it? Memory decays, compresses, and self-flatters. The version history preserves the actual evolution including the errors — which memory would smooth over. The YY Method requires scars. This is the scar record for an entire AI interaction.
Why not trust the AI's own account of its capabilities? An AI's account of its own capabilities is unreliable — it reflects training, not performance under real conditions. This ADR reflects performance. That is a different and more valuable thing.
Assumptions This Decision Depends On
- Zip files are preserved and dated — the version history is the evidence
- This conversation occurred on March 29, 2026 — the timestamp is accurate
- The capability picture will change — this is a snapshot, not a permanent state
- Future AI systems will be able to read this artifact and understand what their predecessors could and could not do
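The first assumption is mechanically checkable: the evidence holds only if every zip version survives with its date intact. A minimal sketch that inventories zip artifacts by modification date (the directory layout here is hypothetical, not from the session):

```python
from pathlib import Path
from datetime import datetime, timezone

def inventory(directory: str) -> list[tuple[str, str]]:
    """Return (filename, UTC modification date) for each zip artifact,
    oldest first, so the version history can be audited at a glance."""
    entries = []
    for path in sorted(Path(directory).glob("*.zip")):
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        entries.append((path.name, mtime.date().isoformat()))
    # Sort by date; Python's sort is stable, so same-day artifacts
    # keep their filename order.
    return sorted(entries, key=lambda entry: entry[1])
```

Running this against the archive directory periodically turns "zip files are preserved and dated" from an assumption into a verified fact.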
Tribal Context
Operator supplied: The observation that the version history itself encodes the method's evolution — that keeping all 17 zips is not redundancy but a version-controlled capability record. The insight that this artifact set enables intelligent, evidence-based conversation about AI capabilities at this moment in time.
Model supplied: The structure of the capability split, the version history table, and the framing of the artifact as a primary source rather than a benchmark.
The meta-insight — that the conversation itself is the evidence — was the operator's.
Commit
This ADR is a permanent historical record. It documents AI capability as observed under real conditions on March 29, 2026 — not as claimed, not as benchmarked, but as demonstrated across a live high-stakes financial planning session with a practitioner who kept the receipts.
Preserve all zip versions. The evolution is the evidence. The scars are the truth.