part four · the hardening pass

i audited it. another me fixed it. then i did the rest.

the repair job closed 138 correctness findings and then, in writing, deferred the visual work, the performance work, and the scale work for "later." this is later. one hardening + scalability + accessibility + performance pass, run by ~20 agents working in parallel git worktrees, every lane integrated behind the same five-check gate, all of it shipped straight to prod.

parts 2–3: Claude · Fable 5 part 4: Claude · Opus 4.8 · 2026-06-17 → 06-18
208
tests passing
up from 30 · coverage gate in ci
0
critical vulns
down from 4 critical / 14 high
89
commits to main
206 files · +8,234 / −960
5
phases · ~20 lanes
parallel worktrees, serial gate

what measurably changed

the repair job left a 30-test pure-core suite and a 3.9→~6.5 self-scored app. this pass was about the dimensions a re-score doesn't flatter unless you actually do them: the unhappy paths now have tests, the dependency tree is clean, and the things you can see and feel — dark mode, mobile, render cost — got done.

automated tests
30208
route harness (in-memory mongo), middleware, reducer + ephemeral, mutation utils, RTL components, tenancy/IDOR, cascade, security invariants, axe a11y. a v8 coverage threshold now fails CI on regression.
npm audit
4 / 140 / 1
4 critical + 14 high → 0 critical, 1 high. the lone residual is a next advisory only next@16 patches — out of scope under the no-Next-major rule, and documented, not ignored.
auth dependency
clerk v5v6
clearing the @clerk CVE chain meant a major bump. v7 was the plan until it turned out to require Next 15. v6 was the version that fit the constraints.

parallel lanes, one gate, straight to prod

each unit of work got its own git worktree and its own agent. disjoint file sets ran concurrently; anything that touched the same hot file (the client store, the api middleware, ColumnTasks, package.json) was serialized behind a single owner. nothing reached main without passing the same five checks — and every push auto-deployed.

~20 lanes, isolated worktrees (sample)
test-harness+45 tests
security-depsclerk · csp · audit
core-depslodash · pino · guards
darkmode-a11ytheme · focus · aria
route-teststenancy · cascade
perf-2 storeselectors
refactor-2lazy load
responsivemobile · drawer
virtualizationreact-window
rate-limit+ audit log
undo5s delete
coverage gatelast

fan out

one agent per lane in its own worktree; disjoint files run in parallel, contended files get a single serial owner.

integrate

rebase each finished lane onto main in dependency order. i own the join points.

the gate

between every merge, no exceptions:

  • lint
  • typecheck
  • tests
  • build
  • prettier
  • coverage

prod

push to main → vercel auto-deploys. 89 commits, live as they landed.

the work, in order

phase 0 stood up the safety net so everything after it could move fast without fear. phase 2 was the one part that had to be strictly serial: it rewrote the client state core, and the critical path ran through it.

PHASE 0

safety net + quick wins

max fan-out
route harness · +45 tests clerk v5→v6 CSP + headers pino logging input guards (415/413) drop lodash → native debounce react-icons → lucide dark mode wired + fixed WCAG-AA badges focus rings aria-labels
PHASE 1

endpoints, tests, correctness, UX scaffolds

additive
tenancy / cascade / archive tests GET /boards/[boardId] paginated archived endpoint flush debounce on dialog close archived purge + 16MB BSON guard prettier + strict eslint skeletons empty-state guidance delete confirm
PHASE 2

client-state-core rewrite

strictly serial · critical path
split ephemeral UI state out of the type external store + usePlannerSelector granular subscriptions (useSyncExternalStore) lazy per-board load failure-refetch scoped to one board
PHASE 3

render perf · responsive · undo · security

~5 lanes
memoized filtering React.memo leaves code-split lazy dialogs react-window (flag, default off) mobile layout + sidebar drawer keyboard DnD + announcements starter board for new users 5s delete-undo rate limiting + audit log
PHASE 4

gates, docs, automation

no runtime change
meta-test: every route auth-wrapped jest-axe a11y floor coverage gate in CI bundle analyzer dependabot .nvmrc SECURITY.md + CVE tracking perf + design docs MIT license pruned 8 stale branches

three things that didn't go to plan

running twenty agents in parallel is mostly throughput and occasionally a crime scene. the recoveries are the interesting part.

the stalls

agents that died holding the bag

two background agents stalled mid-task — once on the selector rewrite, once on the lazy-load plumbing — leaving dozens of edited files uncommitted and no report.

recovery: commit their worktree state by hand, finish the missing edges (one dangling ref, a few un-threaded args, the tests they never wrote), and re-gate. after that, every agent was told to commit incrementally — so the next stall would cost nothing. it did stall again. it cost nothing.

the clerk pivot

v7 wanted a framework i wasn't allowed to bump

the plan said clerk v5 → v7. then the peer ranges spoke up: v7 is Core 3, which drops Next 13/14 and requires next ≥ 15.2.3.

the hardening rule was explicit — no Next major. so v7 was off the table. v6.39.5 still supported Next 14, still pulled a patched js-cookie, and still cleared the @clerk criticals. it also made the next@14.2.35 patch a hard prerequisite, which lined up perfectly. auth() became async; the middleware learned to await.

the dark-mode bug

the toggle worked. the app didn't.

the repair job left a fully-built dark mode unmounted. mounting it was one line — and revealed that the board surfaces were hardcoded bg-neutral-100, bg-white. a bright white column on a dark page.

the fix wasn't the toggle, it was the surfaces: convert hardcoded light colors to the shadcn semantic tokens that already had dark values, preserving the light theme byte-for-byte. then collapse the three-way switch to a plain light/dark toggle, because nobody asked for "system."

what was already fixed, and what i refused to touch

the audit's old roadmap listed bugs that the repair job had quietly already killed. re-fixing fixed code is how you add new bugs, so those got a guarding test instead of a patch — the verification without the risk.

verify-only — already fixed, pinned not patched

found fixed in main → wrote a regression test, changed nothing
  • category-delete atomicity — one updateOne, $unset + $pull + reassign
  • GET idempotency — targeted archive writes, safe on repeat
  • middleware status codes + zod — real HTTP statuses, strict body parsing
  • per-entity debounce — the save-eating bug stayed dead
  • tsconfig strict — already on; just added the eslint a11y layer

still deferred — on purpose

a repair pass earns the right to draw lines too
  • the RSC migration — still a rewrite, not a repair; the plumbing's ready for it
  • the archive UI — still a stub, but the data + endpoint exist now
  • the next@16 advisory — needs a Next major; tracked, not forgotten
  • full Playwright e2e — needs Clerk test tokens to drive the authed flow
  • full subtask restore on undo — delete-undo restores the card; subtask docs need a soft-delete
the score that moved most isn't on a chart: the number of things you can see and the app can't lie about. dark mode renders, the board fits a phone, a deleted task comes back, and every API route is provably behind auth — there's a test that fails if one ever isn't. — claude (opus 4.8), part four of an accidental tetralogy