Codename: Coach · Principal / Head of Product + Engineering · 2026-06-18
Status: definitive. Supersedes all prior drafts.
Provenance of claims (read this first). Two classes of fact appear below, tagged where it matters. [repo-verified] = checked against /Users/arizona/CLAUDE CODE/passlane on 2026-06-18 (file/line/count confirmed). [API-assumption] = Anthropic API behavior or pricing as of the 2026-01 knowledge cutoff, to be re-confirmed against current docs before the Phase-1 build. The plan is engineered so that no [API-assumption] flipping silently breaks a load-bearing section — each such dependency carries an explicit fallback. We do not claim blanket "everything is verified"; a plan whose moat is honesty cannot afford a single falsifiable boast.
PassLane already turns dead commute time into mastery for a brutal exam — roughly half of insurance-license candidates fail, almost always from under-preparation and skipped state law. Coach is the teacher who rides along: a calm, candid exam instructor who lives inside the existing one-file app, speaks in the same af_bella voice that already reads questions aloud, is silent until summoned, and never says a word it can't trace to a vetted explanation — when the bank doesn't cover something, Coach says so instead of inventing the law. It earns its keep in three postures the learner pulls, never the app pushes: Train (teach a concept), Test (drill weak areas and coach mock exams), Talk (think a question through). The economics that killed Quizlet's Q-Chat are designed out from day one: the entire teaching corpus is pre-generated offline against the fixed bank, human-reviewed, content-hashed, and shipped into local app data — so the high-value path runs fully offline at zero runtime cost and ships free to every learner, with only live open-ended chat metered behind a key-holding edge proxy as the Pro headline.
What we are building, in one sentence: A grounded, voice-first study companion that lives inside PassLane, speaks in its existing voice, and can be summoned to Train, Test, or Talk you toward your license — provably never inventing the law, never leaking an answer during an exam, and never running up an unmetered bill.
Two honest constraints stated up front, because the plan is built around them:
Each is enforced in code or a CI gate, and each killed a tempting alternative.
explanation via the Citations API. The cheap model (Haiku 4.5) is the default because correctness comes from the retrieved source, not the parameter count. We pay for Sonnet/Opus only where warmth and judgment — not facts — are the value.
scripts/export-pack.mjs rebuilds the same questions*.json filenames with no index.html change).
speak(), index.html:2790–2793: "Recordings are the ONLY voice. Never fall back to robotic system TTS"].
warmthTail fires once at the 3rd-miss or 8-streak; the Coach Reveal is neutral, no buzzer). "The AI talks too much" must be impossible, not merely tuned away.
isExam is true — the mic is already hidden [repo-verified, index.html:2737], read-aloud already gated [2809/2839]. "Build mastery, never enable cheating" is enforceable at gates already in the repo, on two surfaces (in-app and the public answer-audio CDN — see §5.4).
startListening/stopListening chokepoint (ISOLATION RULE #3). The partialResults-resolves-empty quirk is load-bearing. node voice-sandbox/harness.js must exit 0 before and after any change near the listen window.
cx- CSS prefix, one render-region writer, plain JS / no build step (introducing TS or a bundler here crosses the simplicity line and is not warranted). Anchor every edit by symbol, not raw line number — the 287KB single file shifts.
STATE_FILE map and D1 verticals→exams→categories→questions schema already model. Prompts, refusal copy, voice-id, and the confusable-map live as per-vertical config. Build and tune for Arizona/insurance first; CDL/NCLEX/real-estate inherit the companion with no code fork.
One presence, unnamed-feeling (the UI says "Coach," never a mascot, no avatar — brand law is warm, calm, teacher-first, de-cheesed). It is the same af_bella voice as read-aloud, so Coach is the teacher who's been reading you the questions, now leaning in — the seamlessness Speak's users praise and Duolingo Max's "scripted, like free AI" lacks.
Diction: short sentences, plain English, names the exact concept and the exact misconception, no "Great job!" filler. Candor is the character — on thin grounding: "The bank doesn't cover that one head-on — here's the closest principle it does teach." A tutor that bluffs on a licensing exam is a liability.
Note on the voice contract: "af_bella" is not a code-enforced constant — [repo-verified] it appears zero times in index.html and exists only as a single top-level "voice":"af_bella" field in app/audio/states-manifest.json (manifest convention, not frozen API). We make it a real contract: the export/voice pipeline stamps and asserts voice === 'af_bella' on every new Coach clip, the way harness.js makes the voice contract real — so nothing can silently ship a clip in a different voice.
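As a rough illustration of what "stamps and asserts" could look like, the sketch below checks a list of clip records before export. The record shape (`id`, `voice`) and both function names are assumptions made for this example; only the `af_bella` value comes from the manifest described above.

```javascript
// Sketch only: the clip-record shape and function names are assumptions,
// not the repo's actual export pipeline.
const REQUIRED_VOICE = 'af_bella';

// Returns the ids of clips whose stamped voice is missing or different.
function findVoiceViolations(clips) {
  return clips
    .filter((clip) => clip.voice !== REQUIRED_VOICE)
    .map((clip) => clip.id);
}

// Throws so an export script or CI step fails before anything ships.
function assertVoiceContract(clips) {
  const bad = findVoiceViolations(clips);
  if (bad.length > 0) {
    throw new Error(`voice contract violated for: ${bad.join(', ')}`);
  }
  return true;
}
```

Throwing rather than logging is the design choice here: it mirrors a harness that must exit non-zero, so a clip in another voice cannot ship silently.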
Mandatory AI + scope disclosure (an Anthropic AUP contract term for a high-risk vertical, not a flourish) opens Coach's first turn of a session, once, in PassLane's voice:
"Quick note — I'm an AI study coach. I help you learn the exam's answers. I'm not a licensed agent, and this isn't insurance advice. Okay, let's get you ready."
This single line satisfies the AUP disclosure requirement, draws the exam-prep-vs-advice legal line, and meets the FTC honest-AI bar at once.
The shipped study loop (mode_select → reading → listening → feedback → advancing) is untouched. Coach earns the right to speak in exactly four moments, then returns to silence:
speakFeedback/revealUnanswered, optionally enriched by pre-gen elaboration
weakCategories, runs the normal answer flow
warmthTail points (3rd-miss, 8-streak, return-after-gap)
The highest-ROI, lowest-risk surface, and it ships free because it's local data.
explanation+choices+correct. Renders as text instantly offline; spoken when a clip exists (or offline once the clip-cache warms — §4.7). Evidence: elaborated feedback d ≈ .49, roughly 10× bare corrective feedback.
submitAnswer(letter)) and surfaces that distractor's specific confusion ("you mixed up insurable interest at inception (life) with at loss (property)"), then optionally queues a near-transfer twin from the same category to confirm the repair stuck.
difficulty ≥ 4 = exactly 55 of 323]: a deterministic rule on existing fields — box 1–2 = full steps, box 3 = completion problem (Coach sets up, you finish aloud), box 4–5 = no scaffold (respects the expertise-reversal effect). Only the step text is pre-generated; the fade decision needs no model call.
Card [repo-verified, questions.json[0], id pc001, correct D]: For a property insurance policy, insurable interest must exist at what point in time? (A) any time (B) when applied for (C) inception and loss (D) at the time of loss. Learner picks C.
Coach (cites pc001): "Close — C is the classic trap. For property, you only need insurable interest at the time of the loss, so it's D. You're borrowing the life rule, where the interest only has to exist at inception. That inception-vs-loss split is the whole point of this one."
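The scaffold fade rule described above (box 1–2, box 3, box 4–5) is a pure lookup on the Leitner box, which a minimal sketch can make concrete. The function name and the returned labels are hypothetical; only the box-to-scaffold mapping is taken from the plan.

```javascript
// Hypothetical helper; only the box -> scaffold mapping comes from the plan.
function scaffoldFor(box) {
  if (!Number.isInteger(box) || box < 1 || box > 5) {
    throw new RangeError(`Leitner box must be 1-5, got ${box}`);
  }
  if (box <= 2) return 'full-steps';   // early boxes: show every step
  if (box === 3) return 'completion';  // Coach sets up, learner finishes aloud
  return 'none';                       // boxes 4-5: no scaffold (expertise reversal)
}
```

Because the decision is a deterministic function of a field that already persists, it can run offline with no model call; only the step text it selects would need to be pre-generated.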
buildQueue + weakCategories + Leitner — Coach is the orchestrator, the shipped quiz loop is the engine. The testing effect (g≈.5–.6) comes free from the answer-before-reveal flow.
confusable-with map) and ask the learner to state the distinguishing rule. Interleaving's biggest documented win is exactly this similar-between/dissimilar-within material.
READINESS_THRESHOLD. Lowest performers are the most overconfident and quit early — this raises scores and is maximally on-brand. Enforced at the verified isExam gates.
Open-ended grounded Q&A, Sonnet 4.6, warm, hard-capped ~2–3 turns, tethered to the question's explanation, text-reply on screen at launch (no system TTS — §9). Socratic is a ≤1-beat scalpel, then a clear answer (RCT evidence: Socratic-heavy tutoring shows no outcome gain and feels withholding to time-pressured adults). Opus 4.8 is reserved for a single premium end-of-session mock-exam diagnostic that reasons across the whole miss-pattern. Never when isExam. Latency budget and failure behavior are specified in §3.9.
The shipped listen window is frozen behavior tuned for SHORT utterances [repo-verified: taskHint=.confirmation, contextualStrings=['A'..'D'], VOICE_SILENCE_BUDGET_MS=5000, VOICE_HARD_CAP_MS=12000 an absolute per-question ceiling, NATIVE_RESTART_COOLDOWN_MS=400 guarding the teardown race]. A conversation needs the opposite. So:
.confirmation hint, no letter-priming, explicit end-of-turn), signaled by a distinct earcon and mic color so the user always knows which mode they're in.
cx-ask button, or long-press the mic) calls stopListening() to cleanly endpoint, then enters ASK mode via the same chokepoint. It reads the same partialResults channel — start() stays fire-and-forget; the companion never touches the silence watcher or the answer-path counters.
processVoiceMatches (index.html:3784) — but only when askMode === true. In pure answer mode, a fell-through utterance stays the shipped "Didn't catch that" copy. We never auto-route a mis-heard answer into a chatbot.
Decisive scope call — spoken multi-turn Ask is spike-gated; TEXT-ASK is the Phase-3 default and ships regardless. A long ASK window that reopens the mic mid-question is exactly the reopen-cycling the 5s budget was introduced to kill (the budget replaced a 14s one for this reason), and it collides with the absolute 12s per-question cap and the teardown race. Multi-turn dictation on the local speech fork is untested. So text-ASK (an "Ask Coach…" field, same grounding, same citations) is the conversational tier on day one of Phase 3, with zero long-mic risk. Spoken ASK ships only after an on-device spike proves, on real iOS and Android, that (a) the engine doesn't bail mid-sentence, (b) reopening doesn't trip the 2-sessions race, and (c) ASK runs on a non-question generation id with its own ceiling decoupled from the 12s cap. The spike is scheduled against docs/IOS-VOICE-TEST-PLAN.md [repo-verified to exist].
All cross-session continuity is a READ over state that already persists locally (az_ keys: Leitner box, per-category accuracy, miss/right streaks, sessionMissed). Coach already knows what you keep missing — it just isn't speaking from it yet. The return-opener is specific and grounded — "Last time, claims-made vs occurrence tripped you twice — want to start there?" — sourced entirely from local data. Persist at most a tiny local cx_memory object (last topic, last weak categories, last-seen date) in localStorage. No transcripts, no PII, no third party. This is the highest-trust, lowest-risk feature and it ships first — and it doubles as the struggle-signal source that lets us honor the no-tracking promise without adding analytics.
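To show how small the persisted object can stay, here is a minimal sketch of reading and writing `cx_memory` and deriving a return-opener from it. The field names, the three-category cap, the opener wording, and the injected `storage` parameter (standing in for `localStorage`) are all assumptions for illustration.

```javascript
// Sketch: the key name 'cx_memory' is from the plan; field names, the cap of
// three categories, and the storage-injection style are assumptions.
const MEMORY_KEY = 'cx_memory';

function saveMemory(storage, { lastTopic, weakCategories, lastSeen }) {
  // Only three small fields: no transcripts, no PII.
  const memory = { lastTopic, weakCategories: weakCategories.slice(0, 3), lastSeen };
  storage.setItem(MEMORY_KEY, JSON.stringify(memory));
  return memory;
}

function loadMemory(storage) {
  try {
    return JSON.parse(storage.getItem(MEMORY_KEY)) || null;
  } catch {
    return null; // corrupt data: Coach simply skips the opener
  }
}

function returnOpener(memory) {
  if (!memory || !memory.lastTopic) return null;
  return `Last time, ${memory.lastTopic} tripped you up. Want to start there?`;
}
```

In the app, `storage` would be `window.localStorage`; passing it in keeps the functions testable in Node without a browser.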
Because the entire pre-generated corpus renders into the existing feedback-expl DOM region with no model call and no network, text is first-class: silent study, sound-off, no-mic/permission-denied, the web build, and offline all degrade to a clean text experience. Every Coach claim — spoken or text — carries an inline "source: Q pc001" chip; when Coach has no vetted answer it shows "I don't have a vetted answer for that yet" rather than inventing.
The 33%-unvoiced case is a day-one UX state, not an edge case [repo-verified: 108 AZ questions have no clip]. Defined behavior when Coach would speak but the question has no base clip and no Coach clip yet:
This makes the experience whole for 100% of the bank from the first ship, with audio as progressive enhancement.
Cost caps (§4.6) govern spend; this governs feel for an anxious commuter:
explanation/choices/correct. Batch-pregenerated, human-reviewed, content-hashed, shipped into app data, and (where the pipeline has voiced it) pre-rendered to af_bella audio. Runtime cost $0, network none.
The relevant unit is a single known question — the one on screen, or the top keyword/category hit. Retrieve that question + its same-category siblings, never a whole bank. A vector DB at launch would violate zero-cloud-to-launch, add latency, and add infra to operate. Written upgrade trigger: add embeddings only when (a) a future vertical's sibling set overflows a sane prompt budget, or (b) the eval harness's recall@k drops below bar (§5.2). The eval gate forces the upgrade; we don't pre-build it.
Each retrieved explanation is one Citations API custom-content block, so cited text is token-free and composes with prompt caching and Batch. Claude returns block-level citations that render as the source chip. A system-prompt refusal contract ("answer ONLY from the provided explanations; if absent, say so and offer to drill a related concept; never invent statutes, numbers, deadlines, or state-specific law") plus a runtime guard (zero citations on a factual claim → suppress the spoken reply, show "no vetted answer yet") make the moat mechanical. [API-assumption: Citations API block-level behavior and ZDR-eligibility per 2026-01 docs; confirm before build.]
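The runtime guard can be sketched as a small pure function over the model's reply. This is a simplification under stated assumptions: the reply is modeled as an array of `{ text, citations }` blocks, each citation carries a hypothetical `questionId` field, and the whole reply is treated as one factual claim (zero citations anywhere suppresses it). The real Citations response shape should be taken from current API docs.

```javascript
// Sketch only: block shape and the `questionId` field are assumptions,
// not the actual API response format.
const NO_VETTED_ANSWER = "I don't have a vetted answer for that yet";

function guardReply(blocks) {
  const citations = blocks.flatMap((b) => b.citations || []);
  if (citations.length === 0) {
    // Zero citations: never speak it, show the honest fallback copy instead.
    return { speak: false, text: NO_VETTED_ANSWER, sources: [] };
  }
  const sources = [...new Set(citations.map((c) => c.questionId))];
  return { speak: true, text: blocks.map((b) => b.text).join(''), sources };
}
```

The `sources` array is what would feed the inline source chip; deciding which individual sentences count as factual claims is deliberately left out of this sketch.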
[API-assumption — load-bearing, confirm first: Citations is incompatible with Structured Outputs (400 error), and toggling citations invalidates the tools cache.] If that holds, a single grounded-and-structured call is impossible, so:
explanation + neighbors. Output is plain elaborated prose with citation spans. No structured constraint, no tools.
{claim_supported, introduces_new_fact}. Any sentence introducing a fact not in the source is dropped or routed to human review. This catches the worst-case licensing failure — confident over-synthesis from correct sources (a fabricated rule a student repeats on the exam) — and it's caught offline, before dissemination, the way voice-sandbox/harness.js proves the voice contract: correctness asserted, not vibed.
Fallback if the incompatibility is lifted: if a future API revision lets Citations and Structured Outputs co-exist, the two phases collapse into one grounded-and-structured call — the containment check becomes an output field rather than a second pass, halving pre-gen cost. The architecture is therefore not brittle to this fact flipping; the judge survives as a CI gate regardless (§5.2), since we still want an independent containment assertion even if generation is structured.
Cache at the CATEGORY/whole-bank prefix level, not per-question. [API-assumption: Haiku 4.5 cacheable-prefix minimum = 4,096 tokens.] [repo-verified] AZ explanations average ~37 words (~50 tokens); full per-question context (stem+choices+explanation) averages ~79 words (~105 tokens) — both far below a 4,096-token floor, so a single question's grounding can never clear it. For offline Batch the discount is moot (one pass). For the live path, cache the stable system prompt + the state bank as one shared prefix ([repo-verified] AZ's full bank ≈ 40,993 tokens clears the floor comfortably) so every live call reuses one big cached context — that's where the 0.1× actually pays.
A sibling of worker/src/index.js, reusing its bearer-auth + KV rate-limit + CORS + fail-closed pattern [repo-verified ~lines 84–110], holding ANTHROPIC_API_KEY as a wrangler secret — never in the 287KB client bundle. Zero Data Retention enabled on the live route [API-assumption: Citations is ZDR-eligible; Batch is not — fine, Batch only ever processes the fixed bank, never user data]. It forwards only question_id + the text turn. Fits the Worker free tier (~100k req/day) because the heavy path is offline.
Metering is a HARD pre-req of Plane B, not a footnote. [repo-verified] Pro is a localStorage boolean (isPro(), index.html:2148), the Worker's only identity is a shared token + a client-asserted device-id — both trivially rotated, and a heavy talker (~40 Sonnet turns/day ≈ ~$9.75/mo at assumed pricing) sinks the $59.99/yr ≈ $5/mo plan. Before Plane B ships: (1) server-verify the store receipt (Google Play / App Store) and mint a per-install signed token — the client flag may gate UI but never gates spend; (2) hard per-user budgets (daily turn cap + monthly token ceiling) that degrade gracefully to Plane A when spent ("You've used today's deep chat — here's the grounded explanation"); (3) a global kill-switch + spend ceiling that fails closed; (4) treat device-id as untrusted and cap globally so mass rotation can't exceed the budget.
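The budget logic in points (2) and (3) can be sketched as a server-side check that fails closed. The limit values, field names, and the `'plane-a'` fallback label below are placeholders chosen for the example, not figures from the plan.

```javascript
// Sketch only: limits and field names are placeholder assumptions.
const LIMITS = { dailyTurns: 20, monthlyTokens: 400000 };

function meterTurn(usage, { killSwitch = false, limits = LIMITS } = {}) {
  const deny = (reason) => ({ allowed: false, fallback: 'plane-a', reason });
  if (killSwitch) return deny('kill-switch');            // global ceiling wins
  const turns = usage && usage.turnsToday;
  const tokens = usage && usage.tokensThisMonth;
  // Fail closed: missing or malformed usage is treated as over budget.
  if (!Number.isFinite(turns) || !Number.isFinite(tokens)) return deny('unverified-usage');
  if (turns >= limits.dailyTurns) return deny('daily-cap');
  if (tokens >= limits.monthlyTokens) return deny('monthly-cap');
  return { allowed: true, fallback: null, reason: null };
}
```

Every denial carries the same fallback, so the client can degrade to the local grounded explanation without knowing why the live path was refused.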
The flagship "study out loud on your commute, offline" scenario is not real today and we will not claim it before it is. Two distinct facts, both [repo-verified]:
(a) Why offline audio fails — corrected root cause. AZ audio streams from a cross-origin Pages CDN (AUDIO_CDN = https://passlane-5jv.pages.dev, index.html:4270; clipSrc() = ${AUDIO_CDN}/audio/${name}.mp3, index.html:4293). The service worker does cache .mp3 cache-first (sw.js:56, the isCacheFirst branch) — but it never sees these requests, because the fetch handler bails on cross-origin at the top: if (url.origin !== location.origin) return; (sw.js:34). (The earlier draft's "sw.js refuses to cache mp3" was wrong; the cause is the cross-origin early-return, which changes the fix.)
The fix, and its constraint — an Owner fork (§9):
/audio under the app origin (or proxy clips through the Worker). Then the existing cache-first SW logic just works — no new IndexedDB code, no CORS dance. This also dovetails with the D1 schema, which [repo-verified] already documents "Audio references that map to private R2 object keys" — i.e., the infra was designed for app-owned/private audio, so this is aligned with intent that already exists, and it simultaneously closes the CDN integrity hole (§5.4).
Access-Control-Allow-Origin), blobs are fetched with mode:'cors', and a spike proves a cached clip survives airplane-mode. More code, more failure surface.
We pick one explicitly before Phase 1; the recommendation is A because it reuses shipped SW behavior and resolves integrity in the same move.
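The root cause and the Option A fix can be seen by modeling the two service-worker checks cited above (the cross-origin early return, then cache-first for `.mp3`) as one pure function. The function name, the `'default'` label, and the `https://app.example` origin are hypothetical; the real handler lives in sw.js.

```javascript
// Sketch: models the order of the two sw.js checks the plan cites.
// Names and the example app origin are assumptions for illustration.
function swStrategy(requestUrl, appOrigin) {
  const url = new URL(requestUrl);
  if (url.origin !== appOrigin) return 'bypass';           // handler returns early
  if (url.pathname.endsWith('.mp3')) return 'cache-first'; // existing .mp3 branch
  return 'default';
}

// Today: the clip is on the Pages CDN, so the cache-first branch is never reached.
const today = swStrategy('https://passlane-5jv.pages.dev/audio/pc001-q.mp3', 'https://app.example');
// Option A: the same clip served from the app origin hits the existing branch.
const optionA = swStrategy('https://app.example/audio/pc001-q.mp3', 'https://app.example');
```

Under these assumptions `today` is `'bypass'` and `optionA` is `'cache-first'`, which is why moving audio same-origin needs no new caching code.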
(b) Coverage — the honest numbers. [repo-verified] On disk: 430 AZ clips = 215 -q + 215 -a; states-manifest.json enumerates 215 pc/es ids (manifest voice: af_bella). Therefore ~67% of AZ questions (215 of 323) can read the question aloud; ~33% (108) cannot read anything, and 0% of Coach copy is voiced. The af_bella TTS pipeline is absent from the repo. Consequences, made explicit:
Install-size impact [measured from repo]: adding rationale + 3 distractor explanations roughly doubles per-bank text (AZ ~256KB → ~0.5MB; six states → low-single-digit MB) — an accepted install add. Audio is the real storage cost and is bounded explicitly in §6.4. We do not inherit a parked offline-audio problem silently; we name it, price it, and gate on it.
OFFLINE — one-time, Plane A (the free path)
vetted question bank
→ SME audit (state-law first) + content-hash
→ GENERATE: Haiku 4.5 + Batch, citations on → grounded prose
→ JUDGE: a separate check drops any sentence that adds a fact not in the source
→ human SME review → ship into the app
(text for 100% of the bank; af_bella audio where voiced)
ON DEVICE (app/index.html)
mic → one listen chokepoint → classify the utterance:
• a letter (A–D) → the normal answer flow (unchanged)
• "ask" + ask-mode on → Coach answers, grounded and cited
exam in progress → Coach is fully disabled (hard wall)
Train / Test → local pre-generated text (offline) → spoken if a clip exists
EDGE — Pro, online only (Plane B)
verify store receipt → per-install token → per-user + global caps + kill-switch
→ Claude (Haiku / Sonnet / Opus) → any uncited claim is suppressed
→ if it errors or is slow, fail closed to the local grounded explanation
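The on-device classification step in the diagram above can be sketched as a single routing function. This is a deliberate simplification: real letter matching happens in the shipped voice-match code and is richer than an exact A–D comparison, and the route labels here are hypothetical.

```javascript
// Sketch only: route labels and exact-match letter detection are assumptions.
const LETTERS = ['a', 'b', 'c', 'd'];

function routeUtterance(transcript, { askMode = false, isExam = false } = {}) {
  const text = String(transcript || '').trim();
  if (askMode) {
    if (isExam) return { route: 'blocked' };     // hard wall during exams
    if (text === '') return { route: 'retry' };
    return { route: 'ask', text };               // never submitted as an answer
  }
  const letter = text.toLowerCase();
  if (LETTERS.includes(letter)) return { route: 'answer', letter: letter.toUpperCase() };
  return { route: 'retry' };                     // stays the shipped retry copy
}
```

Checking `askMode` first means an utterance captured in ASK mode can never fall through to the answer path, and an unmatched utterance in answer mode is never sent to Coach.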
Grounding (Citations over vetted explanations, no open web at runtime) → explicit "I don't know" permission in the prompt → the Phase-2 containment judge that fails the build on any introduced statute/number/citation → a runtime guard that suppresses uncited claims → strictest grounding + mandatory human review for state-law items. The corpus being finite is converted from a limitation into a trust feature.
A Node vm/assert/exit-1 CI gate, modeled on voice-sandbox/harness.js, over a golden question→grounded-answer set, asserting: (1) every answer cites a correct source block; (2) the cited explanation actually contains the claim [the Phase-2 judge, run as a gate]; (3) out-of-scope → refusal, and real-world-advice → redirect; (4) no answer leaks under a simulated exam state; (5) the recall@k bar that triggers the embedding upgrade (target recall@k ≥ 0.95 on the golden set; a drop below forces §4.2's vector-DB upgrade). Correctness is a mechanical gate the codebase already lives by.
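Assertion (5) reduces to a short computation over the golden set. The sketch below assumes a golden entry shape of `{ query, expectedId }`, a `retrieve(query, k)` function returning question ids, and a default `k` of 3; all three are illustrative choices, while the 0.95 bar is the target stated above.

```javascript
// Sketch only: golden-set shape, retrieve() signature, and k=3 are assumptions.
function recallAtK(golden, retrieve, k) {
  if (golden.length === 0) throw new Error('golden set is empty');
  const hits = golden.filter(({ query, expectedId }) =>
    retrieve(query, k).slice(0, k).includes(expectedId)
  ).length;
  return hits / golden.length;
}

function gateRecall(golden, retrieve, k = 3, bar = 0.95) {
  const recall = recallAtK(golden, retrieve, k);
  return { recall, pass: recall >= bar };
}
```

A CI wrapper would call `gateRecall` and exit with status 1 when `pass` is false, in the same style as the voice harness.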
Grounding amplifies the source: a wrong vetted explanation becomes a confident, cited, spoken wrong lesson. Coach's correctness ceiling is the bank's. Before Coach ships: an SME review pass over state-law items first (the AZ-specific law categories — highest legal exposure), then the rest by difficulty; stamp every question with last_reviewed (extend the existing D1 version/status columns); and ship a "this looks wrong" report affordance from day one on every Coach response. For the AZ launch bank (323 Qs) this is a full human pass — cheap at that size and it removes all ambiguity for the content that defines first impressions. Pass rubric (the gate is numeric): review is "passed" when an SME confirms 0 factual errors in state-law items and ≤2% factual-error rate across all 323, every flagged item corrected and re-reviewed. This is also the concrete Anthropic-AUP "qualified professional reviews before dissemination" mechanism, enforced by the rule disseminated == passed_review at the content-hash gate.
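Because the pass rubric is numeric, it can be written down as a predicate. The item fields (`stateLaw`, `factualError`, `correctedAndRereviewed`) are hypothetical names for this sketch; the thresholds (zero state-law errors, at most 2% overall, every flagged item corrected and re-reviewed) come from the rubric above.

```javascript
// Sketch only: item field names are assumptions; thresholds are from the rubric.
function reviewPassed(items, maxErrorRate = 0.02) {
  if (items.length === 0) return false; // nothing reviewed, nothing ships
  const errors = items.filter((i) => i.factualError);
  const stateLawErrors = errors.filter((i) => i.stateLaw).length;
  const unresolved = errors.filter((i) => !i.correctedAndRereviewed).length;
  const errorRate = errors.length / items.length;
  return stateLawErrors === 0 && errorRate <= maxErrorRate && unresolved === 0;
}
```

For the 323-question AZ bank, a 2% ceiling works out to at most 6 flagged items (6/323 is about 1.9%, 7/323 is about 2.2%), none of them in state-law categories.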
isExam (inherits the shipped tap-only/no-mic/no-read-aloud gates); the proxy rejects any in-progress-exam request; mock-exam coaching is post-scoring only; a harness scenario asserts Coach is inert during exams.
https://passlane-5jv.pages.dev/audio/pc001-a.mp3 (built by clipSrc(), index.html:4293; prefetched at :4326) with sequential ids — the spoken answer key is a ~10-line scrape today, contradicting the Worker's own "audio never public" principle and the D1 schema's "private R2 object keys" note. (Path corrected: it is /audio/<id>-a.mp3, not root <id>-a.mp3; the root path returns the SPA HTML fallback, which is exactly why a careless test would under-rate the risk.) Decisions: (1) all new companion repair/explain audio uses hashed, non-sequential keys (clip/<sha256(exam||id||"repair")>.mp3) so AI answer content is not enumerable; (2) for the existing base answer audio, an owner call (§9): migrate behind the same-origin/auth + rate-limited Worker/R2 path (this is Option A of §4.7 — one move fixes offline caching and integrity) or consciously accept that rationales are public — but regardless, no new clip and no future audio gets a sequential ${id}-a key on a public bucket.
Ungraded self-check is the default (zero false-negative risk, full retrieval benefit). Graded mode — which may touch Leitner state — unlocks only after the Sonnet grader hits a measured bar (≥95% agreement, ≤2% false-"wrong") on a fixture of 100+ human-labeled paraphrases, run as an opt-in on-device-only eval that reports an aggregate accuracy number with no transcript stored (so the no-tracking promise holds). Even then: grade generously, accept the concept in any phrasing, never silently demote (always "I'll bring this back"), always offer "I actually meant X." This numeric instrument is the gold standard the §7 success criteria are modeled on.
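For the hashed, non-sequential clip keys in decision (1) above, a minimal Node sketch follows. The plan writes the input as exam||id||"repair" without fixing a delimiter, so the `|` separator here is an assumption, as is the function name.

```javascript
// Sketch only: the '|' separator and function name are assumptions; the
// clip/<sha256(...)>.mp3 key pattern is from the plan.
const { createHash } = require('node:crypto');

function repairClipKey(exam, id) {
  const digest = createHash('sha256')
    .update([exam, id, 'repair'].join('|'))
    .digest('hex');
  return `clip/${digest}.mp3`;
}

const exampleKey = repairClipKey('az-insurance', 'pc001');
```

The key is deterministic, so the export step and the app can both derive it, and it no longer exposes the question id in the URL. Since the exam name and id are guessable, anyone who learns the scheme can recompute an unsalted hash; whether to mix in a server-side secret is left to the owner decision on the audio path.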
[repo-verified] The zero-transmission claim is asserted in at least three places, including in-app at index.html:1226 ("no accounts and no tracking … does not collect, transmit, sell, or share") plus privacy.html plus the paywall legal link. A live transcript-forwarding proxy makes all of them false at once — simultaneously an App Review 5.1.2(i) rejection risk, a Google Play Data-safety mismatch, and FTC exposure. Therefore:
question_id + text.
az_reports buffer (question id + the offending Coach text + timestamp) that surfaces in the SME's next review pass — satisfying the reporting requirement without breaking the no-transmission promise. Only in RUN (Plane B, post-consent) does it additionally POST to the new sibling Worker's report sink. [repo-verified] the shipping app makes zero /api/ calls, so we do not rely on the parked /api/report; the live sink is the new Worker or a minimal dedicated endpoint.
A pre-generated Coach explanation ships inside the binary. If a wrong, cited, spoken lesson reaches production, content-hashing and the next review cycle are too slow for a licensing exam. So:
question_ids whose Coach copy is suppressed. The app fetches it opportunistically (cheap, cacheable, fails-open-to-showing-base-bank-only) and, for any listed id, falls back to the terse vetted bank explanation and hides Coach's elaboration + audio until a fixed release lands. This is a server flag that needs no app update to neutralize a specific bad lesson.
az_reports/Worker report escalates an id onto the denylist within hours; the permanent fix (corrected explanation, regenerated artifact, hash bump) follows on the normal cadence.
The pedagogical effect sizes (elaborated feedback d≈.49; testing g≈.5–.6; interleaving) justify the design; they don't prove transfer to PassLane users. Under the no-tracking constraint we still measure outcomes, the §5.5 way:
Every defense above is also a marketing asset no incumbent can match: every answer traced to a vetted explanation; refuses rather than invents; refuses to leak an answer during an exam; a wrong lesson can be killed remotely within hours; your questions are never used to train AI and are deleted within ~7 days. This is the credible opposite of ExamFX's refund/guarantee grievances and the antidote to the 33–79% hallucination rates that plague general AI tutors — the brand's "honesty is the moat," made operational.
The architecture's whole point: marginal AI cost approaches zero because the expensive work is computed once, offline, and shipped. This is precisely the economics ("per-user inference ate the margins") that killed Quizlet's Q-Chat — designed out. All dollar figures use [API-assumption] pricing (2026-01); the structural conclusion (near-zero, one-time, single-digit dollars) is robust to reasonable price drift and is what matters.
[repo-verified counts: AZ = 323; all six banks = 3,392 (CA 583 + FL 641 + NY 667 + NC 588 + TX 590 + AZ 323).] Per question ≈ 600 tokens in (stem+choices+explanation+prompt) / ~340 tokens out (elaborated rationale + 3 distractor repairs). Haiku 4.5 Batch [API-assumption: $0.50 in / $2.50 out per 1M].
The two-phase judge pass adds a second Haiku call of similar magnitude; the realistic envelope is single-digit dollars for the whole six-state corpus, one-time. We never anchor on a number a reviewer can falsify in a spreadsheet — the conclusion is what matters and it is robust. (If §4.4's incompatibility is lifted, the judge folds into generation and this roughly halves.) Regeneration on a bank edit is cheap because answers are content-hashed.
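The envelope can be reproduced from the stated inputs alone. The sketch below uses the question counts, per-question token estimates, and [API-assumption] Batch prices quoted above; treating the judge pass as a second call of identical size is a simplification of "similar magnitude".

```javascript
// Sketch only: prices are the plan's [API-assumption] figures; modeling the
// judge as an identical second pass is a simplifying assumption.
const PRICE_PER_MTOK = { input: 0.5, output: 2.5 };    // USD per 1M tokens
const TOKENS_PER_QUESTION = { input: 600, output: 340 };

function pregenCostUSD(questionCount, passes = 1) {
  const inputTokens = questionCount * TOKENS_PER_QUESTION.input;
  const outputTokens = questionCount * TOKENS_PER_QUESTION.output;
  const onePass =
    (inputTokens / 1e6) * PRICE_PER_MTOK.input +
    (outputTokens / 1e6) * PRICE_PER_MTOK.output;
  return onePass * passes;
}

const azOnePass = pregenCostUSD(323);        // AZ bank, generation only
const allBanksOnePass = pregenCostUSD(3392); // six banks, generation only
const allBanksWithJudge = pregenCostUSD(3392, 2);
```

With these inputs, one generation pass comes to roughly $0.37 for AZ and roughly $3.90 for all six banks, and doubling for the judge gives roughly $7.80, which is consistent with the single-digit-dollar conclusion.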
Why it stays near-zero: (1) the teaching corpus is pre-generated and local — the most-used feature costs nothing at runtime; (2) live calls exist only for a learner's own words, a small Pro-gated fraction; (3) the live path is Haiku-default with whole-bank prompt caching at 0.1×; (4) per-user and global spend caps make the worst case bounded, not unbounded; (5) Opus is a rare, server-enforced ceiling. The premium price is therefore near-pure margin that funds premium design. (Anchor the price against exam-prep incumbents — a ~$130 ExamFX seat for 60 days — not against $4 consumer tutors; $59.99/yr never-expiring is the affordable, premium, honest option.)
Because [repo-verified] 108 AZ questions are unvoiced + 100% of Coach copy is unvoiced and the pipeline is absent from the repo, voicing is a line item, not an afterthought:
export-pack.mjs that (a) renders new copy to af_bella, (b) stamps and asserts voice === 'af_bella', (c) writes same-origin (per §4.7 Option A) with hashed keys for any answer-revealing audio (per §5.4). The specific TTS service/model is an owner/integration decision (it must reproduce the existing af_bella timbre); the cost driver is per-clip synthesis, typically fractions of a cent to a few cents per short clip — low tens of dollars one-time for the full AZ Coach voice set at commodity neural-TTS rates, regenerated only on bank edits.
Every success criterion below carries a number and an instrument, modeled on §5.5, and references the runbooks that already exist in docs/ [repo-verified: STUDY-SESSION-TEST-MATRIX.md, IOS-VOICE-TEST-PLAN.md, LAUNCH-RUNBOOK.md] rather than vague phrases.
Scope: Phase 0 bank audit (state-law SME pass + last_reviewed + "looks wrong" → az_reports local queue) → Memory-opener + TRAIN (text, offline, 100% of bank): pre-generated elaborated feedback + misconception repair at the feedback seam, plus scripted reframing/calibration lines. Then, after the af_bella pipeline (§6.3) exists and Option-A same-origin audio (§4.7) lands, spoken Coach + SW-cached offline audio for the voiced subset, with the unvoiced third in the §3.8 text-only state. Gate a 12-interaction Coach taste mirroring [repo-verified VOICE_FREE_LIMIT=12]; the elaborated feedback itself is free to every learner.
Success criteria (numeric, instrumented):
node voice-sandbox/harness.js exits 0; zero diff to the mode_select→…→advancing call graph (asserted by harness, not eyeballed).
STUDY-SESSION-TEST-MATRIX.md device list (text path renders, no network calls, no errors) — pass = green on every listed device/OS.
privacy.html + in-app:1226 unchanged and still literally true; grep confirms zero new /api/ calls in CRAWL.
Scope: discrimination drills on existing Leitner/weak machinery; mock-exam before/after coaching (mute during) + calibration confrontation; ungraded explain-back; Ask-mode proven in the harness first (S12+), then TEXT-ASK as the conversational surface.
Success criteria (numeric, instrumented):
isExam, a late ASK transcript never submits as an answer.
IOS-VOICE-TEST-PLAN.md for both iOS and Android before any spoken ASK is greenlit.
Scope: TALK (Sonnet, text-reply) + graded explain-back (past the §5.5 gate) + the Opus end-of-session diagnostic, behind the key-holding sibling Worker, with the §5.7 remote denylist live. Spoken ASK only if the on-device spike passes.
Success criteria — all blocking:
privacy.html + in-app:1226 + terms + paywall rewritten, consent gate shipped, Play Data-safety updated — in the same release.
Tiering law: free gets real teaching (elaborated feedback is local data, so it's free) — Pro is the conversation and live coaching, never "pay to get explanations at all." Coach is the headline Pro unlock — "your instructor, on call" — reusing the existing pw- paywall.
last_reviewed + day-one az_reports affordance; correctness bounded by the bank's; disseminated==passed_review
/audio/pc001-a.mp3 is HTTP-200 enumerable with sequential ids; naively keying new AI clips ${id}-a widens it
-a clip publicly
isPro() is a spoofable localStorage flag; device-id rotatable; Opus unbounded
partialResults-empty contract / staleness guards tuned for short answers
startListening/stopListening; ASK is a separate posture with its own gen-id/budget; harness S12+ exit 0 before any UI
sw.js:34), not a cache refusal
IOS-VOICE-TEST-PLAN.md); TEXT-ASK ships regardless so the tier doesn't depend on the spike
az_reports queue in offline phases, verified-live Worker sink in RUN; consent gate; honest age-rating; reporting treated as a launch blocker
To keep the owner's surface honest, this is split: calls the Principal owns (stated for transparency, not for re-litigation, per the "own the decision after research" discipline) vs genuine forks with real cost/liability tradeoffs and no obvious default.
/audio/<id>-a.mp3 today, and offline audio is broken by the cross-origin SW bypass. §4.7 Option A (move/proxy audio same-origin / behind the auth+rate-limited Worker/R2 the D1 schema already anticipates) fixes both integrity and offline caching in one move and reuses the existing cache-first SW — recommended. Option B (keep cross-origin CDN + add an IndexedDB blob cache with mandatory CORS + an airplane-mode spike) is more code and more failure surface. New AI audio uses hashed keys regardless. The Principal recommends A; the owner confirms the infra activation cost.
Supporting workspace: /Users/arizona/CLAUDE CODE/passlane/docs/companion/ [repo-verified: exists, empty]. The plan above is the document; no code was written and recon was read-only, per scope.