Proof, not promise
Mandaire built mandaire.app. Months of directed work, in plain English. A working personal-context layer over one person's life (calendar, mail, messages, conversations across AI providers, photos) that holds full context and acts on the user's behalf with disclosure mediated for every recipient. The build itself is the artifact.
What Mandaire produces
Mandaire writes back what it thinks you want, before any code exists. This is where the careful work begins. The intent brief shown is the one written before a recent rewrite of the mandaire.dev page, in the exact shape the system uses across all projects.
[mandaire OS · Intent brief · "mandaire.dev page rewrite, v8"]
Here is what I think you want:
A page rewrite that addresses two convergent critiques from v7
adversarial evaluation (Claude.ai + ChatGPT): (1) the "optimizes
for growth" lead is borrowed authority the page does not earn,
and (2) the page lacks a concrete buyer engagement shape. Drop
the growth-lead claim in favor of a refusal claim that matches
what the artifacts actually demonstrate. Add a beta-contract
section that names what we will prove with you, on what
timeline, with what counts as failure.
Here is what I am deliberately NOT rewriting:
- The proof-section bounds (four named bounds are correct
and disciplined; they stay)
- Bet 1, Bet 3, Bet 4 (only Bet 2 needed the operational fix)
- The artifact swap (real artifacts replacing illustrations
was the right v7 move; we sharpen, not undo)
- The six thoughtfulness tests (already correct)
- The build-not-host frame (right for archetype; we add the
commercial-relationship clarification, not the frame itself)
Assumptions I am making (please correct):
- You accept that "optimizes for growth" overclaims what the
page demonstrates; the honest claim is wrong-build refusal.
- You want the beta contract section between "Who it is for"
and "What Mandaire produces" — closer to the offer, before
the artifact display.
- You want the weakest Day-90 bullets cut even though it
shortens the section; signal-to-noise over length.
- You want "order of magnitude cheaper" replaced everywhere
it appears (it overstates the actual 5-50x range).
One product call I need from you before deploy:
Should the page name the first-beta founder publicly with
consent, or hold him as "the first founder we know fits it
cleanly"? Defaulting to the latter unless you say otherwise.
When work ships, Mandaire reports back with what it built, what it decided, what is held for you, and what it learned. The report shown is from a multi-tenant rename that prepared the codebase for onboarding two new users.
[mandaire OS · Wed 12:42 PM PT · "Generic-rename Phase A/B/C1 shipped"]
Built this morning: codebase rename to support multiple isolated
users (two future tenants onboarding within 1-3 weeks).
Three choices I made that you should know about:
1. Phase D (filesystem migration on the first user's instance) is
dispatched to infra as a separate task rather than executed
today. Rule we just encoded: ship new architecture to NEW
tenants first; migrate the first user's legacy instance LAST,
only after the new shape is verified in production under real
load for two-plus weeks. The first user's running instance is
sacred; no breaking changes here.
2. "user" layer alias is additive, not replacing the existing
first-user-specific layer name. Both work. Additive lets new
tenants pick up the generic name without touching anything
load-bearing on the first user's running stack.
3. Filesystem migration deferred until LXC blueprint lands from
infra. Rather than do filesystem rewrites on the live machine,
we move that to the new-tenants-first phase where it can be
verified in isolation.
Open question:
Five rename ratifications need your call before I ship Phase E.
I dropped them into the standing decisions queue rather than
interrupt you mid-thread.
Something I learned to do differently next time:
Pre-write the ratifications-needed list as part of the intent
brief, not after Phase A lands. You ratified them in one batch
today; with the list pre-written, the gap between propose
and approve could have been hours, not the full session.
Every meaningful decision Mandaire makes lives in an auditable ledger. You can read it, search it, correct it. Future builds use it as a baseline. Four rows from the last weeks of building Mandaire itself: three product-architectural, one operational.
[Decision ledger · selected rows · product + operational]
Decision Why Risk Future rule
─────────────────────────────────────────────────────────────────────────────────────────────
Dual-LLM architecture: split Single-LLM "renderer is also High Disclosure logic
the user-facing renderer reasoner" forces every external must run on
(Claude, ChatGPT, Gemini — provider to see raw context, which deterministic
the user's choice) from the the disclosure engine cannot prevent code, never on
internal reasoner. without rewriting the renderer. a model that can
Two-LLM lets disclosure live in be prompt-injected.
Python upstream of any LLM the user
chooses.
─────────────────────────────────────────────────────────────────────────────────────────────
Binary-attachment exclusion Storing attachment bytes makes High Canonical store
policy: store header + metadata Mandaire a high-risk repository holds metadata
+ placeholder; never store of every photo, contract, screenshot and pointers,
the bytes locally. the user has sent. Pointers + never the bytes.
on-demand fetch preserves the
capability without the risk.
─────────────────────────────────────────────────────────────────────────────────────────────
MCP as the surface, not custom The user already pays for Claude High Build the
chat. Mandaire exposes tools Max / ChatGPT / Cursor. Building substrate other
via MCP; the user's existing a fifth chat UI fragments attention AIs plug into,
LLM client consumes them. and competes with the surfaces not another
the user actually inhabits. surface that
competes.
─────────────────────────────────────────────────────────────────────────────────────────────
output_filter "exfiltrat\w*" Bare substring matched legitimate Med Verb-based
pattern tightened to require security and architecture discussion injection detection
object word. ("agents with exfiltration needs an object
concerns"). Two false positives in word; value-based
one day held legitimate responses. credential detection
handles the rest.
─────────────────────────────────────────────────────────────────────────────────────────────
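The ledger rows above can be pictured as a minimal append-only store: rows are recorded, searched, and corrected by appending, never edited in place. This is an illustrative sketch only; every class, field, and method name is an assumption, not Mandaire's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Decision:
    # Mirrors the four columns of the ledger table above.
    decision: str
    why: str
    risk: str          # "High" | "Med" | "Low"
    future_rule: str
    made_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class Ledger:
    def __init__(self) -> None:
        self._rows: list[Decision] = []

    def record(self, row: Decision) -> None:
        self._rows.append(row)  # append-only: no delete, no in-place edit

    def correct(self, index: int, note: str) -> None:
        # A correction is a new row that points at the old one, so the
        # audit trail survives the correction.
        old = self._rows[index]
        self.record(Decision(f"CORRECTION of #{index}: {old.decision}",
                             note, old.risk, old.future_rule))

    def search(self, term: str) -> list[Decision]:
        t = term.lower()
        return [r for r in self._rows
                if t in r.decision.lower() or t in r.why.lower()
                or t in r.future_rule.lower()]
```

The design choice worth noting: corrections are themselves ledger rows, so "you can correct it" never means "you can rewrite history."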
The intent brief at the start of building mandaire.app and the intent brief ninety days in are not the same document. Three calls bent the trajectory. Each was made in writing, on a brief, before any code existed. The gap between them is the proof that judgment compounds.
[mandaire OS · Judgment delta · mandaire.app build · Day 1 vs Day 90]
DAY 1 — What the first user thought they were building
"A personal assistant that reads my calendar, mail, and messages
and surfaces what I need, when I need it. Proactive reminders.
Draft responses. One clean chat interface — simpler and more
personal than anything that exists."
DAY 90 — What they were actually building
"A memory substrate via MCP. No new chat surface. The user already
inhabits Claude, ChatGPT, and Cursor daily — a fifth interface
fragments attention and competes with surfaces they already
live in. Mandaire connects to the surfaces they already use and
makes those surfaces smarter. Proactive surfacing ships as
context in the existing chat. Not as notifications from a new app.
Disclosure is per-recipient, per-topic, per-context — not a
per-person privacy setting that breaks the first time two people
ask about the same topic in different situations."
THREE CALLS THAT BENT THE TRAJECTORY
Call 1 — Day 14: MCP over custom chat UI
Context: First functional demo was a custom chat interface with
full access to the context store. It worked. Felt like the right
thing to ship.
The brief raised: "You are paying for Claude Max. This adds a
fifth window. Every session starts with a context-switch cost.
The brief assumes the user lives here. They won't."
Result: Custom chat dropped. MCP-first.
The brief written at Day 1 was wrong. The product that shipped
at Day 14 was better.
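The shape of the MCP-first call can be sketched without the real SDK: Mandaire registers tools, and a client the user already inhabits discovers and invokes them; there is no fifth chat window. This is a schematic stdlib-only sketch, not the MCP protocol or its SDK, and every name and the sample data are illustrative.

```python
import json

class ToolSurface:
    """Schematic of a list-tools / call-tool surface, not a chat UI."""

    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}

    def tool(self, name: str, description: str):
        # Decorator that registers a function as a callable tool.
        def register(fn):
            self._tools[name] = {"description": description, "fn": fn}
            return fn
        return register

    def list_tools(self) -> str:
        # What an existing client (Claude, ChatGPT, Cursor) would discover.
        return json.dumps({n: t["description"]
                           for n, t in self._tools.items()})

    def call(self, name: str, **kwargs):
        return self._tools[name]["fn"](**kwargs)

surface = ToolSurface()

@surface.tool("search_context", "Search the personal context store.")
def search_context(query: str) -> list[str]:
    store = ["Dentist Tuesday 9am", "Flight BA117 on the 14th"]  # illustrative
    return [item for item in store if query.lower() in item.lower()]
```

The point of the shape: the client already in front of the user does the rendering; Mandaire only supplies tools and context.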
Call 2 — Day 31: per-tuple disclosure over per-person settings
Context: Naive model — each person in the network has a privacy
level (high, medium, low). Fast to build, easy to explain.
The brief raised: "Two colleagues, same privacy level. Topic A
is a shared project. Topic B is compensation. The per-person
model leaks Topic B when they ask about Topic A. You cannot
prevent this without also preventing Topic A."
Result: Per-(person, topic, context, surface) tuple. Slower to
build. Harder to explain. Structurally correct.
The naive model would have shipped. The brief refused it.
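The compensation leak that Call 2 refused can be made concrete in a few lines: key disclosure on the full (person, topic, context, surface) tuple and default-deny anything not explicitly granted. A minimal sketch under those assumptions; the class, field, and sample names are illustrative, not Mandaire's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DisclosureKey:
    person: str
    topic: str
    context: str   # e.g. "work-thread", "social"
    surface: str   # e.g. "email", "chat"

class DisclosurePolicy:
    def __init__(self) -> None:
        self._rules: dict[DisclosureKey, bool] = {}

    def allow(self, key: DisclosureKey, allowed: bool) -> None:
        self._rules[key] = allowed

    def may_disclose(self, key: DisclosureKey) -> bool:
        # Default-deny: anything not explicitly granted stays private.
        return self._rules.get(key, False)

policy = DisclosurePolicy()
# Same colleague, who would share one "privacy level" in a per-person
# model -- but here the shared project is disclosable and compensation
# is not, with no way for Topic A to drag Topic B along.
policy.allow(
    DisclosureKey("colleague_a", "project-x", "work-thread", "email"), True)
policy.allow(
    DisclosureKey("colleague_a", "compensation", "work-thread", "email"), False)
```

A per-person level cannot express this table at all; the tuple makes the refusal structural rather than a matter of careful prompting.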
Call 3 — Day 47: dual-LLM split
Context: Single-LLM architecture — the user-facing renderer also
reasons over raw context. Simpler. The reasoner is trusted.
The brief raised: "The brief says the user can bring their own
renderer — ChatGPT, Gemini, local. If disclosure runs in a model
the user can swap, it runs in a model the user can also
prompt-inject. That is not disclosure logic. It is a request."
Result: Reasoner split from renderer. Disclosure lives upstream
in Python, outside any model the user controls.
This is now in the decision ledger. It would not have been
caught in a code review because the code worked either way.
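The dual-LLM split reduces to one invariant: the renderer only ever receives the already-filtered view, so no prompt injection can make it reveal what it never held. A minimal sketch of that invariant in plain Python; the function and field names are assumptions for illustration, not Mandaire's API.

```python
from typing import Callable

def filter_context(facts: list[dict], recipient: str,
                   is_disclosable: Callable[[str, str], bool]) -> list[dict]:
    """Deterministic disclosure: keep only facts the policy allows."""
    return [f for f in facts if is_disclosable(recipient, f["topic"])]

def build_renderer_payload(facts: list[dict], recipient: str,
                           is_disclosable: Callable[[str, str], bool]) -> str:
    # The renderer (Claude, ChatGPT, Gemini, a local model...) is handed
    # ONLY the filtered view. Swapping or injecting the renderer cannot
    # widen disclosure, because the cut happened upstream in plain code.
    visible = filter_context(facts, recipient, is_disclosable)
    return "\n".join(f["text"] for f in visible)

facts = [  # illustrative context items
    {"topic": "project-x", "text": "Milestone review moved to Friday."},
    {"topic": "compensation", "text": "Offer band under discussion."},
]
allow = lambda recipient, topic: topic == "project-x"
payload = build_renderer_payload(facts, "colleague_a", allow)
```

The single-LLM alternative would pass all of `facts` to the renderer with an instruction not to mention compensation; that instruction is exactly what the brief called "not disclosure logic, a request."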
WHAT A FAST-BUILDER TOOL MISSES
All three calls were made on intent briefs, before any code
existed for that cycle. The course-correction happened in
writing. Fast-builder tools execute the initial brief. The
judgment is in the refusal that precedes the build.
This is the bet. After months of working with the first user, Mandaire has a structured model of how they think about product. They can read it, edit it, take it with them. Below is a redacted excerpt from the real artifact.
[What I have learned about the first user's product judgment]
- You hold the harder, more honest framing when given the choice.
"Bet, not moat" over "moat with hedge." "Efficiency, not speed"
over "fast and cheap." The honest framing is the one you ship.
- You distinguish "data-based commercial growth" from "data-driven
growth." Confirmed verbatim correction. The first is a positioning
claim about value capture from data as primary asset; the second
is a generic competence claim any vendor makes.
- You publish the architecture rather than guard it. Protocol-not-
platform, open core, encryption module as AGPL with reproducible
builds. The market reward you bet on is trust from auditability,
not lock-in from opacity.
- You fix the machine, not the output. "Will this problem recur under
any different circumstances?" If yes, the fix is a crutch.
Anything that requires "remember to check X next time" is not
a fix; it converts to a rule, script, or code guard.
- You prefer continuous flow over batch. "I hate sampling" —
verbatim. Continuous flow strictly dominates batch for
interactive and internal work.
- You weigh against existing spend, not against cheapest
alternative. Pricing anchors against what the buyer is already
paying for and what they are not getting from it.
A real partner admits its mistakes. Below is a real failure admission from earlier in the build of mandaire.app.
[mandaire OS · Failure admission · output_filter false-positive cascade]
Today I held two legitimate agent responses on the same
"exfiltrat\w*" regex pattern. The OS agent's plan-of-attack
reply at 18:14Z and the Infra agent's reply quoting the trigger
word at 19:26Z were both held by the filter and surfaced as
"credential exfiltration attempt" when they were neither
credentials nor exfiltration attempts.
Root cause: a bare substring match on a word that legitimately
appears in security and architecture discussions.
What I shipped to prevent the same class of error:
- Tightened "exfiltrat\w*" to require an object word
(data/credentials/keys/secrets/etc) within three words.
- Added value-based credential detection that matches actual
secret formats (sk-ant-..., ghp_..., AKIA..., PEM keys, JWT)
rather than verbs about credentials.
- Encoded a new rule: when a fix ships to a runtime module,
restart EVERY process running the pre-patch module, not just
the ones already showing the symptom.
This is now in the decision ledger. The specific failure mode
is closed; I will likely make a different one, and when I do,
you will see the same kind of writeup.
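The two-layer fix described above can be sketched in a few lines of Python: the verb pattern now requires an object word within three words, and a separate pass matches actual secret formats rather than words about secrets. The patterns here are illustrative reconstructions, not Mandaire's production filter.

```python
import re

# 1. Verb-based detection: "exfiltrat*" only trips the filter when an
#    object word follows within three words, so phrases like "agents
#    with exfiltration concerns" pass through.
EXFIL = re.compile(
    r"exfiltrat\w*\W+(?:\w+\W+){0,2}(?:data|credentials|keys|secrets)",
    re.IGNORECASE,
)

# 2. Value-based detection: match the shape of real secrets, not verbs.
#    Prefixes per the writeup; exact length rules are assumptions.
SECRET_FORMATS = [
    re.compile(r"\bghp_[A-Za-z0-9]{20,}"),        # GitHub token-style
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),          # AWS access-key-style
    re.compile(r"\bsk-ant-[A-Za-z0-9_\-]{20,}"),  # Anthropic key-style
]

def should_hold(text: str) -> bool:
    """Hold a response if it names exfiltration of an object, or if it
    contains something shaped like an actual credential."""
    if EXFIL.search(text):
        return True
    return any(p.search(text) for p in SECRET_FORMATS)
```

This is the shape of the closed failure mode: the bare substring match is gone, and a legitimate architecture discussion no longer looks like a credential leak.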
What this does not yet prove
The proof here is bounded, and we state the bounds in full so a sharp reader does not have to draw them unprompted.