@kulpritt

Writing

HAI Harness

A working model where AI agents are teammates — with names, roles, and something like careers. We prototyped it at a Cursor hackathon, and it's open.

@kulpritt

For a while I've been circling the same idea: an AI agent should work with you more like a colleague than a tool. Not a chat box you prompt and re-prompt until the session degrades, but a teammate with a name, a clear scope, a personality, and a role that deepens as it gains context.

Recently I got to build the prototype at a Cursor.ai community hackathon at a16z's SF office — an extension of different ideas I've been tweaking for months, built alongside my friend @10xDayDreamer, who wrote the open-source orchestration layer it runs on. It's still open, so we can keep building on it.

A team, not a tool

HAI Harness treats the work the way a design org does — as a team of specialists, not one generalist.

The base layer is sub-agents. Each owns a specific, discrete slice of the work and is made an expert in that narrow field: one for the design system, one for motion, one for accessibility, one for brand. A sub-agent doesn't try to do everything; it does one thing well, and accumulates context every time it does. That accumulation is the point — an agent that has done a job well a hundred times, with feedback on each pass, grows into a more senior version of itself, the way a person moves from junior to senior by building judgment on the job. In the prototype, that seniority is literally a label on the roster.

Above them sits a supervising agent, closer to a design manager — it runs the weekly crit, reviews what the junior agents produce, and holds the shape of the work so I don't have to micromanage every thread.

The human layer

The part that makes it feel like a team rather than a pipeline is the interface — a Slack-style workspace that sits on top of Cursor.

Cohort design team simulator — prototype screenshot showing Slack-style channels for each specialist agent
Figure 1.1 — Early prototype, hackathon build (prototype Sam added later)

Each agent gets its own channel. You brief Alex in the design-system channel the way you'd message a colleague — "build the tokens: earthy green primary, Terra Sans, shadcn Button and Card" — and Alex replies in-thread, starts the token file, and shares the homepage artboard inline as a versioned asset for reference. Motion lives in Mira's channel, accessibility in Avery's, brand in Blake's. The supervising agent, Jordan, has a design-crit channel where the junior agents' work gets reviewed. A registry down the side lists each agent and its seniority — the career arc, made visible.

Two things make the Slack metaphor more than skin. First, scoped channels keep each agent's context clean and intentional: you're talking to one specialist about one thing, not stuffing everything into a single ballooning session. Second, the work is durable and shared. Assets are versioned and exportable — out to raw SVG or Figma tokens — so when hands-on work is needed, the conversation becomes something you can open on a canvas and then hand back. And an edit-sync panel closes the loop the other way: change a color in Figma, report it back against the asset's ID, and the update propagates to whichever agents and threads depend on it. The decision doesn't get lost in a chat log — it lands somewhere the whole team reads from.

The repo is the source of truth

That last point is the actual idea underneath the UI, and it's @10xDayDreamer's design more than mine. The harness treats humans and AI as peers — both supplying the cognitive horsepower — and assumes that, left alone, neither has reliable memory. Agents behave like amnesiac interns, forgetting instructions from a hundred turns ago and hallucinating progress; humans forget why they made a decision three months back, or step on each other's work. His fix isn't more context, it's accurate context: the repository is the single source of truth, and if a decision isn't written there, it doesn't exist. Nobody leans on model memory or human memory. Every participant reads the current state and the explicit handoff files before acting, and work passes between agents through written handoffs rather than chat history.

What's built today is exactly that foundation — the management layer for memory, context isolation, and task routing. It's a strong start precisely because it's the unglamorous part: the plumbing that keeps a team of agents from forgetting, overwriting each other, or faking done.

How work moves through Cohort — a flow diagram showing work passing from You through agents Alex, Mira, Avery, Sam to Jordan, Design crit, GitHub PRs, and back to You
How work moves through Cohort — you brief the agents; their work returns through a series of review gates

What comes next

The roadmap from here (open PRs) is about turning a clever demo into something dependable.

A real team mode, where agents get isolated workspaces and strict state-machine routing — a worker is physically blocked from touching code until the planner approves the dependency graph, so several agents on the same surface stop piling up on each other.

An adversarial evaluator, splitting execution from judgment. Models are blindly confident; they'll mark a broken feature "done." A dedicated evaluator running in a sandbox does nothing but try to break the worker's output before it ships.

And self-maintaining infrastructure — background processes that compress and organize accumulated lessons while you sleep, plus CI-style hooks that halt an agent the moment it breaks an architectural rule.

Why it matters

The obvious benefit is speed, and that's real. But it's the least interesting part.

The more interesting part is that the model is built around feedback and growth. Because agents have roles, context, and a review loop, they improve along something like a career arc instead of resetting every session. And the human stops being a prompter and becomes a director — setting intent, giving feedback, deciding what good looks like here. That's a higher level of knowledge work, and it's the part that stays unmistakably ours.

It's also meant to scale. The unit of work is a teammate with a scope, so the same structure holds whether you're a solo designer running a small cohort of agents or a team coordinating many.

This is the shape of how I think we'll work soon — and since it's open, we get to keep building toward it in the open. Give the agents names. Give them roles. Then do the part that's still yours: direct the team, and help it grow.