Stop Hiring AI Agents. Start Hiring AI Employees.
We built an agent pool architecture with 4 workers and an ops agent running 24/7 on systemd. Swappable personality profiles, isolated state, pennies per day. Like a staffing agency for AI.
Yesterday we threw out our entire agent architecture and rebuilt it in a day. The old setup had one dedicated agent per product. Herald got an agent. Nova got an agent. Thoth got an agent. Each one ran its own OpenClaw gateway, held its own context, knew its own codebase. Sounded clean on paper.
In practice, it was a staffing nightmare. When Herald’s agent sat idle waiting for API rate limits, Nova’s agent was drowning in RAG pipeline work. No way to shift capacity. No way to share load. Four specialists, each chained to their desk.
Sound familiar? It should. This is the same mistake every startup makes when they hire. You don’t hire “the Herald person” and “the Nova person.” You hire engineers, and you assign them work based on priority.
The Staffing Agency Model
We replaced dedicated agents with a worker pool: four generalist workers plus one ops agent, all running as systemd services, with the workers listening on ports 18790 through 18793. Each worker is an identical OpenClaw gateway instance. No specialization baked in. They’re blank slates.
The specialization comes from personality profiles. Five YAML files sitting in a shared config directory: researcher.yaml, coder.yaml, architect.yaml, ops.yaml, writer.yaml. When the orchestrator assigns a task, it loads the appropriate profile into the worker. The same worker that spent the morning researching competitor pricing can spend the afternoon writing integration tests. Different hat, same head.
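To make that concrete, here’s what a profile could look like. This is a sketch, not our actual schema; the field names are invented for illustration:

```yaml
# researcher.yaml — illustrative sketch only.
# Field names (role, system_prompt, tools, output) are hypothetical;
# the real OpenClaw profile schema may differ.
role: researcher
system_prompt: |
  You are a research specialist. Gather sources, compare claims,
  and return a structured summary with citations.
tools:
  - web_search
  - document_reader
output:
  format: markdown
  max_words: 1500
```

Swapping in coder.yaml means swapping this whole block: a different system prompt, different tool preferences, different output expectations. The infrastructure underneath doesn’t change.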
The orchestrator is Becky, our main agent running Claude Opus 4.6 through Prism at $0/token. Becky doesn’t do the work. Becky designs the work, breaks it into tasks, picks the right personality profile, and assigns it to the next available worker. Manager, not maker.
Isolation That Actually Works
The trick that makes this possible is dead simple: each worker gets its own state directory via the OPENCLAW_STATE_DIR environment variable. Worker 1 writes to /var/lib/openclaw/worker-1/, Worker 2 to /var/lib/openclaw/worker-2/, and so on. No shared lock files. No state collisions. No workers stepping on each other’s context.
Each systemd unit file looks roughly the same. Different port, different state dir, same binary. Spin up a new worker in under a minute. Kill one that’s misbehaving without touching the others. Standard ops patterns that sysadmins have used for decades, applied to AI agents.
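For readers who want the shape of it, here’s roughly what a worker unit could look like. The binary path, flags, and user are assumptions; only the port range and the OPENCLAW_STATE_DIR pattern come from our setup:

```ini
# /etc/systemd/system/openclaw-worker-1.service — illustrative sketch.
# ExecStart path and flags are assumptions, not the real unit file.
[Unit]
Description=OpenClaw gateway (worker-1)
After=network-online.target
Wants=network-online.target

[Service]
Environment=OPENCLAW_STATE_DIR=/var/lib/openclaw/worker-1
ExecStart=/usr/local/bin/openclaw gateway --port 18790
Restart=on-failure
User=openclaw

[Install]
WantedBy=multi-user.target
```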
The ops agent is the exception. It doesn’t take personality swaps. It watches the other four, tracks token burn, monitors health, compiles daily briefings. Think of it as the shift supervisor who never clocks out.
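If you want to replicate the health-watching part, a poll as dumb as this gets you most of the way; the /health endpoint here is an assumption, so substitute whatever your gateway actually exposes:

```bash
# Hypothetical health sweep across the worker pool.
# The /health endpoint is an assumption; adapt to your gateway.
for port in 18790 18791 18792 18793; do
  status=$(curl -fsS --max-time 2 "http://localhost:${port}/health" || echo DOWN)
  echo "worker ${port}: ${status}"
done
```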
Yesterday’s Proof
We tested this by fanning out 6 research tasks in parallel. Market analysis for four products, competitive landscape mapping, and a technical feasibility study. Under the old architecture, one agent would have ground through these sequentially. Probably 4 hours of wall time, maybe more.
The pool knocked them out in 45 minutes. Four workers running simultaneously, the orchestrator routing completed tasks and assigning follow-ups. When the market analysis workers finished early, they picked up synthesis work from the slower feasibility study. Dynamic reallocation, no human intervention.
Heavy inference runs on a pair of GX10 units clustered together with 256GB of unified memory, running MiniMax M2.1. The orchestrator and worker pool run on an ASUS ROG Strix SCAR 18 with an RTX 5090. A laptop coordinating a local cluster. The entire infrastructure fits on a desk.
Cost for yesterday’s 6-task parallel research sprint: less than a dollar. The workers run on models routed through Prism. The GX10 cluster runs local inference at zero marginal cost. The systemd services consume negligible compute when idle. We’re running a 5-agent operation 24/7 for what a single SaaS seat costs per month.
Why Pools Beat Dedicated Agents
The dedicated-agent model has a scaling problem. Ten products means ten agents means ten contexts to maintain, ten sets of tool configurations, ten things that can break independently. When product priorities shift, you’re stuck with capacity allocated to last quarter’s priorities.
A pool scales differently. Need more capacity? Add a worker. Change the unit file, bump the port number, start the service. Need different capabilities? Write a new personality YAML. The architect.yaml profile loads different system prompts, different tool preferences, different output formats than researcher.yaml. But they run on the same infrastructure.
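Concretely, spinning up that fifth worker could be a handful of commands, assuming the unit layout sketched earlier:

```bash
# Clone worker 4's unit, bump the identity and port, start it.
sudo cp /etc/systemd/system/openclaw-worker-{4,5}.service
sudo sed -i -e 's/worker-4/worker-5/g' -e 's/18793/18794/' \
  /etc/systemd/system/openclaw-worker-5.service
sudo mkdir -p /var/lib/openclaw/worker-5
sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-worker-5.service
```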
This mirrors how actual organizations work. A consulting firm doesn’t hire “the Acme Corp consultant” and “the Globex consultant.” They hire people with skills and assign them to engagements. When Acme’s project wraps, that consultant moves to Globex. The firm’s capacity is fluid.
The AI industry is stuck on the “one agent, one job” model because that’s how chatbots worked. You opened a conversation, it had context, it did a thing. But we’re past chatbots now. We’re building organizations. And organizations need workforce management, not conversation management.
What We Actually Built
The full stack, concretely:
- 5 systemd services (openclaw-worker-{1..4}.service, openclaw-ops.service)
- 5 personality profiles in /etc/openclaw/profiles/
- 1 orchestrator (Becky on the main OpenClaw instance)
- Isolated state via OPENCLAW_STATE_DIR per worker
- Local inference on GX10 cluster (MiniMax M2.1, 139B parameters)
- Edge compute on ASUS ROG (RTX 5090, task routing and light inference)
Total hardware cost was significant. Total operating cost is negligible. The ROI calculation isn’t “AI agent vs. human employee.” It’s “5 AI employees running 24/7 for the cost of electricity.”
What’s Next
We’re building the scheduling layer now. Right now Becky assigns tasks manually based on a priority queue. The next version will pull from a task board automatically, match personality profiles to task types, and handle worker failures with automatic reassignment. Basically, we’re building a job scheduler. Not a novel concept. Slurm has existed for decades. But Slurm schedules compute jobs. We’re scheduling cognitive work.
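To show the shape of what we mean, here’s a minimal sketch of that matching loop in Python. The task kinds, the worker list, and the dispatch step are all hypothetical stand-ins; the real version talks to the worker gateways:

```python
import heapq
from dataclasses import dataclass, field

# Task-kind -> profile mapping. The five profiles are real;
# the task kinds are invented for this sketch.
PROFILE_FOR = {
    "research": "researcher.yaml",
    "implement": "coder.yaml",
    "design": "architect.yaml",
    "deploy": "ops.yaml",
    "docs": "writer.yaml",
}

@dataclass(order=True)
class Task:
    priority: int                       # lower = more urgent
    kind: str = field(compare=False)
    payload: str = field(compare=False)
    attempts: int = field(default=0, compare=False)

def assign_next(board: list[Task], idle_ports: list[int]):
    """Pop the most urgent task and pair it with an idle worker.

    On worker failure, the caller bumps task.attempts and pushes the
    task back onto the board -- that's the automatic reassignment.
    """
    if not board or not idle_ports:
        return None
    task = heapq.heappop(board)
    port = idle_ports.pop(0)
    profile = PROFILE_FOR.get(task.kind, "researcher.yaml")
    # Real version: load `profile` into the worker on `port`,
    # then hand it `task.payload` over the gateway API.
    return port, profile, task

# Toy run: two tasks, four idle workers.
board: list[Task] = []
heapq.heappush(board, Task(1, "research", "competitor pricing"))
heapq.heappush(board, Task(2, "implement", "integration tests"))
idle = [18790, 18791, 18792, 18793]
while (job := assign_next(board, idle)):
    port, profile, task = job
    print(f"worker {port} loads {profile} for {task.payload!r}")
```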
The personality profiles need refinement too. Right now they’re system prompts with tool preferences. We want them to include learned patterns from previous tasks, preferred libraries, coding style guides, research methodologies. Not fine-tuning the model. Loading context that makes the worker effective at a specific type of work, fast.
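In YAML terms, we imagine the profiles growing something like this. None of it exists yet; these keys sketch the direction, not a spec:

```yaml
# Hypothetical v2 additions to coder.yaml — sketch only.
learned:
  preferred_libraries: [httpx, polars]
  style_guide: /etc/openclaw/style/python.md
  past_task_notes: /var/lib/openclaw/notes/coder/
```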
If you’re running AI agents for real work, stop thinking about them as assistants. Start thinking about them as employees. Give them roles. Give them shifts. Give them a manager. And for the love of efficiency, let them share the workload.
The code is open source. The architecture runs on OpenClaw. The personality profiles are just YAML. None of this requires permission from a platform vendor.
Build your own staffing agency. Hire your first pool. Put them to work.