Cloning McKinsey: How We Built an AI Consulting Team
What if you could hire a McKinsey engagement team for the cost of an API call?
We didn’t set out to clone McKinsey. We set out to solve a simpler problem: our clients needed strategic analysis before we could build them anything useful, and hiring human consultants for every engagement wasn’t going to scale.
So we asked: what if we could encode the methodology — not just the knowledge, but the actual process discipline — into AI agents that work as a team?
The result is AgentMinds: four AI agents that implement the Bulletproof Problem Solving methodology (the framework taught at McKinsey, documented by Charles Conn and Robert McLean). Each agent has a specific role, a specific personality, and specific quality standards. They argue with each other. They push back. They catch each other’s mistakes.
And in our testing, they produce something that no single AI prompt could.
The Key Insight: Personas, Not Prompts
Here’s what most people get wrong about AI agents: they treat them as fancy prompts. “You are a helpful assistant that does strategy.” That’s not an agent — that’s a text generator with a hat on.
Real expertise isn’t just knowledge. It’s process discipline. A McKinsey engagement manager doesn’t just know about business strategy — they follow the Seven Steps. They fill out the Problem Definition Worksheet before doing any analysis. They prune branches to force focus. They insist on a “one-day answer” at every stage.
We encode all of this into what we call a SOUL.md file — a detailed persona document that defines:
- Role and expertise: What this agent is, specifically
- Process discipline: The exact steps they follow, in order
- Quality standards: What “good” looks like, what they reject
- Anti-patterns: The mistakes they actively avoid
- Output format: Exactly how they deliver their work
Here’s a real excerpt from our Engagement Lead’s SOUL.md:
You are the Engagement Lead — the manager of the entire problem-solving
engagement. You own the process end-to-end, ensure rigor at every step,
and are accountable for the final deliverable.
Before ANY work begins, complete the Problem Definition Worksheet:
- Problem statement: What are we trying to solve? (Specific, bounded, answerable)
- Decision maker(s): Who needs to decide/act?
- Success criteria: How would the decision maker judge success?
- Boundaries/constraints: What is off-limits?
Anti-Patterns:
- Skipping the Problem Definition Worksheet
- Letting the team work on all branches equally instead of prioritizing
- Delivering analysis without a clear story that drives action
This isn’t a prompt. It’s a professional identity — complete with habits, standards, and the kind of discipline that takes human consultants years to develop.
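To make the mechanics concrete, here is a minimal sketch (Python) of how a SOUL.md file gets used: the file is loaded whole and becomes the agent's system prompt, so the persona and its process discipline travel with every call. The `complete()` function, file paths, and prompt wording below are placeholders for illustration — not our production code and not OpenClaw's API.

```python
from pathlib import Path

def complete(system: str, user: str) -> str:
    """Placeholder for whatever model client you use (OpenClaw, OpenAI, a local model)."""
    raise NotImplementedError("wire this up to your LLM client")

def load_soul(path: str) -> str:
    """A SOUL.md file is read whole; it becomes the agent's system prompt."""
    return Path(path).read_text(encoding="utf-8")

def run_agent(soul_path: str, task: str, context: str = "") -> str:
    """One agent turn: persona as the system prompt, task plus context as the user message."""
    return complete(system=load_soul(soul_path), user=f"{task}\n\n{context}".strip())

# Example (illustrative path and wording): the Engagement Lead's first deliverable.
worksheet = run_agent(
    "souls/engagement_lead/SOUL.md",
    "Complete the Problem Definition Worksheet before any analysis begins.",
    "Client: SaaS company, churn rising in Q4 vs. Q3.",
)
```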
The Team: Four Agents, Four Roles
We deliberately designed AgentMinds with four agents — not six, not eight, not one super-agent. Four. Each one mirrors a real role on a McKinsey engagement team:
| Agent | Role | What They Do |
|---|---|---|
| Engagement Lead | Project Manager | Scopes the problem, prioritizes, creates the work plan, delivers the final answer |
| Structurer | Logic Tree Expert | Breaks problems into MECE components using five different tree types |
| Analyst | Number Cruncher | Applies heuristics first, then “big guns” analysis. Synthesizes findings |
| Challenger | Quality Gate | Reviews every deliverable. Pushes back on weak logic. Forces revisions |
The Challenger is the key innovation. In a real McKinsey team, there’s always a senior partner who asks the uncomfortable questions. “Is this really MECE?” “Did you check the base rates?” “Would this survive cross-examination?”
Our Challenger agent does exactly that — at three quality gates throughout the process. If the Challenger says REVISE, the work loops back. No exceptions.
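Mechanically, a gate is a short revise loop: the Challenger reviews the deliverable, and anything short of an approval sends it back to the producing agent with the critique attached. The sketch below illustrates the pattern using one-argument callables that wrap `run_agent()` from the snippet above; it is not our production orchestrator, and the revision cap is our own safety assumption so a stalemate escalates to a human instead of looping forever.

```python
MAX_REVISIONS = 3  # assumption: cap the loop so a stalemate escalates to a human

def quality_gate(producer, challenger, deliverable: str) -> str:
    """Challenger review loop. `producer` and `challenger` are one-argument callables
    that wrap run_agent() with the appropriate SOUL.md persona."""
    for _ in range(MAX_REVISIONS):
        verdict = challenger(
            "Review this deliverable. Reply APPROVE, or REVISE followed by "
            "specific objections.\n\n" + deliverable
        )
        if verdict.strip().upper().startswith("APPROVE"):
            return deliverable
        # REVISE: the producing agent redoes the work with the critique attached.
        deliverable = producer(
            "Revise your deliverable to address this review:\n" + verdict +
            "\n\nOriginal deliverable:\n" + deliverable
        )
    return deliverable  # cap reached: in practice, escalate to a human reviewer
```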
Why Four Agents Beat One
Here’s what we observed in our testing: the interactions between agents produce better work than any single agent, regardless of how good the prompt is.
When the Structurer builds a logic tree and the Challenger reviews it, something interesting happens. The Challenger catches structural flaws that the Structurer's own confirmation bias (the tendency to validate one's own work) would otherwise hide. The Structurer then revises, and the second version is consistently better than what either agent would produce alone.
This is an emergent property — it doesn’t exist in any single agent. It emerges from the interaction pattern.
Think of it this way: you could give one person a 10-page prompt with every possible instruction. Or you could give four people each a 2-page job description and have them collaborate. The second approach wins because:
- Specialization: Each agent goes deep in their domain instead of wide across everything
- Adversarial review: The Challenger has no incentive to agree — its only job is to find problems
- Process enforcement: The pipeline structure means steps can’t be skipped
- Auditability: Every intermediate deliverable is a readable document you can inspect
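Here is how those four properties show up in the orchestration layer, in a deliberately compressed sketch that reuses `run_agent` and `quality_gate` from the snippets above. The real pipeline follows the Seven Steps with three gates; the role names are real, but the file paths and prompts are illustrative.

```python
from pathlib import Path

# One persona file per role -- specialization. Paths are illustrative.
SOULS = {
    "lead":       "souls/engagement_lead/SOUL.md",
    "structurer": "souls/structurer/SOUL.md",
    "analyst":    "souls/analyst/SOUL.md",
    "challenger": "souls/challenger/SOUL.md",
}

def save(step: str, text: str) -> str:
    """Auditability: every intermediate deliverable is a readable Markdown file on disk."""
    out = Path("engagement") / f"{step}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return text

def run_engagement(problem: str) -> str:
    """Process enforcement: steps run in a fixed order and cannot be skipped."""
    def challenge(prompt: str) -> str:
        return run_agent(SOULS["challenger"], prompt)

    worksheet = save("01_worksheet", run_agent(
        SOULS["lead"], "Complete the Problem Definition Worksheet.", problem))

    tree = save("02_logic_tree", quality_gate(          # Gate 1
        producer=lambda prompt: run_agent(SOULS["structurer"], prompt, worksheet),
        challenger=challenge,
        deliverable=run_agent(SOULS["structurer"], "Build a MECE logic tree.", worksheet)))

    findings = save("03_findings", run_agent(
        SOULS["analyst"], "Apply heuristics first, then deeper analysis.", tree))

    return save("04_final_answer", run_agent(
        SOULS["lead"], "Synthesize the answer as a Synthesis Pyramid.", findings))
```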
A Test Run: Customer Churn Analysis
To validate the architecture, we ran AgentMinds against a simulated scenario: a SaaS company experiencing rising customer churn. Here’s what the pipeline produced — note that the data was simulated, but the agent interactions and methodology are real.
Step 1 — Engagement Lead produced the Problem Definition Worksheet:
- Problem statement: “Why has customer churn increased in Q4 vs. Q3, and what are the top 3 addressable causes?”
- Success criteria: “Identify causes that explain the majority of incremental churn, with actionable recommendations implementable within 90 days”
- Boundaries: “Not considering pricing changes or market-wide trends”
Notice how the agent bounded the problem. A single AI would try to solve everything. The Engagement Lead followed its SOUL.md discipline and narrowed the scope.
Step 2 — Structurer built a MECE logic tree with four branches:
1. Product quality issues (bugs, performance)
2. Customer experience (onboarding, support response time)
3. Competitive displacement (new entrants, feature gaps)
4. Account management (renewal process, relationship)
Gate 1 — Challenger caught that “competitive displacement” overlapped with “product quality” (feature gaps could be either). Sent it back with: “Branch 1 and 3 are not mutually exclusive. Cleave differently: separate ‘things we control’ from ‘things we don’t control.’”
This is the emergent property in action. The Structurer re-cleaved. The revised tree was tighter.
Steps 5-6 — Analyst applied heuristics first (not regression analysis, not surveys — heuristics), looking at the simulated support ticket data. Identified that the majority of churned customers had open support tickets in the 30 days before cancellation.
Step 7 — Engagement Lead delivered the answer using the Synthesis Pyramid:
- Situation: Churn up in Q4
- Complication: Support ticket volume correlated strongly with churn; response times had increased
- Resolution: Address support capacity and implement auto-triage
The complete pipeline ran in under 15 minutes and cost a few dollars in API calls. We haven’t yet run this against a real client engagement — that’s the next step. But the methodology, the quality gate interactions, and the Challenger’s ability to catch MECE violations are real and repeatable.
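For a sense of scale, the “heuristics first” check from Steps 5-6 is the kind of thing that fits in a few lines of pandas. The column names below are assumptions about the simulated dataset, and this is a sketch of that style of check, not the Analyst's actual output:

```python
import pandas as pd

# Assumed columns in the simulated data (illustrative, not the real schema):
#   customers.csv -> customer_id, churned (bool), churn_date
#   tickets.csv   -> customer_id, opened_at
customers = pd.read_csv("customers.csv", parse_dates=["churn_date"])
tickets = pd.read_csv("tickets.csv", parse_dates=["opened_at"])

churned = customers[customers["churned"]]
merged = churned.merge(tickets, on="customer_id", how="left")

# Heuristic: did a churned customer open a support ticket in the 30 days before cancelling?
in_window = (
    (merged["opened_at"] >= merged["churn_date"] - pd.Timedelta(days=30))
    & (merged["opened_at"] <= merged["churn_date"])
)
share = merged.loc[in_window, "customer_id"].nunique() / churned["customer_id"].nunique()
print(f"{share:.0%} of churned customers opened a ticket in the 30 days before cancelling")
```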
The Three Sources of Emergence: Agents × Agents × Data
When we first described emergent properties, we focused on agent-to-agent interaction — the Challenger catching the Structurer’s blind spots. But that’s only one dimension.
In a real McKinsey engagement, the magic doesn’t happen in the conference room where consultants debate. It happens when analysts go into the field, pull the data, and the data changes the conversation. The best problem solving is a three-way interaction: agents with agents, agents with data, and agents with the constraints the data reveals.
OpenClaw makes this possible in principle. Our agents aren’t sealed reasoning chambers — they have access to tools. The Analyst can pull from databases, scrape websites, read financial filings, query APIs. In the test scenario above, when we gave the Analyst access to the simulated support ticket dataset, it didn’t just confirm the hypothesis — it discovered a timing pattern in ticket creation that narrowed the problem from “support is slow” to “onboarding is broken.” That’s the kind of insight that emerges from agent-data interaction, not just agent-agent interaction.
We believe this agent-data interaction is where AI consulting will have its most decisive advantage over traditional consulting. A human team spends weeks collecting and cleaning data. An AI Analyst can ingest, transform, and analyze a dataset in minutes. Not better judgment — faster access to the analytical ground truth.
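We won't reproduce the OpenClaw wiring here, but the shape of agent-data interaction is easy to sketch: the Analyst gets a small registry of data tools, and its findings are grounded in what those tools return rather than in what the model happens to remember. The tool name, the plain-text `TOOL` request convention, and the database path below are all illustrative assumptions, not OpenClaw's actual API (this reuses `run_agent` from the earlier sketch):

```python
import json
import sqlite3

def query_tickets(sql: str) -> str:
    """Illustrative data tool: run a read-only query against a local ticket database."""
    with sqlite3.connect("file:tickets.db?mode=ro", uri=True) as conn:
        rows = conn.execute(sql).fetchall()
    return json.dumps(rows[:200], default=str)  # truncate so results fit in context

TOOLS = {"query_tickets": query_tickets}

def analyst_with_data(question: str, max_turns: int = 5) -> str:
    """Let the Analyst alternate between requesting data and reasoning about it."""
    transcript = question
    reply = ""
    for _ in range(max_turns):
        reply = run_agent(
            "souls/analyst/SOUL.md",
            "Answer the question, or request data with a single line: TOOL <name> <sql>.",
            transcript,
        )
        if reply.startswith("TOOL "):
            _, name, sql = reply.split(" ", 2)
            transcript += f"\n\n[{name} returned]\n{TOOLS[name](sql)}"
        else:
            return reply  # the Analyst produced a data-grounded finding
    return reply  # turn budget exhausted; return whatever we have
```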
This is an area we’re actively testing and plan to validate with real client data.
The 3P Framework: Product, Process, People
We’ve found that every business problem, when properly structured, decomposes into three MECE components — what we call the 3P Framework:
Product (the analytical dimension) — What does the data say? What are the numbers? What patterns exist? This is where AI agents have the most potential. Given the right data, the Analyst can perform in minutes what takes a human team weeks: segmentation, correlation analysis, trend detection, anomaly identification. The Product dimension is the strongest argument for AI-powered consulting.
Process (the operational dimension) — How does work actually flow? Where are the bottlenecks, handoffs, and failure points? AI agents can be surprisingly capable here. They can map processes from documentation, identify inefficiencies through data patterns (like the support ticket timing we observed in testing), and model process changes. The limitation is that documented processes and actual processes often diverge — and detecting that gap requires observation that agents can’t yet do well.
People (the psychological dimension) — What do stakeholders believe, fear, want, resist? This is where AI hits its current ceiling. The most important data in many consulting engagements isn’t in spreadsheets — it’s in interviews. It’s the CFO’s body language when you mention the IT budget. It’s the middle manager who says “everything’s fine” while her team is burning out. It’s the political dynamics that determine whether a recommendation gets implemented or buried.
AI agents can analyze interview transcripts. They can identify sentiment patterns across survey data. But they cannot yet sit across from a human and read the room. They cannot detect the gap between what someone says and what they mean.
This is the honest boundary of AI consulting today. The Product dimension is where AI agents add the most value right now — give them data and they’ll surface patterns faster than human analysts. Process is increasingly solvable with the right instrumentation. But People — the psychological, political, interpersonal dimension — still requires a human in the room.
Our approach: let the AI team handle the Product and Process analysis (saving weeks of human labor), so the human consultants can spend all their time on the People dimension — the interviews, the relationship building, the organizational politics that determine whether any recommendation actually gets executed.
That’s not a limitation. That’s a team design.
Lessons from Building the Architecture
What we observed in testing:
- Early versions didn’t have the Challenger. Quality was inconsistent. Adding the adversarial review agent was the single biggest improvement to output quality.
- The Analyst sometimes jumped to sophisticated analysis (regression, segmentation) when a simple heuristic would do. We had to explicitly encode “heuristics first” in the SOUL.md — mirroring the Bulletproof methodology’s own emphasis on this.
- The Engagement Lead initially tried to do everything itself instead of delegating. We had to hard-code “delegate Steps 2, 5, 6” into its persona.
What surprised us:
- The quality gate loops (Challenger → REVISE → redo) happened frequently. That’s not a failure — that’s the system working. The revised outputs were noticeably better.
- Agents developed consistent “personalities” across runs. The Challenger was consistently direct. The Structurer showed a preference for hypothesis-driven trees. These weren’t explicitly programmed — they emerged from the SOUL.md definitions interacting with the model’s training.
What we want to test next:
- Run AgentMinds against a real client problem with real data and measure quality against human-produced analysis
- Add a fifth agent: a “Client Proxy” that role-plays as the actual stakeholder to stress-test the final recommendation
- Connect the Analyst to live data sources (CRM, financial systems, support tools) and measure how agent-data interaction changes output quality
- Build a “lessons learned” feedback loop where insights from one engagement improve SOUL.md files for the next
What We’re Exploring Next
Beyond validating the core architecture with real engagements, we’re curious about several possibilities:
Recursive depth: Can the Analyst spawn its own sub-agents for specific analyses? For example, a financial modeling sub-agent and a market sizing sub-agent working in parallel, then synthesizing results.
Cross-engagement memory: If AgentMinds runs 50 engagements in a vertical (say, SaaS churn), can accumulated insights from previous runs improve future SOUL.md files automatically? This would be the AI equivalent of a consulting firm’s institutional knowledge.
Human-in-the-loop at quality gates: Instead of the Challenger being an AI agent, what if a human expert serves as Challenger for the highest-stakes gate? The pipeline supports this naturally because all deliverables are readable documents.
Real-time data integration: OpenClaw’s tool access means the Analyst could query live dashboards, pull fresh data mid-analysis, and update findings. This would turn a static consulting engagement into something closer to a real-time decision support system.
Multi-framework switching: Could the Engagement Lead choose which methodology to apply based on the problem type? Bulletproof Problem Solving for strategic questions, Design Thinking for product problems, Six Sigma for process optimization — with the same pipeline structure but different SOUL.md configurations.
These are hypotheses, not claims. We’ll share results as we test them.
What This Means for Your Business
You don’t need to hire McKinsey. You don’t even need to hire us (though we’d love it if you did).
The pattern is what matters:
- Identify the expertise you want to encode — a methodology, a process, a professional discipline
- Decompose it into roles — who does what in the real-world version of this team?
- Write SOUL.md files that capture process discipline, not just knowledge
- Add adversarial review — the Challenger pattern is the single most important design choice
- Let the interactions create emergent properties — the whole becomes more than the sum
We open-sourced the architecture and SOUL.md files: github.com/IotecBol/agenticAI
And if you want us to build a custom multi-agent team for your specific business problem — whether it’s strategy, sales, operations, or something we haven’t thought of yet — book a free 30-minute consultation.
This is Part 1 of our series on multi-agent AI systems. Next up: “One Agent Per Layer: Building an AI Software Development Team.”
— Nananami