When AI Meets the FDA: Building a 10-Agent Regulatory Pipeline

The hardest software you’ll ever ship is the software that could kill someone if you get it wrong.


I’ve spent time in IVD — in vitro diagnostics, the machines that screen blood supplies and run the PCR tests that became household names during COVID. The kind of software where a misclassified sample means contaminated blood reaches a patient.

That experience rewired how I think about software development. Not because the code is harder — it’s not — but because the consequences are real, the regulators are watching, and “move fast and break things” will get your product pulled from the market and your company sued into oblivion.

So when we built our third multi-agent architecture, we didn’t just ask “how do you build software with AI agents?” We asked: “how do you build software that the FDA will let you sell?”

The result is AgentMedReg — a 10-agent pipeline purpose-built for FDA-regulated medical device software.

Why Regulated Software Is a Different Animal

If you’ve only built commercial software, here’s the gap: in regulated development, the process is the product. The FDA doesn’t just care whether your software works. They care whether you can prove it works, prove you thought about the risks, and prove that every requirement traces from user need to verified test result.

This isn’t bureaucracy for its own sake. It’s because medical devices fail in the real world, and when they do, the question isn’t “what went wrong?” It’s “did you follow a process that should have caught this?”

The key regulations:
- IEC 62304: software lifecycle processes (how you build it)
- ISO 14971: risk management (what could go wrong)
- 21 CFR Part 820: the Quality System Regulation (your whole quality management system), which FDA has amended into the Quality Management System Regulation (QMSR), incorporating ISO 13485:2016 by reference
- IEC 62366-1: usability engineering (can humans actually use it safely)

Miss any of these and your 510(k) submission goes in the trash.

The Architecture: 10 Agents, Three Layers

We organized AgentMedReg into three functional layers:

Layer 1: Regulatory (4 agents)

Regulatory Strategist — Runs first and last. Classifies the device, determines the submission pathway (510(k), De Novo, PMA), maps applicable standards, and at the end, does a submission readiness assessment. This agent is the “bookend” — it frames everything and validates the result.
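To make the pathway decision concrete, here is a deliberately simplified sketch of the kind of first-pass rule the Regulatory Strategist starts from. The names `Pathway` and `suggest_pathway` are hypothetical, and the logic skips real-world nuances such as device-type-specific exemptions and the few preamendment Class III device types that still clear through 510(k). It shows the shape of the decision, not regulatory advice.

```python
from enum import Enum


class Pathway(Enum):
    EXEMPT = "510(k)-exempt"
    K510 = "510(k)"
    DE_NOVO = "De Novo"
    PMA = "PMA"


def suggest_pathway(estimated_class: int, has_predicate: bool, exempt: bool = False) -> Pathway:
    """Hypothetical first-pass suggestion from a risk-based class estimate (1, 2, or 3)."""
    if estimated_class not in (1, 2, 3):
        raise ValueError("estimated_class must be 1, 2, or 3")
    if estimated_class == 3:
        # High risk: assume PMA (simplification; ignores preamendment exceptions).
        return Pathway.PMA
    if not has_predicate:
        # Novel low-to-moderate risk device with no legally marketed predicate.
        return Pathway.DE_NOVO
    if exempt:
        # Many Class I and some Class II device types are exempt from premarket notification.
        return Pathway.EXEMPT
    # Otherwise, demonstrate substantial equivalence to the predicate.
    return Pathway.K510


print(suggest_pathway(2, has_predicate=True).value)   # 510(k)
print(suggest_pathway(2, has_predicate=False).value)  # De Novo
```

A real determination starts from the intended use and the matching classification regulation; the point is that whatever comes out of this step constrains every agent downstream.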

Risk Manager — Implements ISO 14971 end-to-end. Hazard identification, FMEA, risk estimation, risk controls, residual risk evaluation. This agent doesn’t just list risks — it traces every risk control to a design requirement and every residual risk to an acceptance rationale.
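Both traceability rules in that paragraph are mechanical enough to check automatically. A minimal sketch, assuming a hypothetical `Risk` record and a `control_to_requirement` map from risk control IDs to design requirement IDs; every ID and hazard below is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    risk_id: str
    hazard: str
    control_ids: list[str] = field(default_factory=list)  # IDs of risk control measures
    residual_rationale: str = ""                          # why the residual risk is acceptable


def trace_gaps(risks, control_to_requirement):
    """List gaps: risks without controls, untraced controls, missing residual rationales."""
    gaps = []
    for r in risks:
        if not r.control_ids:
            gaps.append(f"{r.risk_id}: no risk control")
        for c in r.control_ids:
            if c not in control_to_requirement:
                gaps.append(f"{r.risk_id}: control {c} not traced to a design requirement")
        if not r.residual_rationale:
            gaps.append(f"{r.risk_id}: residual risk has no acceptance rationale")
    return gaps


risks = [
    Risk("R-001", "Sample misidentified", ["RC-01"], "Barcode check makes occurrence remote"),
    Risk("R-002", "Result shown for wrong patient", ["RC-02"]),
]
print(trace_gaps(risks, {"RC-01": "DI-014"}))
# ['R-002: control RC-02 not traced to a design requirement', 'R-002: residual risk has no acceptance rationale']
```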

Human Factors Engineer — IEC 62366-1 usability engineering. Use specifications, task analysis, critical task identification, formative and summative usability evaluations. In medical devices, the user interface IS a safety feature. If a clinician can misread a result because of bad UI, that’s a design defect, not user error.
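Critical task identification ends up being a filter over the risk file. FDA's human factors guidance describes a critical task roughly as one that, if performed incorrectly or not at all, could cause serious harm. The sketch below assumes each task already carries the worst-case harm severity of its use errors; the severity labels, `UseTask`, and `critical_tasks` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity scale; only the top three levels count as serious harm here.
SERIOUS = {"serious", "critical", "catastrophic"}


@dataclass
class UseTask:
    name: str
    worst_use_error: str
    worst_harm_severity: str  # taken from the risk file


def critical_tasks(tasks):
    """Tasks whose worst-case use error could cause serious harm (candidates for summative testing)."""
    return [t.name for t in tasks if t.worst_harm_severity in SERIOUS]


tasks = [
    UseTask("Load sample rack", "Rack inserted backwards", "negligible"),
    UseTask("Read reactive result", "Misreads reactive as non-reactive", "catastrophic"),
]
print(critical_tasks(tasks))  # ['Read reactive result']
```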

Design Controls Lead — Owns the requirements traceability matrix (RTM). Every user need traces to a design input, every design input to a design output, and every design output to a verification test; every user need also traces to a validation result. Gaps in the RTM are gaps in your submission.
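An RTM gap check is a natural first automation target. This minimal sketch assumes each trace link is a simple child-to-parent mapping (real RTMs are often many-to-many) and follows design-controls practice of tracing verification to design outputs and validation to user needs. `rtm_gaps`, `untraced`, and all IDs are hypothetical.

```python
def untraced(parents, child_to_parent, label):
    """Return a gap message for every parent that no child points back to."""
    covered = set(child_to_parent.values())
    return [f"{p}: no {label}" for p in parents if p not in covered]


def rtm_gaps(user_needs, inputs, outputs, verifications, validations):
    """
    Each mapping is child ID -> parent ID:
      inputs:        design input -> user need
      outputs:       design output -> design input
      verifications: verification test -> design output
      validations:   validation study -> user need
    """
    return (
        untraced(user_needs, inputs, "design input")
        + untraced(inputs, outputs, "design output")
        + untraced(outputs, verifications, "verification test")
        + untraced(user_needs, validations, "validation result")
    )


print(rtm_gaps(
    user_needs=["UN-1", "UN-2"],
    inputs={"DI-1": "UN-1", "DI-2": "UN-2"},
    outputs={"DO-1": "DI-1"},       # DI-2 never got a design output
    verifications={"VT-1": "DO-1"},
    validations={"VAL-1": "UN-1"},  # UN-2 was never validated
))
# ['DI-2: no design output', 'UN-2: no validation result']
```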

Layer 2: Development (4 agents)

These are adapted from our AgentForge architecture, but with regulatory constraints baked in:

Strategist — Product strategy, but constrained by regulatory classification and intended use statements. You can’t pivot your way out of a Class III designation.

Analyst — Requirements analysis with regulatory traceability. Every requirement gets a risk reference and an acceptance criterion before it’s considered “done.”
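That "done" rule works well as a hard gate. A minimal sketch, assuming a hypothetical `Requirement` record and `is_done` check; the IDs and criterion text are invented examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirement:
    req_id: str
    text: str
    risk_ref: Optional[str] = None              # hazard/risk ID from the risk file
    acceptance_criterion: Optional[str] = None  # objective, testable pass/fail statement


def is_done(req: Requirement) -> bool:
    """Hypothetical gate: a requirement is done only with both a risk reference and a criterion."""
    return bool(req.risk_ref) and bool(req.acceptance_criterion)


r = Requirement(
    "SRS-042",
    "Flag samples with an invalid internal control",
    risk_ref="R-007",
    acceptance_criterion="All seeded invalid-control samples are flagged",
)
print(is_done(r))                                               # True
print(is_done(Requirement("SRS-043", "Export results as CSV")))  # False
```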

Architect — System architecture with IEC 62304 software safety classification in mind. Class A, B, or C software items get different levels of documentation and testing rigor.
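IEC 62304 assigns the class from the worst harm the software could contribute to, after accounting for risk control measures outside the software: Class A when no injury is possible, Class B when non-serious injury is possible, Class C when serious injury or death is possible. A minimal sketch of that mapping; `Harm`, `safety_class`, and `system_class` are hypothetical names, and the system-level rule is a simplification.

```python
from enum import IntEnum


class Harm(IntEnum):
    """Worst-case harm a software item could contribute to, after external risk controls."""
    NO_INJURY = 0
    NON_SERIOUS_INJURY = 1
    SERIOUS_INJURY_OR_DEATH = 2


SAFETY_CLASS = {
    Harm.NO_INJURY: "A",
    Harm.NON_SERIOUS_INJURY: "B",
    Harm.SERIOUS_INJURY_OR_DEATH: "C",
}


def safety_class(worst_harm: Harm) -> str:
    return SAFETY_CLASS[worst_harm]


def system_class(item_harms: list[Harm]) -> str:
    # Simplification: without a segregation argument, the system carries its worst item's class.
    return safety_class(max(item_harms))


print(safety_class(Harm.SERIOUS_INJURY_OR_DEATH))             # C
print(system_class([Harm.NO_INJURY, Harm.NON_SERIOUS_INJURY]))  # B
```

For the blood-screening example at the top of this post, a misclassified sample can lead to serious harm, so that software lands in Class C.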

Developer — Implementation with mandatory unit-level documentation. In regulated software, code review isn’t optional — it’s auditable.

Layer 3: Assurance (2 agents)

V&V Agent — Replaces the generic QA agent from AgentForge. Verification (did we build it right?) and Validation (did we build the right thing?) are distinct activities in regulated development. This agent runs protocols, documents results, and traces everything back to the RTM.

CAPA & Surveillance — Post-market. Complaint handling, adverse event reporting (MDRs), trend analysis, corrective and preventive actions. The FDA cares about your product after you ship it, not just before. This agent closes the loop from field data back to design.
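Trend analysis is the most code-shaped part of that job. The sketch below flags months whose complaint count exceeds the mean of the preceding months plus `k` standard deviations. The function name, window size, and threshold are hypothetical choices, not a validated trending method, and trending does not replace the per-event judgment of whether a complaint is MDR-reportable.

```python
import statistics


def flag_complaint_spikes(monthly_counts, baseline_months=6, k=3.0):
    """
    Return indices of months whose count exceeds mean + k * stdev of the
    preceding `baseline_months` months (a hypothetical trending rule).
    """
    if baseline_months < 2:
        raise ValueError("baseline_months must be at least 2 to compute a standard deviation")
    flagged = []
    for i in range(baseline_months, len(monthly_counts)):
        window = monthly_counts[i - baseline_months:i]
        threshold = statistics.mean(window) + k * statistics.stdev(window)
        if monthly_counts[i] > threshold:
            flagged.append(i)
    return flagged


counts = [4, 5, 3, 4, 6, 4, 5, 14, 4]
print(flag_complaint_spikes(counts))  # [7]
```

A flagged month is a trigger for investigation and possibly a CAPA, which is where this agent feeds findings back to design.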

The “Bookend” Pattern

The biggest architectural insight from building AgentMedReg is what we call the Bookend Pattern: start and end your pipeline with a Constraints Agent.

The Regulatory Strategist runs first to set classification, applicable standards, and submission strategy. Then the entire pipeline runs. Then the Regulatory Strategist runs again to verify submission readiness — checking that what was built actually satisfies the regulatory framework that was defined at the start.

This pattern is so powerful that we’re retrofitting it to our other architectures.

The insight: every pipeline should start and end with a constraints agent. Define the box, then build in the box, then verify you’re still in the box.
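In code, the pattern is a pipeline whose first and last steps belong to the same constraints owner. A minimal sketch, assuming hypothetical names throughout (`run_bookended`, `Constraints`, `Workspace`, the stub agents, and the evidence IDs); a real readiness assessment would do far more than confirm that some evidence exists for each standard.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Constraints:
    standards: set[str]  # standards the opening run deems applicable


@dataclass
class Workspace:
    evidence: dict[str, str] = field(default_factory=dict)  # standard -> evidence document ID


Stage = Callable[[Constraints, Workspace], None]


def run_bookended(open_fn: Callable[[], Constraints], stages: list[Stage],
                  close_fn: Callable[[Constraints, Workspace], list[str]]) -> list[str]:
    constraints = open_fn()        # 1. define the box
    ws = Workspace()
    for stage in stages:           # 2. build in the box
        stage(constraints, ws)
    return close_fn(constraints, ws)  # 3. verify you're still in the box


def strategist_open() -> Constraints:
    return Constraints({"IEC 62304", "ISO 14971", "IEC 62366-1"})


def strategist_close(c: Constraints, ws: Workspace) -> list[str]:
    # Readiness gaps: standards fixed at the start that have no evidence at the end.
    return sorted(s for s in c.standards if s not in ws.evidence)


def risk_manager(c: Constraints, ws: Workspace) -> None:
    ws.evidence["ISO 14971"] = "RMF-001"


def developer(c: Constraints, ws: Workspace) -> None:
    ws.evidence["IEC 62304"] = "SDP-003"


print(run_bookended(strategist_open, [risk_manager, developer], strategist_close))
# ['IEC 62366-1']
```

Because the closing check reads the same `Constraints` object the opening step produced, nothing built in the middle can quietly redefine the box.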

What Building This Taught Us About All Our Architectures

Creating a regulated pipeline forced a level of rigor that exposed weaknesses in our other two architectures:

For AgentForge:
- QA as a single agent was too broad. V&V is really two disciplines, so split them.
- No post-deployment agent was a gap. Software doesn’t stop needing quality after release.
- No constraint-setting at pipeline start meant developers could build things that violated architectural principles.

For AgentMinds:
- The Analyst was one-directional (analyze → report). In practice, analysis should be bidirectional: the analyst should be able to request more data.
- No client proxy agent meant the team could drift from what the client actually needed.
- No constraints scoper meant problem definitions could inflate during analysis.

Universal lesson: building the most regulated pipeline first would have made all our architectures better from the start. FDA requirements aren’t arbitrary; most of them exist because someone got hurt when they were missing.

Why This Matters for AI Consulting

We built AgentMedReg because medical device companies are an underserved market for AI tooling. Most AI consultancies won’t touch regulated industries because the compliance burden seems too high.

But here’s the thing: regulated companies already have the process discipline. They already write SOPs, maintain traceability matrices, conduct design reviews. They just do it manually, slowly, and expensively.

AI agents that understand the regulatory framework can accelerate every step, not by cutting corners, but by automating the documentation, traceability, and review processes that can eat 40-60% of engineering time.

That’s not replacing engineers. That’s giving them back half their week.

The Bottom Line

Three architectures, three lessons:

  1. AgentMinds taught us that AI agents need process discipline, not just knowledge
  2. AgentForge taught us that pipeline architecture matters — who talks to whom, in what order
  3. AgentMedReg taught us that the hardest constraints produce the best architectures

If you’re building AI agents for any domain, start with the most regulated, most constrained version of the problem. The architecture you design under those constraints will be better than anything you’d design in the open.

And if you’re in medical devices wondering whether AI can work in your regulatory environment — yes. But only if the AI understands the regulations as deeply as it understands the code.


This is part 3 of a series on multi-agent AI architecture. Part 1, Cloning McKinsey, covers strategic analysis. Part 2, One Agent Per Layer, covers software development. All three architectures were developed by Nananami.