When AI Meets the FDA: Building a Regulatory Layer for AI Development
The hardest software you'll ever ship is the software where getting it wrong has real consequences.
I've spent time in IVD - in vitro diagnostics, the machines that screen blood supplies and run the PCR tests that became household names during COVID. The kind of software where a misclassified sample means contaminated blood reaches a patient.
That experience rewired how I think about software development. Not because the code is harder - it's not - but because the consequences are real, the regulators are watching, and "move fast and break things" will get your product pulled from the market and your company sued into oblivion.
So when we built our third multi-agent architecture, we didn't just ask "how do you build software with AI agents?" We asked: "how do you build software that the FDA will clear or approve?"
The answer wasn't to start from scratch. It was to build a regulatory layer on top of what we already had.
Why Regulated Software Is a Different Animal
If you've only built commercial software, here's the gap: in regulated development, the process is the product. The FDA doesn't just care whether your software works. They care whether you can prove it works, prove you thought about the risks, and prove that every requirement traces from user need to verified test result.
This isn't bureaucracy for its own sake. Medical device failures can have consequences ranging from misdiagnosis to patient harm, and in the worst cases, fatal outcomes. When something goes wrong, the first question is "can we trace back through the process to find the root cause?" Because the goal isn't blame. It's making sure it never happens again. That's why documentation and traceability matter so much. Without them, you can't find the root cause, and you can't fix it.
The key standards and regulations:
- IEC 62304 - Software lifecycle processes (how you build it)
- ISO 14971 - Risk management (what could go wrong)
- 21 CFR Part 820 - Quality System Regulation (your whole quality management system)
- IEC 62366-1 - Usability engineering (can humans actually use it safely)
Miss any of these and your 510(k) submission goes in the trash.
The Key Insight: Don't Rebuild, Layer
Our first instinct was to design a completely separate pipeline for regulated software. That's how most companies think about it too: "regulated development is different, so we need different tools."
But that's wrong. The development isn't different. The oversight is different.
A developer writing code for a medical device uses the same languages, the same patterns, the same testing approaches as a developer building a SaaS app. What changes is that every decision needs regulatory traceability, every risk needs documentation, and every test needs to map back to a requirement.
So instead of building a 10-agent monolith, we built AgentMedReg as a regulatory layer that wraps around our existing AgentForge development pipeline.
The Architecture: A Regulatory Layer + AgentForge
AgentMedReg adds four specialized regulatory agents that sit around AgentForge's development team:
            ┌───────────────────────────┐
            │   Regulatory Strategist   │  ← Runs FIRST and LAST
            │ (classification, pathway, │    (the "bookend")
            │   submission readiness)   │
            └─────────────┬─────────────┘
                          │
       ┌──────────────────┼───────────────────┐
       │                  │                   │
       ▼                  ▼                   ▼
┌──────────────┐  ┌───────────────┐  ┌─────────────────┐
│ Risk Manager │  │     Human     │  │ Design Controls │
│ (ISO 14971)  │  │    Factors    │  │      Lead       │
│              │  │  (IEC 62366)  │  │      (RTM)      │
└──────┬───────┘  └───────┬───────┘  └────────┬────────┘
       │                  │                   │
       └──────────────────┼───────────────────┘
                          │
                          ▼
            ┌───────────────────────────┐
            │        AgentForge         │
            │  (existing dev pipeline)  │
            │                           │
            │ Orchestrator → Strategist │
            │ → Analyst → Architect     │
            │ → Developer → QA          │
            │ → DevOps → Monitor        │
            └───────────────────────────┘
The regulatory layer sets constraints before development starts and validates compliance after development finishes. AgentForge does what it already does: build software. It just does it inside a regulatory box.
The Four Regulatory Agents
Regulatory Strategist - Runs first and last. Classifies the device, determines the submission pathway (510(k), De Novo, or PMA), maps the applicable standards, and, once development finishes, runs a submission readiness assessment. This agent is the "bookend": it frames everything and validates the result.
Risk Manager - Implements ISO 14971 end-to-end. Hazard identification, FMEA, risk estimation, risk controls, residual risk evaluation. This agent doesn't just list risks - it traces every risk control to a design requirement and every residual risk to an acceptance rationale.
Human Factors Engineer - IEC 62366-1 usability engineering. Use specifications, task analysis, critical task identification, formative and summative usability evaluations. In medical devices, the user interface IS a safety feature. If a clinician can misread a result because of bad UI, that's a design defect, not user error.
Design Controls Lead - Owns the requirements traceability matrix (RTM). Every user need traces to a design input, every input to an output, every output to a verification test, every test to a validation result. Gaps in the RTM are gaps in your submission.
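To make the RTM idea concrete, here is a minimal sketch of a traceability row and a gap check. The `TraceRow` and `find_gaps` names, the field names, and the IDs are all hypothetical illustrations, not AgentMedReg's actual data model. The sketch assumes one row per user need and reports the first broken link in each chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRow:
    """One RTM row: a user need followed through to its validation result."""
    user_need: str
    design_input: Optional[str] = None
    design_output: Optional[str] = None
    verification_test: Optional[str] = None
    validation_result: Optional[str] = None


# The links every row must carry, in order, from design input to validation.
CHAIN = ("design_input", "design_output", "verification_test", "validation_result")


def find_gaps(rows):
    """Return (user_need, first_missing_link) for every incomplete row."""
    gaps = []
    for row in rows:
        for link in CHAIN:
            if not getattr(row, link):
                gaps.append((row.user_need, link))
                break  # report only the first break in the chain
    return gaps


# Illustrative IDs only.
rtm = [
    TraceRow("UN-1", "DI-1", "DO-1", "VT-1", "VAL-1"),
    TraceRow("UN-2", "DI-2", "DO-2"),  # no verification test yet
]
print(find_gaps(rtm))  # [('UN-2', 'verification_test')]
```

An empty result from `find_gaps` is the machine-checkable version of "no gaps in the RTM"; anything else is a list of work items.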
How the Layers Talk
The regulatory layer doesn't micromanage AgentForge. It communicates through documents, the same way AgentForge's internal agents do.
Before development starts, the regulatory agents produce:
- A regulatory brief (classification, applicable standards, submission pathway)
- A risk analysis (hazards, risk controls that must be implemented)
- A usability specification (critical tasks, safety-related UI requirements)
- A traceability template (the RTM structure that development must populate)
These documents feed into AgentForge as constraints. The Strategist and Analyst inside AgentForge incorporate them into requirements. The Developer builds to those requirements. QA tests against them.
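One way to picture "documents as constraints" is as plain data that gets flattened into requirements, each tagged with the document that imposed it. The sketch below is an assumption about shape, not a description of how AgentMedReg stores these documents: every class, field, ID prefix, and example value is hypothetical, and the traceability template is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryBrief:
    device_class: str
    pathway: str
    standards: list


@dataclass
class RiskAnalysis:
    controls: dict  # hazard -> risk control that development must implement


@dataclass
class UsabilitySpec:
    critical_tasks: list


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str
    source: str  # which regulatory document imposed this requirement


def derive_requirements(brief, risks, usability):
    """Flatten the regulatory documents into requirements the dev pipeline can build to."""
    reqs = [
        Requirement(f"REG-{i}", f"Conform to {std}", "regulatory brief")
        for i, std in enumerate(brief.standards, 1)
    ]
    reqs += [
        Requirement(f"RC-{i}", f"Mitigate '{hazard}' via: {control}", "risk analysis")
        for i, (hazard, control) in enumerate(risks.controls.items(), 1)
    ]
    reqs += [
        Requirement(f"UI-{i}", f"Support critical task safely: {task}", "usability specification")
        for i, task in enumerate(usability.critical_tasks, 1)
    ]
    return reqs


# Illustrative values only.
brief = RegulatoryBrief("II", "510(k)", ["IEC 62304", "ISO 14971"])
risks = RiskAnalysis({"misclassified sample": "flag low-confidence results for manual review"})
usability = UsabilitySpec(["read a test result on screen"])

reqs = derive_requirements(brief, risks, usability)
print([r.req_id for r in reqs])  # ['REG-1', 'REG-2', 'RC-1', 'UI-1']
```

Keeping the `source` on every requirement is what lets a later check ask "which risk control does this test actually cover?"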
After development finishes, the regulatory layer runs again:
- The Design Controls Lead verifies the RTM is complete - every requirement traced end-to-end
- The Risk Manager verifies all risk controls were implemented and residual risks are acceptable
- The Regulatory Strategist does a submission readiness check - are we ready for the FDA?
If anything fails, it loops back. Not to rewrite the regulatory strategy, but to send specific issues back into AgentForge for resolution.
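The loop-back can be sketched as a small review loop: run every post-development check, hand only the resulting issues back for resolution, and stop when the checks come back clean. Everything here is a stand-in under stated assumptions: the `build` dictionary, the two check functions, `toy_resolve`, and the `max_rounds` cap are hypothetical and do not reflect AgentForge's real interface.

```python
def missing_risk_controls(build):
    """Risk Manager style check: every required control must be implemented."""
    missing = sorted(set(build["required_controls"]) - set(build["implemented_controls"]))
    return [f"risk control not implemented: {c}" for c in missing]


def untraced_requirements(build):
    """Design Controls style check: every requirement must trace to a test."""
    return [f"requirement not traced to a test: {r}" for r in build["untraced"]]


def review_until_clean(build, checks, resolve, max_rounds=3):
    """Run all checks; send issues back for resolution; return once clean."""
    for attempt in range(max_rounds + 1):
        issues = [issue for check in checks for issue in check(build)]
        if not issues:
            return build
        if attempt < max_rounds:
            # Only the specific issues go back; the regulatory strategy is untouched.
            build = resolve(build, issues)
    raise RuntimeError(f"unresolved after {max_rounds} rounds: {issues}")


def toy_resolve(build, issues):
    """Stand-in for the dev pipeline fixing whatever was reported."""
    return {
        "required_controls": build["required_controls"],
        "implemented_controls": list(build["required_controls"]),
        "untraced": [],
    }


build = {
    "required_controls": ["RC-1", "RC-2"],
    "implemented_controls": ["RC-1"],
    "untraced": ["REQ-7"],
}
final = review_until_clean(build, [untraced_requirements, missing_risk_controls], toy_resolve)
print(final["implemented_controls"])  # ['RC-1', 'RC-2']
```

The round cap is a design choice in this sketch: a pipeline that cannot converge should surface that fact to a human rather than loop forever.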
The "Bookend" Pattern
The biggest architectural insight from building AgentMedReg is what we call the Bookend Pattern: start and end your pipeline with a Constraints Agent.
The Regulatory Strategist runs first to set classification, applicable standards, and submission strategy. Then the entire pipeline runs. Then the Regulatory Strategist runs again to verify submission readiness - checking that what was built actually satisfies the regulatory framework that was defined at the start.
This pattern is so powerful that we're retrofitting it to our other architectures:
- AgentForge gets a Constraints Agent that sets technical boundaries before development starts, and validates compliance at the end
- AgentMinds gets a Constraints Scoper that frames the problem space before analysis, preventing scope creep
The insight: every pipeline should start and end with a constraints agent. Define the box, then build in the box, then verify you're still in the box.
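Stripped of the regulatory detail, the Bookend Pattern is a small higher-order function: one agent defines the constraints, the pipeline runs inside them, and the same agent checks the result. The sketch below assumes a hypothetical `ConstraintsAgent` interface with `define` and `verify` methods; the toy agent and pipeline exist only to show the control flow.

```python
from typing import Any, Protocol


class ConstraintsAgent(Protocol):
    """Hypothetical interface: the same agent runs first and last."""

    def define(self, request: str) -> dict: ...

    def verify(self, constraints: dict, result: Any) -> list: ...


def bookend(agent, pipeline, request):
    constraints = agent.define(request)              # define the box
    result = pipeline(request, constraints)          # build in the box
    violations = agent.verify(constraints, result)   # verify you're still in the box
    return result, violations


class MaxEndpoints:
    """Toy constraints agent: caps how many endpoints a service may expose."""

    def define(self, request):
        return {"max_endpoints": 3}

    def verify(self, constraints, result):
        extra = len(result) - constraints["max_endpoints"]
        return [f"{extra} endpoint(s) over the limit"] if extra > 0 else []


def toy_pipeline(request, constraints):
    # Stand-in for a dev pipeline that ignored the limit it was given.
    return ["/health", "/results", "/samples", "/debug"]


result, violations = bookend(MaxEndpoints(), toy_pipeline, "sample tracking API")
print(violations)  # ['1 endpoint(s) over the limit']
```

Because `verify` receives the exact constraints that `define` produced, the end-of-pipeline check cannot quietly drift from what was agreed at the start.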
Why Composability Beats Monoliths
The old approach would have been to build a single, massive regulated development pipeline from scratch. That's how most compliance-heavy organizations think: specialized tools for specialized work.
The composable approach is better for three reasons:
- You don't duplicate effort. AgentForge already knows how to build software. Why rebuild that capability with regulatory-specific agents that are worse at coding?
- Improvements propagate. When we make AgentForge's QA agent smarter, AgentMedReg gets that improvement for free. When we add a Code Reviewer agent to AgentForge, every pipeline that uses it benefits.
- Domain layers are reusable. The regulatory layer we built for FDA could be adapted for other regulated domains: fintech (SOX, PCI-DSS), automotive (ISO 26262), aerospace (DO-178C). Different regulations, same pattern: domain experts wrap a development team.
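The reuse argument can be sketched as one development team wrapped by interchangeable domain layers. This is a deliberately thin illustration: a real layer would carry agents, not just a list of standards, and the `DomainLayer`, `LAYERS`, and `wrap` names are hypothetical. The standards listed are the ones named in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainLayer:
    name: str
    standards: tuple


# One entry per regulated domain; the dev team underneath stays the same.
LAYERS = {
    "medical": DomainLayer("medical", ("IEC 62304", "ISO 14971", "21 CFR Part 820", "IEC 62366-1")),
    "fintech": DomainLayer("fintech", ("SOX", "PCI-DSS")),
    "automotive": DomainLayer("automotive", ("ISO 26262",)),
    "aerospace": DomainLayer("aerospace", ("DO-178C",)),
}


def dev_team(task):
    """Stand-in for the shared development pipeline."""
    return f"built: {task}"


def wrap(team, layer):
    """Return a version of the team whose output is tagged with the layer's obligations."""
    def regulated(task):
        return {
            "deliverable": team(task),
            "domain": layer.name,
            "evidence_needed_for": list(layer.standards),
        }
    return regulated


payments_team = wrap(dev_team, LAYERS["fintech"])
print(payments_team("card vault")["evidence_needed_for"])  # ['SOX', 'PCI-DSS']
```

Swapping `LAYERS["fintech"]` for `LAYERS["medical"]` changes the obligations without touching `dev_team`, which is the point of the composable approach.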
This is how real organizations work. You don't hire a completely separate engineering team for every regulated product. You have a dev team, and you add compliance specialists who guide that team's work.
What Building This Taught Us
Creating a regulatory layer forced a level of rigor that exposed weaknesses in our other architectures:
For AgentForge:
- QA as a single agent was too broad. Verification (did we build it right?) and Validation (did we build the right thing?) are distinct disciplines. We're splitting them.
- No post-deployment agent was a gap. Software doesn't stop needing quality after release.
- No constraint-setting at pipeline start meant developers could build things that violated architectural principles.
For AgentMinds:
- The Analyst was one-directional (analyze, then report). In practice, analysis should be bidirectional - the analyst should be able to request more data.
- No client proxy agent meant the team could drift from what the client actually needed.
- No constraints scoper meant problem definitions could inflate during analysis.
Universal lesson: Building the most regulated pipeline first would have made all our architectures better from the start. The FDA doesn't add unnecessary steps. Every requirement exists because something went wrong when it was missing.
The Composable Vision
Here's where this is heading. Three architectures, each composable:
- AgentMinds (4 agents) handles strategy and analysis. When it identifies something that needs to be built, it dispatches to AgentForge.
- AgentForge (8 agents) is the core development pipeline. It builds software, tested and deployed.
- AgentMedReg (4 agents) wraps AgentForge with regulatory oversight for FDA-regulated work. The same pattern works for any regulated domain.
They're not three separate products. They're layers that compose. A medical device company might use all three: AgentMinds to analyze the market opportunity, AgentMedReg + AgentForge to build the regulated software. A startup might just use AgentForge. A consulting firm might only need AgentMinds.
That's the real power of multi-agent architecture. Not building bigger pipelines, but building composable teams that snap together based on what the problem requires.
The Bottom Line
Three architectures, three lessons:
- AgentMinds taught us that AI agents need process discipline, not just knowledge
- AgentForge taught us that pipeline architecture matters - who talks to whom, in what order
- AgentMedReg taught us that domain expertise should layer on top of development, not replace it
If you're building AI agents for a regulated domain, don't start over. Build a regulatory layer around a development pipeline that already works. The constraints make the architecture better, and the composability makes it reusable.
And if you're in medical devices wondering whether AI can work in your regulatory environment - yes. But only if the AI understands the regulations as deeply as it understands the code.
This is part 3 of a series on multi-agent AI architecture. Part 1: How We Built an AI Consulting Team covers strategic analysis. Part 2: One Agent Per Layer covers software development. All three architectures are developed by Nananami.