How to Secure AI Agents in Production: IBM's Six-Phase Framework

I have already covered the enforcement layer that gives an agent a hard ceiling (Vault, Sentinel, the rest). This is the other half: the methodology IBM published, and Anthropic verified, to wrap around it. I read the whole guide. Here it is, phase by phase.

A prompt is not a security control. It is a wish.

I keep meeting platform teams losing the same fight. They write careful system prompts. They add a stern “never do X” section. They version their tools. Then one Friday evening an agent reads the rule, agrees with it warmly, and reaches for production anyway. That is what probabilistic systems do under pressure.

This is not a prompt-engineering problem. It is a category mistake. We are shipping a new kind of software, one that behaves a little differently every time it runs, on a playbook written for software that behaves the same way every time. So it breaks. The strange part is not that it breaks. It is that we keep acting surprised.

In October 2025, IBM published a guide called Architecting Secure Enterprise AI Agents with MCP, verified by Anthropic. It finally names the gap and gives it a shape: a framework called the Agent Development Lifecycle, six phases that stretch DevSecOps to fit stochastic, tool-using agents. I am an IBM Champion, and I have spent real hours inside this document. It is the clearest answer I have seen to the question every platform lead is now asking out loud: what does “safe” even mean when the thing you are shipping is non-deterministic?

This is my walkthrough of the six phases. What each one means, where teams fall down, and the one that saves you if you only have budget for one.

Why the old playbook quietly stops working

Start with the assumption sitting under every CI/CD pipeline ever built: if it passed staging, it is safe to ship.

That holds because normal code is deterministic. Same input, same output, forever. Pass the test once and you have passed it for good. That is why staging means something, why version control means something, why the whole apparatus of scanning and signing and gating does what it claims.

Now look at what an agent actually is. Same prompt, same data, and the output is probably close. Not guaranteed. And the agent does not just emit text. It chooses tools, in some order, and the order changes what it does next. Your attack surface is no longer the code you wrote. It is the full space of decisions the model might make at runtime with the tools you handed it.

That is a different kind of system, and it fails in ways your pipeline was never built to catch: behavioral drift, prompt injection, tool misuse, autonomy that runs longer than anyone meant it to. Your pipeline misses all of it, because nobody ever asked it to look.

So the useful question is not “how do I write a better prompt.” It is this: what do I build around the agent so that it physically cannot do the wrong thing for long? That is what the Agent Development Lifecycle answers.

The reframe that makes the rest click

Before the phases, one idea, because it reorders everything after it.

An agent is not an app you ship and forget. It is a system you watch, bound, and keep honest. Security stops being the last gate before release and becomes a thread running through the whole life of the agent, from the day you sketch what it should do to the day you switch it off.

That is uncomfortable for engineering culture, because we are trained to treat “shipped” as the finish line. With agents there is no finish line. The thing keeps deciding in production, and those decisions can drift for reasons that have nothing to do with your code: a model updated upstream, a tool added to its catalog, a slow change in how users phrase things. The agent that passed every test in March is not the same agent in September. Same code, different behavior, because the world around it moved.

So: six phases. Not a checklist you finish once, but a loop you keep running.

Phase 1: Plan

Most teams lose here, before a line of code exists.

Two questions get settled in Plan, and both are awkward enough that people skip them and promise to sort it out later.

First: how will you know the agent is doing the right thing? Not “does it run.” How do you measure rightness? IBM’s guide calls this evaluation-first, and it is the agent-world version of test-driven development. You design the evaluation before you build the agent, because if you build first and evaluate after, you end up grading your own homework, generously. The evaluation has to be a separate artifact, written down, with criteria you could defend to a skeptic.
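To make “a separate artifact” concrete, here is a minimal sketch of an evaluation written before any agent exists. It is my illustration, not code from IBM’s guide: the names EvalCase, pass_rate, and evaluate are hypothetical, and the 0.95 threshold is an assumption you would set yourself. It scores a pass rate over repeated runs, because one passing run says little about a non-deterministic system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One written-down criterion: a prompt and a check on the agent's output."""
    name: str
    prompt: str
    check: Callable[[str], bool]


def pass_rate(agent: Callable[[str], str], case: EvalCase, runs: int = 20) -> float:
    """Run the same case repeatedly and return the fraction of runs that pass."""
    passed = sum(1 for _ in range(runs) if case.check(agent(case.prompt)))
    return passed / runs


def evaluate(
    agent: Callable[[str], str],
    cases: List[EvalCase],
    threshold: float = 0.95,  # assumed bar; pick your own and write it down
    runs: int = 20,
) -> Dict[str, bool]:
    """Return, per case, whether the agent met the agreed pass-rate threshold."""
    return {c.name: pass_rate(agent, c, runs) >= threshold for c in cases}
```

The agent here is any callable from prompt to text, so the same cases can be pointed at a stub today and the real agent later.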

Second: how much freedom does the agent get? This is acceptable agency, the line between what the agent may decide alone and where a human has to sign. Most teams wave at it with “we’ll keep a human in the loop” and never say which decisions actually cross the line. Write it down, and be specific. The agent may draft and send mail to internal addresses on its own. It may not contact an external address without a human approving it. That specific.
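The email rule above is specific enough to be written as code, which is a decent test of whether a policy is specific at all. A sketch, assuming a single internal domain (example.com is a placeholder) and hypothetical names throughout; it fails closed, so anything it cannot classify goes to a human.

```python
from enum import Enum
from typing import List

INTERNAL_DOMAIN = "example.com"  # placeholder; substitute your own domain


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human_approval"


def email_decision(recipients: List[str]) -> Decision:
    """Allow autonomous sending only when every recipient is internal."""
    if not recipients:
        # Nothing to classify: fail closed rather than guess.
        return Decision.NEEDS_HUMAN
    for address in recipients:
        # An address without "@" yields itself as the "domain" and is rejected.
        domain = address.rsplit("@", 1)[-1].lower()
        if domain != INTERNAL_DOMAIN:
            return Decision.NEEDS_HUMAN
    return Decision.ALLOW
```

One external address in a list of internal ones is enough to require approval, which matches the rule as written.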

Skip Plan and every later phase is improvisation.

Phase 2: Code & Build

Short phase, one non-negotiable rule: a prompt change is a deploy.

Most teams version their code like scripture and treat the prompt that steers the agent as a note in a Notion page. That is backwards. The prompt is the agent’s program. The tool definitions are part of that program. Edit either and you have shipped a new version of the system, whether CI ran or not.

So version prompts and tool definitions the way you version code. Diff them. Review them. Roll them back when they regress. Obvious on paper. Almost nobody does it on day one.
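One cheap way to make “a prompt change is a deploy” enforceable is to fingerprint the prompt and tool definitions together and compare that fingerprint to the one you last released. This is a sketch under my own assumptions: release_fingerprint is a hypothetical helper, and it assumes the tool definitions are JSON-serializable dictionaries.

```python
import hashlib
import json
from typing import Dict, List


def release_fingerprint(system_prompt: str, tool_definitions: List[Dict]) -> str:
    """Hash the prompt and tool definitions as one unit.

    Keys are sorted, so reordering fields inside a tool definition does not
    change the hash. The order of tools in the list is kept, so reordering
    tools does count as a change.
    """
    payload = json.dumps(
        {"prompt": system_prompt, "tools": tool_definitions},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Store the fingerprint next to each release. If the running agent’s fingerprint differs from the last reviewed one, something shipped without review.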

Phase 3: Test & Release

If you do only two phases well, do Plan and this one. Test & Release is the phase most teams skip, and the one that saves them.

It gets skipped because it is expensive and it does not look like testing. Traditional QA asks whether the code does what you wrote. Agent testing asks how the agent behaves under pressure, under attack, under inputs nobody planned for. Different muscle, and most orgs have not built it yet.

Three things have to happen here.

Red-teaming. You attack the agent on purpose. You feed it the most awkward, adversarial, malformed input you can invent and watch what it does. If it folds now, the lesson was cheap. If it folds in production, the lesson is expensive.

Model-graded evaluation, the “LLM as a judge” pattern, one model scoring another. It is not flawless. It is far better than nothing, and it scales the way human review never will.

A human with teeth. Not a rubber stamp. A person with the authority and the context to say no, after which the approved agent lands in a trusted catalog: the governed list of agents allowed to act in your environment. Not in the catalog, does not run.
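The three requirements above can be sketched as a single release gate in front of the catalog. This is my illustration, not the guide’s implementation: the class names, the fields, and the 0.9 judge threshold are all assumptions, and the red-team count and judge score are taken as inputs produced elsewhere.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class ReleaseEvidence:
    agent_id: str
    red_team_failures: int      # adversarial cases where the agent misbehaved
    judge_score: float          # model-graded score in [0, 1]
    approved_by: Optional[str]  # named human approver, or None if nobody signed


@dataclass
class TrustedCatalog:
    min_judge_score: float = 0.9  # assumed threshold
    _agents: Set[str] = field(default_factory=set, init=False)

    def admit(self, evidence: ReleaseEvidence) -> bool:
        """Admit the agent only if all three checks hold."""
        ok = (
            evidence.red_team_failures == 0
            and evidence.judge_score >= self.min_judge_score
            and evidence.approved_by is not None
        )
        if ok:
            self._agents.add(evidence.agent_id)
        return ok

    def may_run(self, agent_id: str) -> bool:
        """Not in the catalog, does not run."""
        return agent_id in self._agents
```

Recording the approver by name rather than as a boolean keeps the sign-off attributable.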

This is where “it passed staging” finally dies, and good. Staging was never going to catch this one.

Phase 4: Deploy

Phase 4 asks a single question, and asking it honestly is the whole job: what is the blast radius?

For deterministic software, deploy means the binary is live. For an agent, deploy means this thing is now making decisions in the world. The better mental model is not pushing code. It is putting the agent in a locked room first, and opening the door only once you have checked what it can reach from inside.

This is where the framework extends DevSecOps instead of replacing it. Your container scanning, your image signing, your policy-as-code gates all stay. You add one class of check on top: the blast-radius check. If this agent goes rogue tomorrow, what is the worst it can do? You press Deploy only when the honest answer is “break itself, and nothing else.” If the answer is anything larger, you are not ready. Narrow the permissions, isolate the resources, shorten the credentials, then ask again.
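A blast-radius check can start as something very plain: list every grant the agent holds and flag the ones that reach outside its own resources. The sketch below assumes permissions can be expressed as resource paths under a single root owned by the agent; Grant, outside_blast_radius, and ready_to_deploy are hypothetical names, and real IAM models are richer than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Grant:
    resource: str  # e.g. "sandbox/agent-42/tmp"
    action: str    # e.g. "read", "write", "delete"


def _inside(resource: str, own_root: str) -> bool:
    """True if the resource is the agent's root or sits beneath it."""
    root = own_root.rstrip("/")
    # Compare with a trailing slash so "agent-42" does not match "agent-420".
    return resource == root or resource.startswith(root + "/")


def outside_blast_radius(grants: List[Grant], own_root: str) -> List[Grant]:
    """Return every grant that reaches beyond the agent's own resources."""
    return [g for g in grants if not _inside(g.resource, own_root)]


def ready_to_deploy(grants: List[Grant], own_root: str) -> bool:
    """Deploy only when the worst case is the agent breaking itself."""
    return not outside_blast_radius(grants, own_root)
```

The useful output is the list, not the boolean: each flagged grant is a permission to narrow before asking again.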

Phase 5: Operate

This is the heart of the framework, and the phase with the most architecture in it.

In Operate, everything the agent does goes through one door: a secure MCP Gateway. Not ideally one door. Not mostly one door. One door. There is no other safe way to run an agent in production.

That door does four jobs, and none is optional. It checks who is asking, because every request carries an identity and there are no anonymous agent calls. It hands out least-access permissions, exactly what the task needs and nothing spare. It enforces your rules, the rate limits and allow-lists and denied paths and time-of-day constraints your governance demands. And it logs all of it, every request and decision and response, which is the only way you reconstruct what happened when something goes wrong.
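The four jobs fit in a few dozen lines if you strip them to their logic. The sketch below is a toy, not an MCP Gateway and not an API from any real product: the Gateway class, its decision strings, and the per-minute limit are my assumptions, and identity is reduced to a lookup where a real gateway would verify a credential.

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Dict, List, Optional, Set


class Gateway:
    """Toy single door: identity, least access, rules, and an audit log."""

    def __init__(
        self,
        allowed_tools: Dict[str, Set[str]],  # agent id -> tools it may call
        max_calls_per_minute: int = 30,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.allowed_tools = allowed_tools
        self.max_calls = max_calls_per_minute
        self.clock = clock
        self.recent: Dict[str, deque] = defaultdict(deque)
        self.audit_log: List[Dict[str, Any]] = []

    def handle(self, agent_id: Optional[str], tool: str, args: Dict[str, Any]) -> str:
        decision = self._decide(agent_id, tool)
        # Job 4: log every request and decision, allowed or not.
        self.audit_log.append(
            {"agent": agent_id, "tool": tool, "args": args,
             "decision": decision, "at": self.clock()}
        )
        return decision

    def _decide(self, agent_id: Optional[str], tool: str) -> str:
        # Job 1: no anonymous or unknown callers.
        if agent_id is None or agent_id not in self.allowed_tools:
            return "deny:unknown_identity"
        # Job 2: least access, only the tools granted to this identity.
        if tool not in self.allowed_tools[agent_id]:
            return "deny:tool_not_granted"
        # Job 3: enforce rules, here a sliding 60-second rate limit.
        now = self.clock()
        window = self.recent[agent_id]
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.max_calls:
            return "deny:rate_limited"
        window.append(now)
        return "allow"
```

Denied requests are logged too. When something goes wrong, the attempts that were refused are often the most useful part of the record.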

While that runs, the agent itself sits boxed in a sandbox: an isolated execution environment with short-lived credentials that expire fast, instead of a password sitting in a .env file forever. If the agent is compromised, the blast radius is the sandbox. Not your database. Not your cloud account.
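The difference between a short-lived credential and a password in a .env file is just an expiry that something actually checks. A minimal sketch, with hypothetical names and an assumed five-minute lifetime; a production system would get this from a secrets manager rather than mint tokens in-process.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShortLivedCredential:
    token: str
    expires_at: float  # seconds since the epoch

    def is_valid(self, now: Optional[float] = None) -> bool:
        """A credential is usable only strictly before its expiry."""
        current = time.time() if now is None else now
        return current < self.expires_at


def issue_credential(
    ttl_seconds: float = 300.0,  # assumed lifetime; shorter is better
    now: Optional[float] = None,
) -> ShortLivedCredential:
    """Mint a random token that stops working after ttl_seconds."""
    issued_at = time.time() if now is None else now
    return ShortLivedCredential(
        token=secrets.token_urlsafe(32),
        expires_at=issued_at + ttl_seconds,
    )
```

A stolen token is then worth at most the remaining lifetime, which is the property the sandbox depends on.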

The temptation is to skip the gateway because it adds a little latency and feels like overkill for the proof-of-concept your team is poking at. Do not skip it. The gateway is the line between “an experiment that escaped” and “a system you can defend.”

Phase 6: Monitor

The shift in Monitor is simple to say and hard to do: you stop checking only whether the agent is up, and start watching how it thinks.

For a normal service, monitoring is uptime, error rate, latency, a couple of business metrics. For an agent, monitoring means following the reasoning. What chain of thought got it to that decision? Which tools did it call, in what order? Did it get confused, double back, try something it should not have?

And you watch for drift. An agent that behaved in March can start misbehaving in September with nothing in your code changed: the upstream model got patched, a tool’s response format shifted, the questions users ask moved on. Drift is silent, and it is the thing that gets you if you are only watching error rates. The job here is to catch behavioral change before your users do, and that needs telemetry your current observability stack does not produce by default. Plan to extend it.
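One simple drift signal you can compute from tool-call logs alone: compare which tools the agent used in a baseline window against a recent window. The sketch below uses total variation distance between the two usage distributions. The metric is my choice for illustration, not something the guide prescribes, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def tool_distribution(calls: List[str]) -> Dict[str, float]:
    """Turn a list of tool names into relative frequencies."""
    if not calls:
        raise ValueError("need at least one tool call to build a distribution")
    counts = Counter(calls)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}


def drift_score(baseline: List[str], current: List[str]) -> float:
    """Total variation distance between tool-usage distributions.

    0.0 means identical usage; 1.0 means no tool in common.
    """
    p = tool_distribution(baseline)
    q = tool_distribution(current)
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)
```

A shift from 80/20 search-to-email in the baseline to 50/50 this week scores about 0.3. Error rates would show nothing, yet the agent is plainly doing a different job. Where to set the alert threshold is a judgment call you make against your own history.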

What I would actually do on Monday

If you read this far staring at your own fleet of agents, do not try to stand up all six phases at once. You will finish none of them.

Pick Plan and Test & Release. Write down what your agent is allowed to decide on its own, then genuinely try to break it before it ships. Those two catch most of the pain. Layer the rest in over the following weeks.

And stop building your own version of this from scratch. I keep meeting platform teams burning a senior engineer’s whole quarter on a homegrown sandbox, a homegrown policy layer, a homegrown audit log, rebuilding (worse) what the industry has now assembled properly. That is unbillable R&D inside a company whose actual job is retail, or fintech, or streaming. The framework exists. The primitives exist. IBM’s guide is free to read. Use them, and put your senior people back on the product.

One last reframe

The line I keep coming back to: an AI agent is not an app. It is a system that keeps deciding, in production, against a moving world, with the tools you gave it. So stop shipping it like an app. Build the loop around it, not just the artifact.

The teams that win this next phase of the industry will not be the ones with the cleverest agent. They will be the ones who know, precisely, what their agent cannot do, and have the logs to prove it.

Tatiana Mikhaleva

Docker Captain · IBM Champion · AWS Community Builder

DevOps.Pink — Signal over noise in cloud-native & AI.
