API Security When the Caller Is an AI Agent
APIs are already the dominant attack surface. Agentic AI changes the threat model because the caller is now an autonomous agent — so the question shifts from authenticating the user to verifying the intent, scope, and blast radius of a call.
Most security teams still picture an API call as a person clicking a button, or a service their own engineers wrote and deployed. That mental model is already wrong, and agentic AI is about to make it dangerously wrong. The thing calling your API at 3am isn’t a user and isn’t your code. It’s an autonomous agent, holding a credential, making its own decisions about what to do next.
I’ve spent a lot of the last year talking about this with security leaders, and the same realisation lands every time. APIs were already the place attackers go first — they’re the front door to the data, they’re machine-readable, and most organisations have far more of them than they can name. Agentic AI doesn’t create that problem. It pours accelerant on it.
The threat model just changed under you
Classic API security answers one question well: is this caller who they say they are? You authenticate the user, you check the token, you log the session. The whole edifice assumes a known identity with predictable behaviour on the other end.
An agent breaks every part of that assumption. It authenticates fine — it has a perfectly valid credential. But it’s not a known user with a habitual pattern. It’s a goal-seeking system that will call your API in whatever sequence, at whatever volume, in whatever combination gets it closer to its objective. It might call an endpoint a thousand times in a minute because that’s what the task needed. It might chain three APIs together in a way no human workflow ever would.
So the question you actually need to answer is no longer “who is this?” It’s “what is this caller trying to do, is it allowed to do that, and how much damage can it cause if it’s wrong or compromised?” Identity, intent, scope, blast radius. The first one you’ve probably solved. The other three are where the work is now.
Treat the agent as its own identity, not a borrowed one
The first failure I see is agents wearing a human’s credentials. Someone wires an assistant to act “as the user”, so it inherits everything the user can do. Now you’ve got an autonomous system with the full standing access of a senior employee, and your audit log says a person did it.
Agents need their own machine identities: issued, scoped, and revocable independently of any human. Zero-trust principles fit this case exactly: never trust by network position, verify every call, assume the credential will eventually leak. Treat an agent identity the way you’d treat a microservice that you don’t fully trust. Short-lived tokens. Narrow scopes. No standing access to anything it doesn’t need for the task in front of it.
The practical test: if this agent’s credential leaked tomorrow, what’s the worst a single stolen token can do? If the honest answer is “anything the user could”, you’ve given it too much.
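To make the practical test concrete, here is a minimal Python sketch of an agent identity that is issued, scoped, and revoked independently of any human. `AgentToken`, `AgentTokenIssuer` and `blast_radius` are hypothetical names, and the in-memory revocation set stands in for whatever identity provider you actually run. A real deployment would use signed tokens from that provider rather than this toy store.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentToken:
    agent_id: str           # the agent's own identity, never a borrowed human one
    scopes: frozenset[str]  # the narrow set of actions this token allows
    expires_at: float       # absolute expiry in epoch seconds (short-lived)
    token_id: str           # unique id so this one token can be revoked on its own


class AgentTokenIssuer:
    """Hypothetical in-memory issuer; a real system would delegate to its IdP."""

    def __init__(self, default_ttl_seconds: int = 300) -> None:
        self.default_ttl_seconds = default_ttl_seconds
        self.revoked: set[str] = set()  # ids of revoked tokens

    def issue(self, agent_id: str, scopes: set[str], ttl_seconds: int | None = None) -> AgentToken:
        ttl = self.default_ttl_seconds if ttl_seconds is None else ttl_seconds
        return AgentToken(
            agent_id=agent_id,
            scopes=frozenset(scopes),
            expires_at=time.time() + ttl,
            token_id=secrets.token_hex(16),
        )

    def revoke(self, token: AgentToken) -> None:
        # Revoking an agent token touches no human account.
        self.revoked.add(token.token_id)

    def authorize(self, token: AgentToken, action: str, now: float | None = None) -> bool:
        # Verify every call: not revoked, not expired, and the action is in scope.
        now = time.time() if now is None else now
        if token.token_id in self.revoked or now >= token.expires_at:
            return False
        return action in token.scopes


def blast_radius(token: AgentToken) -> set[str]:
    """The practical test: everything a stolen copy of this token could do."""
    return set(token.scopes)
```

Asked for something like `payroll:read`, a refund agent’s token simply says no, and `blast_radius` answers the leak question with the handful of scopes the token was issued rather than “anything the user could”.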
Scope to the task, not the role
Human access tends to be role-shaped — you’re in finance, so you can see finance things. That’s already a blunt instrument, but for agents it’s actively dangerous, because an agent doesn’t need standing access. It needs just-in-time access to do one thing, then nothing.
The pattern that works is narrow, ephemeral, and task-bound. An agent processing refunds gets a credential that can read order history and issue refunds below a threshold — and that’s the entire universe it can touch. It can’t read payroll. It can’t delete records. It can’t reach the endpoints that weren’t part of the job. When the task ends, the scope ends.
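The refund example can be sketched as a task-bound scope check. This is an illustration under assumptions: the `TaskScope` class, the `refund_task_scope` helper, the `orders:read` / `refunds:create` action names and the cents-based threshold are all hypothetical, not any particular product’s permission model.

```python
from dataclasses import dataclass


@dataclass
class TaskScope:
    """Hypothetical task-bound grant: exists for one task, then nothing."""

    task_id: str
    allowed_actions: frozenset
    refund_limit_cents: int  # refunds must stay strictly below this amount
    active: bool = True

    def end(self) -> None:
        # When the task ends, the scope ends.
        self.active = False

    def permits(self, action: str, amount_cents: int = 0) -> bool:
        if not self.active:
            return False
        if action not in self.allowed_actions:
            return False  # payroll, deletes and every other endpoint are simply absent
        if action == "refunds:create" and amount_cents >= self.refund_limit_cents:
            return False
        return True


def refund_task_scope(task_id: str, refund_limit_cents: int) -> TaskScope:
    # The entire universe this agent can touch: read orders, issue small refunds.
    return TaskScope(task_id, frozenset({"orders:read", "refunds:create"}), refund_limit_cents)
```

The design choice worth copying is the allowlist: nothing is denied by a rule about payroll or deletion, those actions just never appear in the grant.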
Rate limits stop being a noisy-neighbour nicety here and become a real control. A human hitting an endpoint a few times a minute is normal. An agent doing it ten thousand times a minute is either working hard or running away with itself, and you want a ceiling that catches the second case before it empties a table or runs up a bill. Per-agent quotas, per-endpoint limits, and circuit breakers on anything expensive or destructive.
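One way to express per-agent, per-endpoint quotas plus a circuit breaker is a token bucket keyed on `(agent_id, endpoint)`. The sketch below assumes the breaker trips after a set number of throttled calls and stays open until someone resets it; that trigger, and every class name here, is an illustrative choice rather than a standard.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows up to `capacity` calls in a burst, refilling at `rate` calls per second."""

    def __init__(self, capacity: int, rate: float, clock: Callable[[], float]) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AgentRateLimiter:
    """Hypothetical per-agent, per-endpoint quota with a latching circuit breaker."""

    def __init__(self, capacity: int, rate: float, max_rejections: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.max_rejections = max_rejections
        self.clock = clock
        self.buckets = {}     # (agent_id, endpoint) -> TokenBucket
        self.rejections = {}  # (agent_id, endpoint) -> throttled-call count
        self.tripped = set()  # keys whose breaker is open

    def allow(self, agent_id: str, endpoint: str) -> bool:
        key = (agent_id, endpoint)
        if key in self.tripped:
            return False  # breaker open: stay stopped until a human calls reset()
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.rate, self.clock)
        if self.buckets[key].allow():
            return True
        # Repeated throttling suggests a runaway loop rather than hard work.
        self.rejections[key] = self.rejections.get(key, 0) + 1
        if self.rejections[key] >= self.max_rejections:
            self.tripped.add(key)
        return False

    def reset(self, agent_id: str, endpoint: str) -> None:
        key = (agent_id, endpoint)
        self.tripped.discard(key)
        self.rejections.pop(key, None)
```

Injecting the clock keeps the limiter easy to test with a fake time source; in production it falls back to `time.monotonic`, which is unaffected by wall-clock adjustments.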
Keep a human between the agent and anything irreversible
This is the control I argue hardest for, and it’s the one teams most want to skip because it feels like it defeats the point of automation. It doesn’t. It draws a line.
Sort actions by reversibility. Reading data, drafting a message, flagging a record for review — reversible, low blast radius, let the agent run. Issuing a payment, deleting customer data, changing a permission, sending something to a customer that can’t be unsent — irreversible, high blast radius, route it through a human checkpoint. The agent does the work right up to the gate, presents what it’s about to do, and waits for a yes.
The boundary has to be explicit and it has to live in the system, not in a policy document nobody reads. Below this threshold, act. Above it, ask. Start with the gate set tight, watch how the agent actually behaves, and widen the autonomous lane only as it earns trust through measured performance. Designing for full autonomy on day one isn’t bold. It’s how you find out the hard way that your agent will confidently do the wrong thing at scale.
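The act-or-ask boundary can live in code as a small gate. In this hypothetical sketch every irreversible action asks by default, and the autonomous lane widens only when someone explicitly records a per-action ceiling the agent has earned. The action catalogue, `ReversibilityGate`, `execute` and the `request_approval` callback are assumptions standing in for your own action list and review workflow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACT = "act"
    ASK = "ask"


# Hypothetical catalogue of actions that cannot be undone.
IRREVERSIBLE_ACTIONS = frozenset({
    "payment:issue",
    "customer_data:delete",
    "permission:change",
    "customer_message:send",
})


@dataclass
class ReversibilityGate:
    # Per-action ceilings the agent has earned; empty means every irreversible action asks.
    autonomous_limits: dict = field(default_factory=dict)

    def decide(self, action: str, amount_cents: int = 0) -> Decision:
        if action not in IRREVERSIBLE_ACTIONS:
            return Decision.ACT  # reversible, low blast radius: let the agent run
        limit = self.autonomous_limits.get(action)
        if limit is not None and amount_cents < limit:
            return Decision.ACT  # below a threshold that was explicitly widened
        return Decision.ASK  # irreversible and not yet earned: wait for a human yes


def execute(gate: ReversibilityGate, action: str, amount_cents: int,
            perform: Callable[[], None],
            request_approval: Callable[[str, int], bool]) -> str:
    # The agent does the work up to the gate, then presents the action and waits.
    if gate.decide(action, amount_cents) is Decision.ASK:
        if not request_approval(action, amount_cents):
            return "rejected"
    perform()
    return "done"
```

Keying the ceiling per action, rather than using one global amount, avoids a widened payment limit accidentally waving through deletions or permission changes, which carry no amount at all.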
You can’t secure what you can’t see
The last piece is observability, and it’s the one that quietly decides whether any of the above is real. When an agent makes a hundred API calls to complete a task, you need to be able to reconstruct exactly what it did, in what order, with what data, and why.
That means logging at the granularity of the agent’s decisions, not just the HTTP requests. Which agent. Which task. Which credential. What it asked for, what it got back, what it did next. Tie the calls together so a single agent run reads as one coherent trace rather than scattered log lines you have to reassemble during an incident at the worst possible time.
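A minimal sketch of decision-level logging, assuming a hypothetical `AgentRunTrace` helper and `reconstruct` function: every call in one agent run shares a `run_id` and a sequence number, and carries the agent, task and credential alongside what was asked, what came back and what the agent chose next. Emitting one JSON line per call is an assumption about your log pipeline, not a requirement.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class AgentRunTrace:
    """Hypothetical helper: one instance per agent run, one event per API call."""

    agent_id: str
    task_id: str
    credential_id: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record(self, endpoint: str, request: dict, response: dict, next_step: str) -> str:
        event = {
            "run_id": self.run_id,                # ties every call in this run into one trace
            "seq": len(self.events),              # preserves the order the agent acted in
            "agent_id": self.agent_id,            # which agent
            "task_id": self.task_id,              # which task
            "credential_id": self.credential_id,  # which credential
            "endpoint": endpoint,
            "request": request,                   # what it asked for
            "response": response,                 # what it got back
            "next_step": next_step,               # what it decided to do next
        }
        self.events.append(event)
        # Return one JSON line, ready to hand to whatever log pipeline you use.
        return json.dumps(event, sort_keys=True)


def reconstruct(log_lines: list, run_id: str) -> list:
    """Rebuild a single run, in order, from scattered log lines."""
    events = (json.loads(line) for line in log_lines)
    return sorted((e for e in events if e["run_id"] == run_id), key=lambda e: e["seq"])
```

Because ordering comes from `seq` rather than arrival time, the run reassembles correctly even when lines from many agents arrive interleaved or out of order.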
This matters for detection — anomalous agent behaviour looks different from anomalous human behaviour, and your monitoring needs baselines for both — and it matters even more for the aftermath. When something goes wrong, and with autonomous systems acting at machine speed it eventually will, the difference between a contained incident and a catastrophe is whether you can answer “what exactly did it touch?” in minutes instead of days.
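Separate baselines can start as simply as comparing each caller’s calls per minute with its own history. This sketch keys history on a `(caller_kind, caller_id)` pair and flags anything more than three standard deviations above the mean. The threshold, the minimum-history rule, the `CallRateBaseline` name and the sample callers are illustrative assumptions, and a production detector would look at far more than call rate.

```python
import statistics


class CallRateBaseline:
    """Hypothetical detector: compares a caller's call rate with its own history."""

    def __init__(self, threshold_sigmas: float = 3.0, min_history: int = 5) -> None:
        self.threshold_sigmas = threshold_sigmas
        self.min_history = min_history
        self.history = {}  # (caller_kind, caller_id) -> past calls-per-minute samples

    def observe(self, caller_kind: str, caller_id: str, calls_per_minute: float) -> None:
        self.history.setdefault((caller_kind, caller_id), []).append(calls_per_minute)

    def is_anomalous(self, caller_kind: str, caller_id: str, calls_per_minute: float) -> bool:
        past = self.history.get((caller_kind, caller_id), [])
        if len(past) < self.min_history:
            return False  # too little history to judge; a real system might hold these for review
        mean = statistics.fmean(past)
        # Fall back to 1.0 so a perfectly steady caller doesn't get a zero-width band.
        spread = statistics.pstdev(past) or 1.0
        return calls_per_minute > mean + self.threshold_sigmas * spread


baseline = CallRateBaseline()
for rate in [40, 42, 38, 41, 39]:
    baseline.observe("agent", "refund-agent-7", rate)
for rate in [3, 4, 2, 3, 5]:
    baseline.observe("human", "alice", rate)

print(baseline.is_anomalous("agent", "refund-agent-7", 40))  # False: normal for this agent
print(baseline.is_anomalous("human", "alice", 40))           # True: far outside this human's pattern
```

The same 40 calls a minute is routine for the agent and alarming for the human, which is exactly why one shared baseline would either drown you in agent alerts or miss the human anomaly.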
Where to start
If you’re staring at an API estate and an incoming wave of agents, you don’t need to boil the ocean. Start with the irreversible actions — payments, deletions, anything that touches a customer — and put explicit gates and per-agent scopes around those first. They’re the smallest surface and the highest stakes. Then give every agent its own identity, instrument the calls so you can see what they’re doing, and tighten outward from there.
The mindset shift is the whole thing. Stop asking only whether the caller is authenticated. Start asking what it intends, what it’s scoped to, and what it can break — because the caller is no longer a person you can reason about, and it’s not going to wait for you to catch up.
I’m speaking on this in “Applied AI in Practice: Deploying Tools Inside Real Client Workflows”.