From the course: Model Context Protocol (MCP): Hands-On with Agentic AI

Limiting the blast radius of AI agents

- Before we go any further, let's talk about the blast radius of AI agents and the risks involved in running and building MCPs. When you give a language model the capability to do something on your computer, it may do things you didn't intend, and it may even do things you would never do yourself. And because of how AI agents operate behind the scenes in AI chat apps, it's easy to forget that things are happening on the computer and data is being manipulated in ways we don't necessarily understand. You saw a direct example of that when I demonstrated how MCPs work inside Claude: code is running, but even if you're looking at the code, you won't necessarily know what is actually happening on the server.

That means, first, if you intend to use an MCP, you have to trust the people who built it, trust that they have your interests in mind, and trust that they are upholding their duty of care to you. Second, you need to know what the MCP might do on your computer. Is it just retrieving information, or is it manipulating data on your computer in some way, maybe creating, changing, or moving files around? Or is it running actual software on your computer without your awareness? And yes, that means the onus of making sure the MCPs you use are actually doing what they say they do is currently on you, which is not great.

Which brings us to developing MCPs. The promise of MCPs is that we can build responsible interfaces to our data that control who has access to what and what the AI can do with that access. But that requires developers to be keenly aware of what data they are giving users access to, what the language models are able to do with that data, and where the threat surfaces exist when interacting with that data. There's a real risk when building MCP servers of giving the language models too much access or too many capabilities, because the language models will not only run the software that is on the server, but may also spin up their own software and leverage the server to do things you didn't intend.

So as you're building out your own MCP servers, always apply the principle of least privilege. Build a server so it only accesses the data it should be able to access and explicitly limits access to anything else. Ensure the tools you build are limited to doing only what you intend and cannot be extended by the LLM itself. And finally, follow proper procedure when handling any user keys or other private information. A sketch of what that looks like in code follows below.

As I've said before, MCPs are still relatively new, as is the idea of AI agents. And what we're seeing is that the threat surfaces and the blast radius for these things can be exponential and almost unlimited if we are not very careful about how we build these tools. So apply responsible AI principles any time you're building MCPs, and test them extensively with adversarial prompts and prompt injections. And remember the old adage: never trust the user. Well, now that user can be an LLM.
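To make the least-privilege guidance concrete, here is a minimal sketch of a read-only MCP tool, assuming the official Python MCP SDK's FastMCP interface. The server name, the notes directory, and the read_note tool are hypothetical, chosen for illustration; the point is the pattern: resolve paths before checking containment, expose exactly one directory, and give the model no write, delete, or execute capability to extend.

```python
# Minimal sketch of least privilege in an MCP tool (assumes the official
# Python MCP SDK). The "notes-readonly" name, ALLOWED_ROOT location, and
# read_note tool are hypothetical examples, not from the course.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-readonly")

# The only directory this server will ever expose.
ALLOWED_ROOT = (Path.home() / "notes").resolve()

MAX_BYTES = 64 * 1024  # refuse arbitrarily large files


@mcp.tool()
def read_note(relative_path: str) -> str:
    """Read a single text file from the notes directory (read-only)."""
    # Resolve symlinks and ".." segments BEFORE the containment check,
    # so a crafted path cannot escape the allowed directory.
    target = (ALLOWED_ROOT / relative_path).resolve()
    if not target.is_relative_to(ALLOWED_ROOT):
        raise ValueError("Access denied: path is outside the notes directory.")
    if not target.is_file():
        raise ValueError("Access denied: not a regular file.")
    if target.stat().st_size > MAX_BYTES:
        raise ValueError("Access denied: file is too large.")
    # Deliberately no tool here can create, change, or run anything.
    return target.read_text(encoding="utf-8")


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```

With this shape, a prompt-injected request for something like ../secrets.txt, or for a symlink pointing outside the notes folder, is rejected by the resolve-then-check step; and because the server registers no writing or executing tools at all, even a fully compromised conversation can only read files you already chose to expose.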
