How I Explain “How To Build an AI Agent”
When people ask me how to build an AI agent, I always come back to the same picture: you are basically giving software a brain, a job description, tools, memory, and a way to talk to people.
I walk through it in seven steps:
- System prompt
- LLM
- Tools
- Memory
- Orchestration
- UI
- AI evals
Let me go through each one as if we were sitting in a room together with a whiteboard.
1. SYSTEM PROMPT
Question we are answering: “What exactly do we want this agent to be and do?”
I treat the system prompt like a very specific job description plus a playbook.
When I design it, I usually write down three things:
- Goal
- Role
- Rules & style
I like to write it in clear, simple language, almost like I am talking to a junior colleague. Then I add a few examples of good and bad answers so the model understands what “good” looks like.
Where this lives in practice:
- In OpenAI: inside system instructions for an Assistant, App, or Custom GPT.
- In Claude Projects: in the project’s “instructions”.
- In frameworks like LangChain / LangGraph / LlamaIndex: as a prompt template in code (see the sketch below).
- In low-code tools like Dify, Flowise, Relevance AI: as the “system prompt” field.
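Here is a minimal sketch of what that looks like as a LangChain prompt template. The Acme agent, its rules, and the example answers are invented for illustration:

```python
from langchain_core.prompts import ChatPromptTemplate

# Goal, role, and rules in one place, written like a job description.
# Everything below is a made-up example; replace with your own agent.
SYSTEM_PROMPT = """You are a support agent for Acme, a B2B invoicing SaaS.

Goal: resolve billing questions, or escalate to a human when unsure.
Role: friendly, precise, never speculates about account data.
Rules:
- Always look up the customer before answering account questions.
- Keep answers under 120 words.

Good answer: "Your March invoice (#1042) is open for $1,200, due April 3."
Bad answer: "You probably owe something, check your email."
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    ("human", "{input}"),
])
```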
If you get this step wrong, no framework will save you. A tight prompt is the cheapest performance boost you will ever get.
2. LLM
Question we are answering: “What brain will this agent use?”
Here I look at four things:
- Quality
- Latency
- Cost
- Data / compliance constraints
If I need top quality and reasoning, I pick a top frontier model from OpenAI, Anthropic, or Google. If I have hard data residency or privacy requirements, I look at:
- Strong regional providers, or
- Open models like Kimi K2 Thinking or GPT-OSS running via Ollama, vLLM, or cloud offerings from Nvidia and others.
How I plug the model in:
- In TypeScript, I often use Vercel AI SDK so I can swap models behind a common interface and get streaming “chat” out of the box.
- In Python, I usually go through LangChain or direct API calls (a minimal sketch follows this list).
- In a low-code environment, I just pick from the list: “Use GPT-5”, “Use Claude 4.5”, and so on.
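For the direct API route, the core of it is a few lines with the OpenAI Python SDK. The model name and prompt here are placeholders; swap in whatever fits your quality, latency, and cost constraints:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; pick the model that fits your budget
    messages=[
        {"role": "system", "content": "You are a billing support agent."},
        {"role": "user", "content": "Why was I charged twice in March?"},
    ],
)
print(response.choices[0].message.content)
```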
My advice when someone starts: begin with one good model that fits your budget, ship something, then worry about fancy model routing later.
3. TOOLS
Question we are answering: “What can this agent actually do in the real world?”
Without tools, the agent can only talk. With tools, it can:
- Look up a customer in your CRM
- Pull numbers from your data warehouse
- Create a ticket or send an email
- Call another internal service or workflow
Technically, a “tool” is just a function the model is allowed to call. You define the tool, describe it in plain language, and the model decides when to use it.
Typical tools I expose:
- get_customer_by_email
- get_open_invoices(customer_id)
- create_support_ticket(subject, description, priority)
- search_docs(query)
- execute_sql(query), heavily constrained for safety
How I usually implement tools:
- With Claude + MCP: expose each function as a tool on an MCP server so Claude can discover and call it.
- With LangChain / LangGraph: register plain Python functions as tools and bind them to the model (see the sketch below).
- In low code: configure each action as a tool step in the visual builder and describe when it should be used.
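As a concrete example, here is a minimal LangChain sketch of two of the tools above. The CRM and ticketing calls are stubbed; in a real system they would hit your actual APIs:

```python
from langchain_core.tools import tool

@tool
def get_customer_by_email(email: str) -> dict:
    """Look up a customer record in the CRM by email address."""
    # Stubbed response; in production this would call your CRM's API.
    return {"id": "cus_123", "email": email, "plan": "pro"}

@tool
def create_support_ticket(subject: str, description: str, priority: str = "normal") -> str:
    """Create a support ticket and return its ID."""
    # Stubbed; replace with a call to your ticketing system.
    return "TICKET-0001"

# Bind the tools to a chat model so it decides when to call them, e.g.:
# llm_with_tools = llm.bind_tools([get_customer_by_email, create_support_ticket])
```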
A good rule of thumb: design tools around business actions, not technical primitives. “Create invoice for customer X” is better than “POST /invoices with this JSON”.
4. MEMORY
Question we are answering: “What should the agent remember, and where does that live?”
I think about memory in four buckets.
- Conversation memory
- Episodic memory
- Knowledge memory
- Structured business data
For knowledge memory, I usually use RAG:
- Break documents into chunks
- Embed them as vectors
- Store them in a vector database
- At query time, search similar chunks and feed them to the model
Common vector stores: Pinecone, Qdrant, Weaviate, Chroma, pgvector, Milvus.
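Here is a minimal sketch of that four-step pipeline with LangChain and Chroma. The handbook.txt file and the query are invented for illustration:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Break documents into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("handbook.txt").read())  # hypothetical doc

# 2 & 3. Embed the chunks and store them in a vector database
store = Chroma.from_texts(chunks, embedding=OpenAIEmbeddings())

# 4. At query time, fetch the most similar chunks to feed the model
docs = store.similarity_search("What is our refund policy?", k=4)
context = "\n\n".join(d.page_content for d in docs)
```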
How I wire this up:
- LlamaIndex or LangChain if I want prebuilt document loaders, chunking, retrievers, and RAG pipelines.
- A vector DB plus a slim custom retriever if I want more control.
- For structured data, I never just dump tables into the prompt. I usually expose read and write operations as tools, like get_open_invoices above.
The key idea: the agent does not need to “know everything”. It just needs reliable ways to fetch the right pieces of memory at the right time.
5. ORCHESTRATION
Question we are answering: “How do all these pieces work together in a robust way?”
Once you have a brain, tools, and memory, you need a control layer that handles:
- Which step runs first
- How to handle failures and retries
- When to ask the user for clarification
- When one agent should hand off to another
I usually distinguish between:
- Simple orchestration: one agent following a linear or lightly branching flow
- Complex orchestration: stateful graphs with retries, branching, and handoffs between agents
What I use:
- LangGraph when I want explicit, stateful, graph-shaped workflows (a minimal sketch follows below).
- OpenAI Apps / Agents SDK when I want something closer to “agent inside my product” with tools and MCP but not a huge framework.
- CrewAI / AutoGen when I want multiple specialist agents collaborating.
- Visual builders like Dify, Flowise, Dust when I want non-developers to see and tweak flows.
In a business setting I almost always start with one main agent plus a small, clear graph of steps, not a swarm of agents arguing with each other.
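To make that concrete, here is a minimal LangGraph sketch of exactly that: one agent, a small explicit graph of steps. The state and node logic are stubbed for illustration:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: State) -> dict:
    # Fetch relevant memory for the question (stubbed here).
    return {"context": "March invoice #1042 is open."}

def respond(state: State) -> dict:
    # In a real agent this step would call the LLM with the context.
    return {"answer": f"Based on our records: {state['context']}"}

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("respond", respond)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "respond")
graph.add_edge("respond", END)

app = graph.compile()
print(app.invoke({"question": "What do I owe?", "context": "", "answer": ""}))
```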
6. UI
Question we are answering: “How do people actually use this thing?”
An agent is not just a chat box. You can put it in different places:
- A web chat interface
- Inside your SaaS product as an “AI copilot”
- Slack or Teams
- A mobile app
- Even fully in the background, with email or notifications as the “UI”
The most important thing is that the interface makes the job obvious.
Instead of “Ask me anything”, I prefer:
- “Ask about a customer and I will pull their full context”
- “Describe a lead, I will write and log a follow-up email”
- “Paste a contract, I will summarize key risks and obligations”
How I usually build the UI:
- For web apps: Next.js plus Vercel AI SDK so I get streaming, tool-aware chat with minimal boilerplate.
- For internal tools: Streamlit or Gradio to spin something up quickly.
- For existing workflows: Slack bots, Teams apps, or simple web widgets embedded in internal portals.
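For the internal-tool route, a Streamlit chat UI really can be this small. The agent call is stubbed; in practice you would call your orchestration layer:

```python
import streamlit as st

st.title("Customer Agent")

if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far
for role, text in st.session_state.history:
    st.chat_message(role).write(text)

if question := st.chat_input("Ask about a customer"):
    st.chat_message("user").write(question)
    answer = f"(agent reply to: {question})"  # replace with a real agent call
    st.chat_message("assistant").write(answer)
    st.session_state.history += [("user", question), ("assistant", answer)]
```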
Your first version can be extremely simple: one text box, a short description, and a conversation pane. If the value is real, users will forgive a basic UI at the start.
7. AI EVALS
Question we are answering: “How do we know the agent is good, and how do we keep improving it?”
I treat this a bit like product analytics plus exams for the agent.
There are two loops.
a) Offline evals
Before I roll an agent out broadly, I create a small test set:
- Realistic inputs
- Expected outputs or at least “better” and “worse” examples
Then I:
- Run the agent on that set
- Score it either with humans or with an “LLM as judge” approach
- Compare versions when I change prompts, tools, or models
Tools that help: LangSmith, Ragas, and simple scripts plus spreadsheets if I want something lightweight.
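Here is what the “simple scripts” end of that spectrum can look like: a tiny eval loop with an invented test set and a stubbed agent. Swap the keyword check for an LLM-as-judge scorer as the test set grows:

```python
# Deliberately tiny offline eval: run the agent over a hand-built test set
# and score each answer with a simple check.
test_set = [
    {"input": "What is the refund window?", "must_mention": "30 days"},
    {"input": "How do I reset my password?", "must_mention": "reset link"},
]

def run_agent(question: str) -> str:
    # Stub; call your real agent here.
    return "Refunds are accepted within 30 days of purchase."

passed = 0
for case in test_set:
    answer = run_agent(case["input"])
    ok = case["must_mention"].lower() in answer.lower()
    passed += ok
    print(f"{'PASS' if ok else 'FAIL'} | {case['input']}")

print(f"{passed}/{len(test_set)} passed")
```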
b) Online evals
Once the agent is live, I track:
- Thumbs up / thumbs down
- Escalations to humans
- Tool failures and timeouts
- Latency and cost per interaction
- Business metrics such as “tickets resolved without a human”, “time to resolution”, “conversion rate lift”
From there I do small, focused experiments:
- “What if we change the system prompt to ask for 2 clarifying questions?”
- “What if we change the retrieval strategy to fetch 10 chunks instead of 4?”
- “What if we switch the model only for this workflow?”
The point is not to be perfect. The point is to be systematically “less wrong” every week.
PUTTING IT ALL TOGETHER
If we were building your first production agent, the path would look like this:
- Write a clear system prompt that describes the role, goal, and boundaries.
- Pick a strong, affordable model and wire it up through a solid SDK or framework.
- Give it a small set of well defined tools that map to real business actions.
- Add memory so it can use your docs and your data instead of guessing.
- Wrap everything in a simple but reliable orchestration flow.
- Put a clean UI in front of real users inside the tools they already live in.
- Watch how it behaves, run evals, and iterate.
If you tell me the specific agent you want to build next, I can map this seven step structure to a concrete stack and architecture you could hand to an engineering team.