How I Explain “How To Build an AI Agent”
When people ask me how to build an AI agent, I always come back to the same picture: you are basically giving software a brain, a job description, tools, memory, and a way to talk to people.
I walk through it in seven steps:
- System prompt
- LLM
- Tools
- Memory
- Orchestration
- UI
- AI evals
Let me go through each one as if we were sitting in a room together with a whiteboard.
1. SYSTEM PROMPT
Question we are answering: “What exactly do we want this agent to be and do?”
I treat the system prompt like a very specific job description plus a playbook.
When I design it, I usually write down three things:
- Goal
- Role
- Rules & style
I like to write it in clear, simple language, almost like I am talking to a junior colleague. Then I add a few examples of good and bad answers so the model understands what “good” looks like.
Where this lives in practice:
- In OpenAI: inside system instructions for an Assistant, App, or Custom GPT.
- In Claude Projects: in the project’s “instructions”.
- In frameworks like LangChain / LangGraph / LlamaIndex: as a prompt template in code (see the sketch below).
- In low-code tools like Dify, Flowise, Relevance AI: as the “system prompt” field.
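Here is a minimal sketch of what that looks like as a LangChain prompt template. The Acme agent, its rules, and the example answers are invented for illustration:

```python
from langchain_core.prompts import ChatPromptTemplate

# Goal, role, and rules in one place, written like a job description.
# Everything below is a made-up example; replace with your own agent.
SYSTEM_PROMPT = """You are a support agent for Acme, a B2B invoicing SaaS.

Goal: resolve billing questions, or escalate to a human when unsure.
Role: friendly, precise, never speculates about account data.
Rules:
- Always look up the customer before answering account questions.
- Keep answers under 120 words.

Good answer: "Your March invoice (#1042) is open for $1,200, due April 3."
Bad answer: "You probably owe something, check your email."
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    ("human", "{input}"),
])
```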
If you get this step wrong, no framework will save you. A tight prompt is the cheapest performance boost you will ever get.
2. LLM
Question we are answering: “What brain will this agent use?”
Here I look at four things:
- Quality
- Latency
- Cost
- Data / compliance constraints
If I need top quality and reasoning, I pick a top frontier model from OpenAI, Anthropic, or Google. If I have hard data residency or privacy requirements, I look at:
- Strong regional providers, or
- Open models like Kimi K2 Thinking or GPT-OSS running via Ollama, vLLM, or cloud offerings from Nvidia and others.
How I plug the model in:
- In TypeScript, I often use Vercel AI SDK so I can swap models behind a common interface and get streaming “chat” out of the box.
- In Python, I usually go through LangChain or direct API calls (a minimal sketch follows this list).
- In a low-code environment, I just pick from the list: “Use GPT-5”, “Use Claude 4.5”, and so on.
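For the direct API route, the core of it is a few lines with the OpenAI Python SDK. The model name and prompt here are placeholders; swap in whatever fits your quality, latency, and cost constraints:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; pick the model that fits your budget
    messages=[
        {"role": "system", "content": "You are a billing support agent."},
        {"role": "user", "content": "Why was I charged twice in March?"},
    ],
)
print(response.choices[0].message.content)
```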
My advice when someone starts: begin with one good model that fits your budget, ship something, then worry about fancy model routing later.
3. TOOLS
Question we are answering: “What can this agent actually do in the real world?”
Without tools, the agent can only talk. With tools, it can:
- Look up a customer in your CRM
- Pull numbers from your data warehouse
- Create a ticket or send an email
- Call another internal service or workflow
Technically, a “tool” is just a function the model is allowed to call. You define the tool, describe it in plain language, and the model decides when to use it.
Typical tools I expose:
- get_customer_by_email
- get_open_invoices(customer_id)
- create_support_ticket(subject, description, priority)
- search_docs(query)
- execute_sql(query), heavily constrained for safety
How I usually implement tools:
- With Claude + MCP: expose each function as a tool on an MCP server so Claude can discover and call it.
- With LangChain / LangGraph: register plain Python functions as tools and bind them to the model (see the sketch below).
- In low code: configure each action as a tool step in the visual builder and describe when it should be used.
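As a concrete example, here is a minimal LangChain sketch of two of the tools above. The CRM and ticketing calls are stubbed; in a real system they would hit your actual APIs:

```python
from langchain_core.tools import tool

@tool
def get_customer_by_email(email: str) -> dict:
    """Look up a customer record in the CRM by email address."""
    # Stubbed response; in production this would call your CRM's API.
    return {"id": "cus_123", "email": email, "plan": "pro"}

@tool
def create_support_ticket(subject: str, description: str, priority: str = "normal") -> str:
    """Create a support ticket and return its ID."""
    # Stubbed; replace with a call to your ticketing system.
    return "TICKET-0001"

# Bind the tools to a chat model so it decides when to call them, e.g.:
# llm_with_tools = llm.bind_tools([get_customer_by_email, create_support_ticket])
```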
A good rule of thumb: design tools around business actions, not technical primitives. “Create invoice for customer X” is better than “POST /invoices with this JSON”.
4. MEMORY
Question we are answering: “What should the agent remember, and where does that live?”
I think about memory in four buckets.
- Conversation memory
- Episodic memory
- Knowledge memory
- Structured business data
For knowledge memory, I usually use RAG:
- Break documents into chunks
- Embed them as vectors
- Store them in a vector database
- At query time, search similar chunks and feed them to the model
Common vector stores: Pinecone, Qdrant, Weaviate, Chroma, pgvector, Milvus.
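Here is a minimal sketch of that four-step pipeline with LangChain and Chroma. The handbook.txt file and the query are invented for illustration:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Break documents into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("handbook.txt").read())  # hypothetical doc

# 2 & 3. Embed the chunks and store them in a vector database
store = Chroma.from_texts(chunks, embedding=OpenAIEmbeddings())

# 4. At query time, fetch the most similar chunks to feed the model
docs = store.similarity_search("What is our refund policy?", k=4)
context = "\n\n".join(d.page_content for d in docs)
```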
How I wire this up:
- LlamaIndex or LangChain if I want prebuilt document loaders, chunking, retrievers, and RAG pipelines.
- A vector DB plus a slim custom retriever if I want more control.
- For structured data, I never just dump tables into the prompt. I usually expose read and write operations as tools, like get_open_invoices above.
The key idea: the agent does not need to “know everything”. It just needs reliable ways to fetch the right pieces of memory at the right time.
5. ORCHESTRATION
Question we are answering: “How do all these pieces work together in a robust way?”
Once you have a brain, tools, and memory, you need a control layer that handles:
- Which step runs first
- How to handle failures and retries
- When to ask the user for clarification
- When one agent should hand off to another
I usually distinguish between:
- Simple orchestration: one agent following a linear or lightly branching flow
- Complex orchestration: stateful graphs with retries, branching, and handoffs between agents
What I use:
- LangGraph when I want explicit, stateful, graph-shaped workflows (a minimal sketch follows below).
- OpenAI Apps / Agents SDK when I want something closer to “agent inside my product” with tools and MCP but not a huge framework.
- CrewAI / AutoGen when I want multiple specialist agents collaborating.
- Visual builders like Dify, Flowise, Dust when I want non-developers to see and tweak flows.
In a business setting I almost always start with one main agent plus a small, clear graph of steps, not a swarm of agents arguing with each other.
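To make that concrete, here is a minimal LangGraph sketch of exactly that: one agent, a small explicit graph of steps. The state and node logic are stubbed for illustration:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: State) -> dict:
    # Fetch relevant memory for the question (stubbed here).
    return {"context": "March invoice #1042 is open."}

def respond(state: State) -> dict:
    # In a real agent this step would call the LLM with the context.
    return {"answer": f"Based on our records: {state['context']}"}

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("respond", respond)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "respond")
graph.add_edge("respond", END)

app = graph.compile()
print(app.invoke({"question": "What do I owe?", "context": "", "answer": ""}))
```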
6. UI
Question we are answering: “How do people actually use this thing?”
An agent is not just a chat box. You can put it in different places:
- A web chat interface
- Inside your SaaS product as an “AI copilot”
- Slack or Teams
- A mobile app
- Even fully in the background, with email or notifications as the “UI”
The most important thing is that the interface makes the job obvious.
Instead of “Ask me anything”, I prefer:
- “Ask about a customer and I will pull their full context”
- “Describe a lead, I will write and log a follow-up email”
- “Paste a contract, I will summarize key risks and obligations”
How I usually build the UI:
- For web apps: Next.js plus Vercel AI SDK so I get streaming, tool-aware chat with minimal boilerplate.
- For internal tools: Streamlit or Gradio to spin something up quickly.
- For existing workflows: Slack bots, Teams apps, or simple web widgets embedded in internal portals.
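For the internal-tool route, a Streamlit chat UI really can be this small. The agent call is stubbed; in practice you would call your orchestration layer:

```python
import streamlit as st

st.title("Customer Agent")

if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far
for role, text in st.session_state.history:
    st.chat_message(role).write(text)

if question := st.chat_input("Ask about a customer"):
    st.chat_message("user").write(question)
    answer = f"(agent reply to: {question})"  # replace with a real agent call
    st.chat_message("assistant").write(answer)
    st.session_state.history += [("user", question), ("assistant", answer)]
```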
Your first version can be extremely simple: one text box, a short description, and a conversation pane. If the value is real, users will forgive a basic UI at the start.
7. AI EVALS
Question we are answering: “How do we know the agent is good, and how do we keep improving it?”
I treat this a bit like product analytics plus exams for the agent.
There are two loops.
a) Offline evals
Before I roll an agent out broadly, I create a small test set:
- Realistic inputs
- Expected outputs or at least “better” and “worse” examples
Then I:
- Run the agent on that set
- Score it either with humans or with an “LLM as judge” approach
- Compare versions when I change prompts, tools, or models
Tools that help: LangSmith, Ragas, and simple scripts plus spreadsheets if I want something lightweight.
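Here is what the “simple scripts” end of that spectrum can look like: a tiny eval loop with an invented test set and a stubbed agent. Swap the keyword check for an LLM-as-judge scorer as the test set grows:

```python
# Deliberately tiny offline eval: run the agent over a hand-built test set
# and score each answer with a simple check.
test_set = [
    {"input": "What is the refund window?", "must_mention": "30 days"},
    {"input": "How do I reset my password?", "must_mention": "reset link"},
]

def run_agent(question: str) -> str:
    # Stub; call your real agent here.
    return "Refunds are accepted within 30 days of purchase."

passed = 0
for case in test_set:
    answer = run_agent(case["input"])
    ok = case["must_mention"].lower() in answer.lower()
    passed += ok
    print(f"{'PASS' if ok else 'FAIL'} | {case['input']}")

print(f"{passed}/{len(test_set)} passed")
```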
b) Online evals
Once the agent is live, I track:
- Thumbs up / thumbs down
- Escalations to humans
- Tool failures and timeouts
- Latency and cost per interaction
- Business metrics such as “tickets resolved without a human”, “time to resolution”, “conversion rate lift”
From there I do small, focused experiments:
- “What if we change the system prompt to ask for 2 clarifying questions?”
- “What if we change the retrieval strategy to fetch 10 chunks instead of 4?”
- “What if we switch the model only for this workflow?”
The point is not to be perfect. The point is to be systematically “less wrong” every week.
PUTTING IT ALL TOGETHER
If we were building your first production agent, the path would look like this:
- Write a clear system prompt that describes the role, goal, and boundaries.
- Pick a strong, affordable model and wire it up through a solid SDK or framework.
- Give it a small set of well defined tools that map to real business actions.
- Add memory so it can use your docs and your data instead of guessing.
- Wrap everything in a simple but reliable orchestration flow.
- Put a clean UI in front of real users inside the tools they already live in.
- Watch how it behaves, run evals, and iterate.
If you tell me the specific agent you want to build next, I can map this seven step structure to a concrete stack and architecture you could hand to an engineering team.