You've built your AI agent... but how do you know it's not failing silently in production?

Building AI agents is only the beginning. If you're shipping agents into production without a solid evaluation loop, you're setting yourself up for silent failures, wasted compute, and eventually broken trust.

Here's how to make your AI agents production-ready with a clear, actionable evaluation framework:

𝟭. 𝗜𝗻𝘀𝘁𝗿𝘂𝗺𝗲𝗻𝘁 𝘁𝗵𝗲 𝗥𝗼𝘂𝘁𝗲𝗿
The router is your agent's control center. Make sure you're logging:
- Function Selection: Which skill or tool did it choose? Was it the right one for the input?
- Parameter Extraction: Did it extract the correct arguments? Were they formatted and passed correctly?
✅ Action: Add logs and traces to every routing decision. Measure correctness on real queries, not just happy paths.

𝟮. 𝗠𝗼𝗻𝗶𝘁𝗼𝗿 𝘁𝗵𝗲 𝗦𝗸𝗶𝗹𝗹𝘀
These are your execution blocks: API calls, RAG pipelines, code snippets, etc. You need to track:
- Task Execution: Did the function run successfully?
- Output Validity: Was the result accurate, complete, and usable?
✅ Action: Wrap skills with validation checks. Add fallback logic if a skill returns an invalid or incomplete response.

𝟯. 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗲 𝘁𝗵𝗲 𝗣𝗮𝘁𝗵
This is where most agents break down in production: taking too many steps or producing inconsistent outcomes. Track:
- Step Count: How many hops did it take to get to a result?
- Behavior Consistency: Does the agent respond the same way to similar inputs?
✅ Action: Set thresholds for max steps per query. Create dashboards to visualize behavior drift over time.

𝟰. 𝗗𝗲𝗳𝗶𝗻𝗲 𝗦𝘂𝗰𝗰𝗲𝘀𝘀 𝗠𝗲𝘁𝗿𝗶𝗰𝘀 𝗧𝗵𝗮𝘁 𝗠𝗮𝘁𝘁𝗲𝗿
Don't just measure token count or latency. Tie success to outcomes. Examples:
- Was the support ticket resolved?
- Did the agent generate correct code?
- Was the user satisfied?
✅ Action: Align evaluation metrics with real business KPIs. Share them with product and ops teams.

Make it measurable. Make it observable. Make it reliable. That's how enterprises scale AI agents. Easier said than done.
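A minimal sketch of steps 1 and 2, with stub functions in place of a real agent: every routing decision is logged as one structured JSON line (so correctness can be scored offline), and each skill is wrapped with a validation check plus fallback. All names here (`RoutingDecision`, `lookup_order`) are illustrative, not any specific framework's API.

```python
import json
import logging
from dataclasses import dataclass, asdict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.router")

@dataclass
class RoutingDecision:
    query: str
    chosen_skill: str
    extracted_params: dict
    step_count: int

def log_routing_decision(decision: RoutingDecision) -> None:
    # One structured log line per routing decision, so function selection
    # and parameter extraction can be measured against real queries.
    log.info(json.dumps(asdict(decision)))

def with_validation(skill, validate, fallback):
    # Wrap a skill with an output check; invoke the fallback when the
    # skill returns an invalid or incomplete response.
    def wrapped(*args, **kwargs):
        result = skill(*args, **kwargs)
        return result if validate(result) else fallback(*args, **kwargs)
    return wrapped

# Hypothetical skill that returns an incomplete response.
def lookup_order(order_id):
    return {"status": None}

safe_lookup = with_validation(
    lookup_order,
    validate=lambda r: r.get("status") is not None,
    fallback=lambda order_id: {"status": "unknown", "escalate": True},
)
```

The same wrapper pattern extends naturally to step 3: increment a counter inside `wrapped` and abort or escalate once a max-steps threshold is hit.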
Developing AI Agents
-
I discovered I was designing my AI tools backwards. Here's an example.

This was my newsletter processing chain: reading emails, calling a newsletter processor, extracting companies, & then adding them to the CRM. This involved four different steps, costing $3.69 for every thousand newsletters processed.

Before: Newsletter Processing Chain (first image)

Then I created a unified newsletter tool which combined everything, using the Google Agent Development Kit, Google's framework for building production-grade AI agent tools: (second image)

Why is the unified newsletter tool more complicated? It includes multiple actions in a single interface (process, search, extract, validate), implements state management that tracks usage patterns & caches results, has rate limiting built in, & produces structured JSON outputs with metadata instead of plain text.

But here's the counterintuitive part: despite being more complex internally, the unified tool is simpler for the LLM to use because it provides consistent, structured outputs that are easier to parse, even though those outputs are longer.

To understand the impact, we ran tests of 30 iterations per test scenario. The results show the impact of the new architecture: (third image)

We were able to reduce tokens by 41% (p=0.01, statistically significant), which translated linearly into cost savings. The success rate improved by 8% (p=0.03), & we hit the cache 30% of the time, which is another cost savings.

While individual tools produced shorter, "cleaner" responses, they forced the LLM to work harder parsing inconsistent formats. Structured, comprehensive outputs from unified tools enabled more efficient LLM processing, despite being longer.

My workflow relied on dozens of specialized Ruby tools for email, research, & task management. Each tool had its own interface, error handling, & output format. By rolling them up into meta tools, the overall performance is better, & there's tremendous cost savings.
You can find the complete architecture on GitHub.
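As a rough illustration of the unified-tool pattern described above (not the author's actual implementation): one `run` interface dispatching multiple actions, a result cache, naive rate limiting, and a structured output with metadata on every call. All names, actions, and thresholds are invented for the sketch.

```python
import hashlib
import json
import time

class UnifiedNewsletterTool:
    """One interface, multiple actions, cached and rate-limited,
    always returning structured output with metadata."""

    def __init__(self, max_calls_per_minute=60):
        self._cache = {}
        self._call_times = []
        self._max_calls = max_calls_per_minute

    def _rate_limited(self):
        now = time.time()
        self._call_times = [t for t in self._call_times if now - t < 60]
        return len(self._call_times) >= self._max_calls

    def run(self, action, payload):
        # Deterministic cache key over (action, payload).
        key = hashlib.sha256(json.dumps([action, payload]).encode()).hexdigest()
        if key in self._cache:
            return {**self._cache[key], "cached": True}
        if self._rate_limited():
            return {"ok": False, "error": "rate_limited"}
        self._call_times.append(time.time())
        handler = {"extract": self._extract, "validate": self._validate}.get(action)
        if handler is None:
            return {"ok": False, "error": f"unknown action: {action}"}
        result = {"ok": True, "action": action, "data": handler(payload), "cached": False}
        self._cache[key] = result
        return result

    def _extract(self, payload):
        # Stand-in for real company extraction: capitalized words.
        return [w.strip(".,") for w in payload.split() if w[:1].isupper()]

    def _validate(self, payload):
        return {"valid": bool(payload.strip())}
```

The consistency is the point: every call, success or failure, yields the same envelope (`ok`, `data`/`error`, `cached`), which is what makes the longer output cheaper for the LLM to parse.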
-
It's more important to feed in the right info than to try to nail the perfect prompt.

The way businesses build with AI is changing. The shift is happening due to reasoning development & agentic capability (AI is able to figure out how to achieve an objective on its own). That means managing what information your AI agent has access to at any given moment is more important than tweaking prompts.

You might think that giving AI agents access to everything would make them smarter. The opposite is true. As you add more info, AI performance declines, aka "context rot."

So here's what you need to do:
- Keep instructions clear, no duplication
- Don't overload your AI with complex rules
- Give your AI just enough direction without micromanaging
- Provide a focused toolkit where each function has a clear purpose: one agent for each function, rather than trying to get one agent to do everything
- Let AI agents retrieve information on-demand

For work that spans hours or days, use 2 approaches:
1. Summarize conversation history to preserve what matters
2. Give agents the ability to take & reference their own notes

The most effective AI deployments treat information as a strategic resource, not an unlimited commodity. Getting this right means faster, more reliable results from your AI investments.

Image from Anthropic describing the evolution of prompt engineering into context engineering below.

p.s. I'm looking to take on no more than 2 clients who want to build this layer into their business as part of a new framework I'm developing - focus is on B2B marketing.
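A toy sketch of the two long-horizon techniques above: older turns are compacted into a summary while recent turns stay verbatim, and the agent's own notes are appended so durable facts survive compaction. `summarize` stands in for a real LLM call; all names are illustrative.

```python
def summarize(turns):
    # Placeholder: a production system would call an LLM here.
    return "SUMMARY(%d earlier turns)" % len(turns)

def build_context(history, notes, keep_recent=4):
    # Compact everything but the last few turns, then append the
    # agent's notes so key facts aren't lost to summarization.
    older, recent = history[:-keep_recent], history[-keep_recent:]
    parts = ([summarize(older)] if older else []) + list(recent)
    if notes:
        parts.append("AGENT NOTES:\n" + "\n".join(notes))
    return "\n".join(parts)

history = [f"turn {i}" for i in range(10)]
notes = ["customer prefers email", "claim #123 awaiting documents"]
context = build_context(history, notes)
```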
-
Anthropic released an excellent article on building effective AI agents, and it has some great recommendations. Here's what I'd add: (TL;DR - use the right frameworks that focus on simplicity and smart memory management)

Their recommendations echoed a lot of what we're seeing from our customers at Zep AI (YC W24):

1. Start simple: Use the simplest solution possible, only increasing complexity when needed. Many applications don't require full agents - a single well-optimized LLM call with smart memory retrieval can be enough.

2. Understand the taxonomy: Anthropic distinguishes between workflows (predefined code paths) and agents (systems where LLMs dynamically direct their own processes). Different problems need different approaches.

3. Use proven patterns: The most effective implementations use:
- Prompt chaining (sequential LLM calls for accuracy)
- Routing (directing inputs to specialized tasks)
- Parallelization (simultaneous processing for speed or confidence)
- Orchestrator-workers (dynamic task delegation)
- Evaluator-optimizer (iterative refinement with feedback)

4. Pay attention to tool design: Treat "agent-computer interfaces" with the same care as human interfaces. Well-documented, intuitive tools are crucial for agent success.

This is all excellent advice. I'd add two things:

1. Double-clicking on tool advice - I've talked about this before, but giving the LLM access to the right tools, not every tool, is critical for performance. Too many tools result in poorer performance.

2. The memory layer is crucial - As these systems evolve, productionize, and integrate more and more features and data, stuffing all that information into the context window gets less and less effective. Invest in a robust memory layer so that you're providing the agent with the right knowledge at the right time, not all of it all the time.

What would you add? What are you seeing in the enterprise?
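As an illustration of the routing pattern from point 3, here is a minimal sketch: a stub classifier (standing in for a cheap LLM classification call) directs each input to a specialized handler. The categories and handlers are invented for the example.

```python
def classify(query: str) -> str:
    # Stand-in for a cheap LLM classification call.
    q = query.lower()
    if "refund" in q or "charge" in q:
        return "billing"
    if "error" in q or "bug" in q:
        return "technical"
    return "general"

# Each category gets its own specialized prompt/handler,
# which can be optimized independently of the others.
HANDLERS = {
    "billing": lambda q: f"[billing specialist] {q}",
    "technical": lambda q: f"[technical specialist] {q}",
    "general": lambda q: f"[generalist] {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)
```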
-
So much jargon in AI... and it's confusing. Here's my attempt at simplifying a few concepts:

1/ Large language models (LLMs)
LLMs and their associated chatbots are what most people think of as "AI" because we all have access to them. Smart tools that mimic human-like intelligence, like ChatGPT, Claude, or Gemini, generate text based on a prompt. They assemble responses by predicting what words come next. That's it. Seems like magic, and to some degree it is. But at their core, LLMs are sophisticated word-prediction machines. This might help you sleep better when you worry about a future where robots take over (even though they still probably will...). LLMs summarize documents, draft messages, or answer questions. On their own, they only have access to public information, not your proprietary company and customer data, unless integrated with internal systems.

2/ Workflows
Workflows structure and sequence repeatable business processes. Examples include customer onboarding, case resolution, or claims handling. Platforms like Salesforce, ServiceNow, and HubSpot automate these sequences to ensure tasks happen in the right order every time. Some tools are focused exclusively on organizing workflows. Tools like Zapier, Microsoft Power Automate, and n8n help organize and run workflows that span multiple applications and data sets. These tools all offer the ability to call out to LLMs to perform work, and often offer their own bolted-on AI features.

3/ AI agents
Agents do specific steps within a workflow for you. In an auto insurance claim, you might want to build agents that:
- verify customer identity
- check account status
- collect incident details
- review coverages, and
- propose next steps
Each task can be completed by a different agent that specializes in that activity, its related data, and systems. Workflows link all these activities together. In some cases, you can use agents to dynamically manage all the steps ("agentic workflows"). They choose the next best action to take and call on the appropriate agents in the order they determine makes most sense. For example, to identify an order delay, an agent might scan sales, order management, and field service systems to figure out what's going on and then call on the appropriate agent to facilitate the appropriate next step. Some processes are better suited to this than others.

So...
* LLMs replicate human-like decision making
* Workflows organize end-to-end processes
* Agents automate specialized tasks within workflows

When combined, we can automate things we've never been able to automate before and take on tasks that were previously unimaginable.

What tools are you deploying across these categories in your company? Follow Balboa Solutions Jay Nathan
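The auto-insurance example above can be sketched as a plain workflow that links stub "agents" in a fixed order, each function standing in for a specialist agent plus the systems it talks to. All fields and return values are invented for illustration.

```python
# Each function stands in for a specialist agent plus its systems.
def verify_identity(claim):
    return {**claim, "identity_verified": True}

def check_account(claim):
    return {**claim, "account_active": True}

def collect_details(claim):
    return {**claim, "details": "rear-end collision, no injuries"}

def propose_next_steps(claim):
    return {**claim, "next_step": "send repair estimate form"}

# The workflow is the glue: it fixes the order and passes state along.
CLAIM_WORKFLOW = [verify_identity, check_account, collect_details, propose_next_steps]

def run_workflow(claim, steps=CLAIM_WORKFLOW):
    for step in steps:
        claim = step(claim)
    return claim

result = run_workflow({"claim_id": "C-42"})
```

In the "agentic workflow" variant, an LLM would pick the next step from this same set of agents instead of following the fixed list.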
-
During this year, we've seen mid-sized and large companies rush to "build agents" - skipping straight to the most hyped layer. Most begin with a quick automation and then, impatient, chase fully autonomous agents. That leap costs time, trust, and money.

There are three practical layers - each a different tradeoff between speed, control, and capability.

A) Non-Agentic Workflows (where everyone should start)
This is basic AI usage: User input → LLM processes the request → Output delivered.
Great for narrow, well-structured tasks like:
- Summarising call transcripts into bullet-point action items
- Summarising product specs
They're quick to build, reliable, and inexpensive - but limited.

B) Agentic Workflows (example from a mid-size insurer that we worked with)
Here, multiple systems/AI agents work together with some decision logic. You're not just calling an LLM - you're orchestrating steps.
Goal: Cut insurance claim inquiry response time and reduce cost without adding headcount.
The workflow + agentic AI steps include:
→ Reads incoming claim requests
→ Retrieves policy and claimant data from internal systems
→ Checks claim status and required documentation
→ Generates an accurate, policy-compliant response
→ Escalates to humans only when risk or complexity flags trigger
Impact:
- 38% of claims resolved end-to-end by the agentic layer
- 60% faster responses for claimants

C) AI Agents (not enterprise-ready - for now)
Here's the reality: most "AI agents" are just fancy workflows with better marketing. Real agents should:
- Form a plan based on ambiguous goals
- Choose tools on the fly, not in a fixed sequence
- Learn from outcomes and adapt
- Escalate with clear reasoning
We're certainly on a journey in that direction, but the technology isn't quite there yet for most enterprise use cases (where process control is important).

Don't get caught up in the hype. Focus on building solid automation that actually reduces operational cost. Most companies want to jump straight to "AI agents" and end up with broken, unreliable systems.

Start simple. Build workflows that solve real problems. Then gradually add complexity. Srinivas K
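A toy sketch of the agentic-workflow layer (B), focusing on the escalation rule: the automated path answers only when no risk or complexity flag fires. The flags, thresholds, and field names are invented for illustration, and a plain string stands in for the LLM drafting step.

```python
# Risk/complexity flags that force human escalation; thresholds are invented.
RISK_FLAGS = [
    lambda c: c.get("amount", 0) > 10_000,             # high-value claim
    lambda c: c.get("fraud_score", 0.0) > 0.8,         # model-flagged fraud risk
    lambda c: not c.get("documents_complete", False),  # missing documentation
]

def handle_inquiry(claim):
    if any(flag(claim) for flag in RISK_FLAGS):
        return {"handled_by": "human", "claim": claim}
    # Stand-in for the LLM drafting a policy-compliant response.
    reply = f"Claim {claim['id']}: current status is {claim['status']}."
    return {"handled_by": "agent", "reply": reply, "claim": claim}
```

Keeping the escalation logic as explicit, auditable code (rather than leaving it to the model) is what preserves the process control that layer C currently lacks.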
-
AI isn't replacing adjusters. It's giving them superpowers. 🦸

Claims used to take weeks. 📄 Now AI agents can settle them in minutes — and spot fraud before it even happens. ⚡

Claims are the heartbeat of insurance — but for decades, they've been slow, manual, and painful for everyone involved. ⏳ Today, AI agents are flipping that script. Startups are showing how AI can handle complex tasks: pulling documents, verifying facts, triaging claims, and even assisting human adjusters — all in real time. 🧠

But the impact goes beyond just speed. AI is also attacking one of the industry's biggest problems: fraud — which costs insurers over $300 billion a year. 🏴☠️ By using synthetic data to simulate rare fraud scenarios, insurers can now train machine learning models to detect hidden patterns long before a human ever would.

This isn't about replacing humans — it's about augmenting them. 🤝 Imagine AI spotting a staged accident before the claim even hits a human desk. Or flagging suspicious billing patterns across thousands of claims in seconds. And it's not science fiction — it's already happening. Companies like Inshur are using connected car data to adjust premiums dynamically after an incident, making claims not just faster — but fairer for everyone involved. 🚗📊

The real revolution isn't just faster claims. It's smarter, more predictive, and more customer-centric claims — built on AI + human collaboration. Speed alone won't win. 🔍 Transparency, 🛡️ fairness, and 🤖 human-in-the-loop systems will define the winners.

👉 What part of claims handling do you think AI will transform next? Drop your thoughts below! 💬

Book me on hubble: ↪️ https://lnkd.in/e5J_TbTT
Sign up to my blog: ↪️ https://lnkd.in/gK2tVfxn
Read more about my thoughts on AI & Risk: ↪️ https://lnkd.in/gttbgK8x
-
Tired of GPT giving your agents and customers generic answers? Let me show you something better.

I've been talking a lot about AI guardrails for complex industries like insurance. Today, I'm pulling back the curtain on how to combine GPT with Zingtree's dynamic workflows and integrations to get personalized and accurate policy recommendations.

1. The generic approach
Ask GPT: "What insurance should I get if I have a new kid and a used car?" You'll see 6 nice bullet points—informative, but not tailored to your situation.

2. The Zingtree approach
Step one: The agent enters the customer's name and phone number. We instantly pull real-time data from Salesforce—policy details, family updates, you name it.
Step two: Zingtree's workflow prompts you to capture situational data—maybe the customer just had a new baby or bought a car.
Step three: GPT uses this detailed context to deliver a precise, compliant recommendation right inside Salesforce.

3. Handling objections
What if the customer wants a price match? We can handle that too! Because GPT is connected to your data, it can handle price-match objections or suggest bundle discounts—right on the spot, all within the same flow. No more bouncing between 3–4 different systems.

This is a game-changer for any insurance provider (or really, any enterprise looking to up their AI game). Think healthcare, finance, or consumer products. The best part? This level of customization isn't just "AI generated copy"—it's the real deal, guided by your actual data and real-time context.

If you'd like to see this customized for a different industry, drop a comment on what you'd like to see! #AI #Insurance #GPT
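Conceptually, the flow above boils down to grounding the model in real customer data before asking for a recommendation. A hypothetical sketch of that idea (a hard-coded dict stands in for the Salesforce pull; no real Zingtree or Salesforce API is shown, and the LLM call itself is omitted):

```python
# Hypothetical CRM record standing in for the real-time Salesforce pull.
FAKE_CRM = {
    "+1-555-0100": {"name": "Pat Lee", "policies": ["auto"], "dependents": 1},
}

def build_recommendation_prompt(phone, situational_notes):
    record = FAKE_CRM.get(phone, {})
    # Ground the model in the customer record plus what the agent just learned.
    return (
        "You are an insurance assistant. Recommend coverage using ONLY this data.\n"
        f"Customer record: {record}\n"
        f"Agent notes: {situational_notes}\n"
    )

prompt = build_recommendation_prompt("+1-555-0100", "new baby, just bought a used car")
```

The "use ONLY this data" instruction is the guardrail: it steers the model away from the generic bullet points of approach 1 and toward the customer's actual situation.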
-
Insurance paperwork doesn't have to be a bottleneck—AI is redefining how we manage it.

This guide reveals how AI Agents can transform your insurance operations:
• Extract data from diverse document formats
• Validate information automatically
• Classify and organize documents intelligently
• Streamline repetitive workflows

Discover real-world examples of how AI automates claims processing, policy issuance, and compliance reporting - slashing processing times from days to hours.

Learn about the key benefits:
- Boost efficiency and productivity
- Improve data accuracy
- Accelerate turnaround times
- Scale operations effortlessly
- Ensure compliance and audit-readiness

We also cover implementation challenges and how to overcome them.

Ready to transform your insurance document processes? Get the full guide here: https://lnkd.in/eAWuMmUb

See how our AI Agents can reduce costs, speed up operations, and delight your customers. The future of insurance is automated - don't get left behind.
-
🚨 Hot off the press! 🚨

I'm honored to be featured in Modern Insurance Magazine – Issue 72 📰 with my article: "AI: Promise and Peril – How Insurance Leaders Can Harness the Power of Agentic AI and MARL Without Losing Control" 🧠⚖️🤖

🎯 In this piece, I explore how AI Agents and Multi-Agent Reinforcement Learning (MARL) are rapidly evolving from experimental concepts to enterprise-grade tools poised to reshape the insurance value chain. 🏗️

From automating claims triage to deploying self-learning fraud detection systems and optimizing underwriting in real time, I break down how insurers can:
✅ Leverage Agentic AI to make smarter, faster decisions
✅ Deploy MARL-powered systems to dynamically adapt across complex processes
✅ Avoid ethical, regulatory, and operational pitfalls through robust AI governance and simulation platforms

💥 The article also outlines the 4 key pillars insurers need to master as they embrace intelligent automation at scale:
1️⃣ Intentional Architecture – Why point solutions aren't enough anymore
2️⃣ Transparent Orchestration – The need for explainable, observable AI workflows
3️⃣ AI Governance at the Core – Managing risk, bias, and accountability
4️⃣ Business-Led Innovation – Enabling underwriters, claims leaders, and operations to safely experiment with AI Agents without waiting for IT

🔄 I also challenge the industry to move beyond narrow automation and begin simulating multi-agent business ecosystems that evolve, learn, and optimize autonomously.

👁🗨 Think of this as a call to action: insurance firms must embrace a future where AI doesn't just support humans—it collaborates, learns, and scales alongside them. 🤝🧠⚙️

I'm deeply grateful to be featured alongside a brilliant group of industry experts and innovators who are each transforming their corner of the insurance world: Katie King, MBA, David Alexander Eristavi, Costas Christoforou, PhD, Darren Hall, Will Prest MBCS, Lior Koskas, Tracey Sherrard, Jason Brice, Simon Downing, Mia Constable, Nik Ellis, Jane Pocock ♻️🚙, Greg Laker – your perspectives on data, automation, ethics, claims, and the customer experience added incredible depth to this edition 🙌

🔗 If you're an executive, innovator, or transformation leader in the insurance space, this one's for you. Let's shape the future of insurance—intelligent, adaptive, and human-centered.

👉 Contact me for more information about leveraging AI Agents in the Insurance Industry 🚀

#AI #Insurance #AIagents #MARL #AgenticAI #InsurTech #ClaimsAutomation #Underwriting #DigitalTransformation #FraudDetection #CX #ModernInsurance #ThoughtLeadership #ResponsibleAI #PX42AI #SimulationFirst #NoCodeAI #Governance