Proving an AI Copilot in CX: A 12-Month Production Case Study
Inside XtendOps’ SmartAgent Copilot: Real-world adoption, speed, and impact data for CX leaders evaluating GenAI.
TL;DR:
GenAI is reshaping customer support, but most companies are still figuring out how to move from potential to practice. This article shares operational insights from XO’s 12-month, full-scale rollout of SmartAgent Copilot: XtendOps’ proprietary AI tool built to augment agents directly within live customer support workflows. The deployment supported a high-volume support operation handling over 100,000 tickets per month, starting with a 300-agent team and four distinct cohorts to measure real-world adoption and impact.
What’s inside:
A breakdown of Copilot’s deployment, including:
- What it was built to do — Deliver real-time, context-aware reply suggestions to help agents work faster and more accurately — supporting, not replacing, human agents.
- What happened in production — Agents could accept, edit, or skip AI suggestions, revealing adoption patterns, trust signals, and friction points.
- How success was measured — Beyond surface metrics, XO tracked deep behavioral KPIs like AI Accuracy, the rate at which agents actually used Copilot suggestions (see how we define it here), along with AHT, CSAT, and onboarding speed to show Copilot’s actual operational impact.
Why read:
A rare, practical look at how CX teams move GenAI from theory to practice — showing how large groups adopt, adapt, and accelerate with AI when stakes and ticket volumes are high.
Solving Real CX Friction: The Goal Behind Our Copilot Deployment
In early 2024, XO partnered with a leading global meal kit company to launch its first full-scale SmartAgent Copilot deployment. The goal: reduce friction in the support workflow and drive measurable, real-world improvements.
Before Copilot, support agents managed a high volume of customer inquiries, many of them repetitive, time-sensitive, and nuanced. While knowledge base articles and macros existed, consistency and efficiency varied widely across agents.
Our core hypothesis: could AI generate responses accurate enough that, even with agent edits and legacy processes in place, we’d still see faster resolution times and a reduction in average handle time?
XO’s Copilot was introduced to close that gap, supporting agents with high-quality suggestions designed to speed up every interaction.
It was designed to suggest real-time, context-aware responses for agents handling live chats and emails — helping them respond faster, write more accurately, and maintain a high standard of quality without slowing down.
Unlike bots or rigid, standalone automation tools, XO’s AI Copilot was built to support agents, not replace them. It works inside the agent’s existing workflow, offering suggestions in real time without disrupting established processes.
- ✔ Fast replies, without sounding robotic
- ✔ Context-aware suggestions, not generic scripts
- ✔ Edits that came from agents — not forced by AI
And the real test? Adoption.
The challenge wasn’t just what the AI could do; it was whether agents would rely on it under pressure, in live interactions. That’s what made this a meaningful pilot: a true test of adoption in a high-volume, high-variability environment, where speed, trust, and consistency all matter.
From Design to Deployment: How the Copilot Was Rolled Out
The Copilot wasn’t tested in isolation or artificial environments. It was deployed inside a live support operation that handled over 100,000 monthly tickets, across both chat and email.
Agents faced a wide range of customer scenarios, from delivery issues to refunds and account concerns. These weren’t one-click problems. They required fast, accurate, and human decisions — made under pressure.
To reduce friction, we embedded Copilot directly inside the platform agents already used every day. No system hopping. Responses appeared in-line, in real time — right where the work was happening. Each time a customer message came in, Copilot suggested a response — fully written out, grounded in context, and tuned to the brand’s tone of voice.
From there, agents could:
- ✅ Send it directly if it worked
- ✏️ Tweak it if something needed adjusting
- 🗑 Skip it entirely if it didn’t align with their judgment
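To make this concrete, here is a minimal sketch of how each of those decisions could be captured as an event record. This is our own hypothetical schema, not XO’s production code; every field name is an assumption, but a record like this is all the adoption analysis described below really needs:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class SuggestionAction(Enum):
    """The three choices an agent has for each Copilot suggestion."""
    SENT_AS_IS = "sent_as_is"  # sent directly, no changes
    EDITED = "edited"          # tweaked before sending
    SKIPPED = "skipped"        # ignored in favor of a manual reply

@dataclass
class SuggestionEvent:
    """One Copilot suggestion shown to one agent on one ticket.

    Hypothetical schema for illustration only.
    """
    agent_id: str
    ticket_id: str
    channel: str               # "chat" or "email"
    action: SuggestionAction
    chars_edited: int          # 0 if sent as-is or skipped
    timestamp: datetime

# Example: an agent lightly edits a suggestion during a live chat.
event = SuggestionEvent(
    agent_id="agent-042",
    ticket_id="ticket-98765",
    channel="chat",
    action=SuggestionAction.EDITED,
    chars_edited=14,
    timestamp=datetime.now(timezone.utc),
)
```

Logging the decision rather than forcing it is what makes the trust metrics below possible: the data records what agents chose to do, not what the tool did.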
We didn’t script the behavior. We didn’t enforce usage. That was intentional, because we weren’t just testing functionality. We were testing what really matters in GenAI adoption: behavioral trust. Not assumed usage, not theoretical output accuracy, but whether agents chose to use Copilot when it counted.
To move beyond surface-level metrics, we needed to go deeper.
We didn’t just track who clicked what. We tracked whether using Copilot actually made things better — faster resolutions, higher quality, better experiences. We measured whether higher AI usage correlated with improvements in AHT, QA, and CSAT — building a behavioral feedback loop rooted in performance, not just activation.
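In practice, checking that link can be as simple as correlating each agent’s usage rate with the change in a KPI. The sketch below is our own illustration with invented numbers, using only the Python standard library:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Hypothetical per-agent data: Copilot usage rate (share of suggestions
# sent as-is or lightly edited) and the relative change in average
# handle time over the same window (negative = faster).
usage_rate = [0.74, 0.77, 0.52, 0.81, 0.65, 0.70]
aht_change = [-0.24, -0.26, -0.11, -0.28, -0.17, -0.21]

# A strongly negative r supports the claim that higher Copilot usage
# goes hand in hand with larger AHT reductions.
r = correlation(usage_rate, aht_change)
print(f"usage vs. AHT change: r = {r:.2f}")
```

The same check can be repeated against QA and CSAT; correlation alone doesn’t prove causation, which is one reason the cohort design described below matters.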
Building Trust → Feedback Loops & Agent Psychology
From the start, our product team treated agents as subject-matter experts — not just end users.
We built active feedback loops that involved agents as co-designers and expert CX architects. The belief was simple: the best AI for customer experience can only be built by learning from the people who know the work best — the agents themselves.
How we did it:
- Active feedback loops: Agent insights shaped every iteration, from Tiger Team pilots to large-scale rollouts.
- Product team openness: The product team listened first, treating agents as experts, learning from their frontline experience to tune prompts and workflows, and adapting the product based on agent input.
- Agent psychology: Adoption started with making agents feel like architects, not subjects. We created safe spaces for feedback, recognized contributions, and made it clear: Copilot existed to support, not replace, their expertise.
This approach didn’t just bridge the gap between tech and BPO; it broke down silos. XO brought product, engineering, and frontline BPO teams together from day one, building a Copilot that fit real agent workflows, not just theory.
At XO, we build tech with BPO, not just for BPO, empowering agents and ensuring solutions match operational reality.
Validating in the Real World → The Four Cohorts
To ensure Copilot worked across every context, XO launched four distinct agent cohorts from an initial pool of 300 agents:
- Tiger Team: 15 top performers (5% of total HC) – Helped train and fine-tune Copilot
- Large Cohort: 57 tenured agents (19%) – Tested adoption & validated Copilot at scale
- Low-Performer Cohort: 10 agents (3%) – Tested Copilot as an upskilling accelerator
- New Hires: 15 agents (5%) – Measured onboarding/ramp acceleration and early-stage reliability
Together, these 97 agents represented over 30% of the total agent workforce, providing a clear view of where XO’s AI Copilot performed best and where it delivered the most value across experience levels, task complexity, and use cases.
Measurable Gains from the Copilot Rollout
XO’s Copilot rollout wasn’t just about launching new technology; it was about driving measurable improvements across the support operation. With nearly 100 agents participating, organized into four cohorts and running in live production, we had the foundation most GenAI projects lack: proof.
That proof didn’t come from isolated benchmarks or log data. It came from clear, observable performance gains — tracked across varied agent profiles and day-to-day support scenarios. Critically, those results were only possible because agents actually used Copilot. AI usage ranged from 74% to 77%, with agents either sending suggestions as-is or making only minor edits — a real signal of trust, not forced adoption.
That level of adoption drove measurable, high-impact results:
→ At a Glance: Key Performance Outcomes (8 Weeks Post-Deployment)
The snapshot below summarizes AI usage and key performance improvements for each agent cohort following the Copilot rollout, highlighting gains in efficiency, quality, and customer satisfaction:
- 📊 24%+ reduction in Average Handle Time (AHT) across all groups
- 📊 CSAT maintained or improved (92–94%)
- 📊 QA and FCR scores stable or increased
- 📊 Up to 12.1% reduction in Cost per Contact
- 📊 New Hires ramped 7 weeks faster
The pattern was consistent:
The more agents used Copilot, the stronger the results. There was a direct correlation between higher Copilot usage and improved performance outcomes across all cohorts. This was true augmentation, not background automation.
→ Performance Highlights (8 Weeks Post-Deployment)
- AI usage — measured using our AI Accuracy Score — showed that agents accepted most Copilot suggestions during live interactions (Learn more about how this metric works in our previous article here; a minimal sketch follows this list)
- Key drivers of success: Copilot was seamlessly embedded in workflows, tuned with frontline feedback, and designed to support, not override, agent judgment.
- These results reflect large-scale, day-to-day operations with real customers, not just limited pilots or theoretical models.
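As a rough sketch, a usage metric of this kind can be computed directly from accept/edit/skip events. The 20-character threshold separating minor from major edits below is our own illustrative assumption; XO’s actual AI Accuracy definition is covered in the article linked above:

```python
def ai_usage_rate(events, minor_edit_max_chars=20):
    """Share of suggestions sent as-is or with only minor edits.

    Each event is an (action, chars_edited) pair, where action is one of
    "sent_as_is", "edited", or "skipped". The threshold is illustrative,
    not XO's published definition.
    """
    used = sum(
        1 for action, chars in events
        if action == "sent_as_is"
        or (action == "edited" and chars <= minor_edit_max_chars)
    )
    return used / len(events) if events else 0.0

# A toy week for one agent: mostly accepted, a few heavy rewrites.
week = (
    [("sent_as_is", 0)] * 60
    + [("edited", 8)] * 15    # minor edits still count as usage
    + [("edited", 90)] * 10   # heavy rewrites do not
    + [("skipped", 0)] * 15
)
print(f"AI usage rate: {ai_usage_rate(week):.0%}")  # -> 75%
```

With this toy data the rate lands at 75%, in the same band as the 74–77% the cohorts actually showed.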
→ Spotlight: Accelerating New Hire Success
Among all groups, new hires saw some of the most significant gains. With Copilot integrated into onboarding, new agents reached target proficiency 7 weeks faster than previous classes — a major acceleration of the learning curve.
New Hire Cohort (First 8 Weeks):
- 📊 AI Usage: 74%
- 📊 AHT: 24.1% reduction
- 📊 QA: 79% (2% improvement over baseline)
- 📊 CSAT: 87.9% (1.9% improvement over baseline)
- 📊 FCR: 87% (5% improvement over baseline)
- 📊 Cost per Contact: 5.9% reduction (2% above client goal)
This translated into faster, more confident onboarding and better CX outcomes from day one.
What We Observed: Where AI Performed Best
Beyond adoption metrics, agent behavior revealed where Copilot truly excelled, and where it still had room to grow.
Channel Differences:
- Live Chat: Copilot had its highest adoption in live chat, where agents managed multiple conversations and speed was key. It gave them a fast, high-quality starting point.
- Email: Usage was steady, but edits were more frequent. Longer-form replies needed more personalization — especially in tone and structure — pointing to areas for tuning.
Variations by Agent Experience:
- Top Performers: Used Copilot selectively, often editing for tone or clarity rather than content.
- New Hires: Relied on Copilot more consistently, leveraging it as a confidence booster and a tool for faster alignment with brand tone and structure.
- Low Performers: Saw significant efficiency gains when using Copilot as an assistive writing layer.
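Observations like these fall out of the same event stream once it is sliced by segment. Here is a minimal sketch with hypothetical data, showing usage rates per channel and per cohort:

```python
from collections import defaultdict

# Hypothetical events: (channel, cohort, used), where `used` means the
# suggestion was sent as-is or with only minor edits.
events = [
    ("chat", "new_hire", True),
    ("chat", "top_performer", True),
    ("email", "top_performer", False),
    ("email", "new_hire", True),
    ("chat", "low_performer", True),
    ("email", "low_performer", False),
]

def usage_by(events, index):
    """Usage rate per segment; index 0 = channel, 1 = cohort."""
    shown = defaultdict(int)
    used = defaultdict(int)
    for event in events:
        key = event[index]
        shown[key] += 1
        used[key] += event[2]  # True counts as 1
    return {key: used[key] / shown[key] for key in shown}

print(usage_by(events, 0))  # e.g. chat usage higher than email
print(usage_by(events, 1))  # e.g. new hires leaning on Copilot most
```

With real volumes, the same slicing pinpoints exactly where suggestion quality needs tuning, such as the longer-form email replies noted above.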
What We Learned & What It Means for Our Customers
This rollout was a behavioral shift, not just a tech deployment. Copilot’s value came from real usage, frontline feedback, and integration into workflows.
Key takeaways for teams scaling GenAI in customer experience:
- Start with usage. Focus on adoption, not just automation.
- Measure agent behavior to gauge trust.
- Build with your agents. Frontline feedback is essential to refine AI and workflows.
- Design AI to augment, not replace. The best AI supports agent judgment.
- Validate at scale. Real-world cohorts reveal what works and what doesn’t.
The win was human-AI collaboration at scale. The lesson: AI adoption starts with trust, and that trust is earned in the flow of daily work.
At XO, Copilot shows that GenAI delivers real value when it works with people. For teams ready to scale, success lies in smart deployment, tight feedback loops, and agent-led design.
Looking Ahead… This Case Study Is Just the Beginning!
Want to see how XO builds operational AI from the inside out? Keep an eye on our upcoming articles for practical insights on deployment, design, and scaling GenAI in CX.
Ready to bring GenAI into your CX operations? XO has the blueprint. Let’s talk