Optimizing Coding Agent Rules (./clinerules) for Improved Accuracy
It's your friendly monthly content roundup from Arize. Check out the latest on agent system prompt optimization, LLM self-evaluation bias, and more.
We improved the SWE-Bench accuracy of Cline, a popular open-source coding agent, by 15%, without retraining LLMs, changing tools, or modifying Cline's architecture. Read how in this piece by Priyan Jindal.
When building and testing AI agents, one practical question that arises is whether to use the same model for both the agent’s reasoning and the evaluation of its outputs. We ran an experiment to offer a nuanced answer. Learn more about LLM self-evaluation bias in this piece by Sanjana Yeddula.
By connecting NVIDIA NeMo microservices with Arize AX’s production observability, you can build a data flywheel that turns production insights into model refinements in hours instead of weeks. This blog, by Richard Young, includes code examples.
Useful Guides & Updates
📦 Freshly Shipped: What's new in Arize AX in October
📚 AI Researcher Show-and-Tell: ServiceNow's Tara Bogavelli explains AgentArch and benchmarking agents for enterprise workflows.
Upcoming Events
Get in the room with other agent engineers and builders.
- November 6, London | Reliable AI Agents with Google DeepMind and CrewAI
- November 12, Virtual | LLM-as-a-Judge 101
- November 12, Sunnyvale | Agents In Action with Google Cloud and AI at Meta
- November 18, San Francisco | Building and Evaluating TypeScript Agents with Mastra
- December 3, Las Vegas | re:Invent Refueled: Chocolate, Coffee, & AI
Build. Learn. Connect.
Whether you're deep in the trenches building AI agents or just exploring what's next, there are resources to help. Book a demo.