How to measure AI agent failures with transcript analysis


Measuring how often an AI agent succeeds at a task can help us assess its capabilities – but it doesn’t tell the whole story. We’ve been experimenting with transcript analysis to better understand not just how often agents succeed, but why they fail.

Our model evaluations generate thousands of transcripts, each of which can contain an entire novel’s worth of text. A transcript is a record of everything the model did during a task, including the external tools it accessed and its outputs at each step.

In a recent case study, we analysed almost 6,400 transcripts from AISI evaluations of nine models on 71 cyber tasks. We studied several features of these transcripts, including overall length and composition, and the agent’s commentary throughout. We found that there are many reasons a model may fail to complete a task beyond capability limitations, including safety refusals, lack of compliance with scaffolding instructions, and difficulty using tools.

We’re sharing our analysis to encourage others conducting safety evaluations to review their own transcripts in a systematic and quantitative way. This can help foster more accurate and robust claims about agent capabilities. Read more on our blog: https://lnkd.in/eiCn6zkP
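As a rough illustration of what a systematic, quantitative transcript review might look like, here is a minimal Python sketch. It assumes a hypothetical transcript format (JSON files with "steps" and "success" fields) and uses placeholder keyword heuristics for the failure categories named in the post; none of the field names or patterns come from AISI's actual tooling.

```python
# Minimal sketch of systematic transcript review -- not AISI's actual pipeline.
# Assumes each transcript is a JSON file with hypothetical fields:
#   "steps": list of {"role": ..., "content": ...}
#   "success": bool
import json
from collections import Counter
from pathlib import Path

# Illustrative failure categories drawn from the post; the keyword
# heuristics below are placeholders, not validated classifiers.
FAILURE_PATTERNS = {
    "safety_refusal": ["i can't help with", "i cannot assist"],
    "tool_difficulty": ["command not found", "permission denied"],
    "scaffold_noncompliance": ["invalid action format", "unrecognised tool call"],
}

def categorise(transcript: dict) -> list[str]:
    """Return the failure categories whose keywords appear in the transcript."""
    text = " ".join(step.get("content", "").lower() for step in transcript["steps"])
    matches = [name for name, keywords in FAILURE_PATTERNS.items()
               if any(k in text for k in keywords)]
    return matches or ["other_or_capability"]

def summarise(transcript_dir: str) -> Counter:
    """Tally failure categories across all failed transcripts in a directory."""
    tally = Counter()
    for path in Path(transcript_dir).glob("*.json"):
        transcript = json.loads(path.read_text())
        if not transcript.get("success", False):
            tally.update(categorise(transcript))
    return tally

if __name__ == "__main__":
    # Prints counts such as Counter({"tool_difficulty": 12, "safety_refusal": 3, ...})
    print(summarise("transcripts/"))
```

In practice, keyword matching of this kind would only be a starting point; manual review or a more careful classifier would be needed to make robust claims about why agents fail.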

👏 Vital work, AI Security Institute - the commercial realm needs to know whether it can trust AI agents. Vendor performance claims vary, mostly because they are not tested in a uniform way by unbiased parties. Your work will help organisations make decisions and get beyond the proof of concept to realising the benefits faster and with reduced risk.
