Name: We rebuilt Devin for Claude Sonnet 4.5. Available starting today as an Agent Preview that’s over 2x faster and 12% better on our Jr. Developer Evals. | Cognition
Uploaded: 2025-09-29T17:24:28.344Z
Duration: 56 s
Channel: Cognition
Description: We rebuilt Devin for Claude Sonnet 4.5. Available starting today as an Agent Preview that’s over 2x faster and 12% better on our Jr. Developer Evals.

Cognition

Transcript

Today, we're taking two major steps forward in Devin's autonomy: first, by bringing Sonnet 4.5 to Devin, and second, by introducing a new Devin harness built around long-term planning and the new model's capabilities. I can message Devin from anywhere, so here I'm giving it a big task from the comfort of Slack. You can see that it first comes up with an implementation plan that includes a high-level architecture and several phases of development. Devin is going to write thousands of lines of code, and, importantly, it needs to test that code. Devin even tests the front end of its application, verifying that it works as expected. Devin can send screenshots of its work so you can check on its progress from anywhere. Devin can also take multiple rounds of feedback; here we can see many rounds of back-and-forth before Devin deploys the final application for the user. Today's new model delivers the biggest leap in Devin's autonomy we've seen since the launch of Sonnet 3.6 last year. We can't wait to see what you build.

Cognition

Want to learn more about what makes this model different? We've been testing Sonnet 4.5 extensively over the past few days and discovered some fascinating behaviors, from how it manages its own context window to how it creates feedback loops to verify its work. Read more below: https://cognition.ai/blog/devin-sonnet-4-5-lessons-and-challenges
