Today, we're taking two major steps in Devin's autonomy: one, by introducing Sonnet 4.5 to Devin, and two, by introducing a new Devin harness built around long-term planning and the new model capabilities. I can message Devin from anywhere, so here I'm giving it a big task from the comfort of Slack. And you can see here that it first comes up with an implementation plan that involves a high-level architecture and several phases of development. Devin's going to write thousands of lines of code, and importantly, it needs to test this code. Devin will even test the front end of its application, verifying that it works as expected. Devin can send screenshots so you can check out its progress from anywhere. Devin can take multiple rounds of feedback, and here we can see many rounds of back and forth before Devin deploys one final application for the user. Today's new model introduces the biggest leap in Devin's autonomy we've seen since the launch of Sonnet 3.6 last year. We can't wait to see what you build.
Want to learn more about what makes this model different?
We've been testing Sonnet 4.5 extensively over the past few days and discovered some fascinating behaviors, from how it manages its own context window to how it creates feedback loops to verify its work.
Read more below:
https://cognition.ai/blog/devin-sonnet-4-5-lessons-and-challenges