Tuhin Srivastava joined the Gradient Dissent podcast by Weights & Biases.
They discuss all things inference and what sets Baseten apart:
> When to consider closed-source vs. open-source models
> Inference vs. runtime optimizations
> The importance of the developer experience
> Building a high-velocity product org
Links to the full episode are in the comments. Thanks for having us!
"In a world where we're shifting towards models doing everything and all the value being at the application layer, if there is AGI, the only market that will exist, to some extent, is: what does the model need to do? And the model can only run on inference.

There are two different parts of inference. On the infrastructure level: I have a workload running across 5, 10, 100,000 GPUs, how is this thing going to scale? The second is the runtime-level problems: hey, how fast do these models actually run on a given GPU?"

"Who's using open source and who's using closed source, in your experience, and how do people think about that trade-off?"

"I think everyone's on this curve of maturity, and a lot of people start by coming to open-source models when they're doing their own thing. The place other people start is you go with Anthropic, you go with OpenAI, and you have these great models that can do a lot, but they're either too expensive, they have a lot of reliability issues, or the third piece, which is just: our customers care that we aren't piping all this data off to someone who trains models on it, and that matters to us."
YouTube: https://www.youtube.com/watch?v=QJUsxm1Nmos
Apple: https://podcasts.apple.com/us/podcast/the-ceo-behind-the-fastest-growing-ai-inference/id1504567418?i=1000737245719
Spotify: https://open.spotify.com/episode/1heuqCYUlHUyer6XagBFVX?si=vcD2T6c7T7CI8cg5JUPxAQ&nd=1&dlsi=787af97228c5411c