Baseten used NVIDIA Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes, helping us scale deployments while reducing costs. Read the full blog ⏬ https://lnkd.in/e2_K33Y7
Baseten boosts code generation with NVIDIA Dynamo
More Relevant Posts
-
The ability to autoscale GPU workloads is critical both to meet performance goals and to optimize costs. This walkthrough shows one way to use the NVIDIA GPU Device Plugin add-on for OKE (Oracle Container Engine for Kubernetes) together with common open-source telemetry tools to scale pods on custom metrics relevant to AI/ML workloads. https://lnkd.in/eTiPiHX5
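To make the pattern concrete, here is a minimal Python sketch (not the walkthrough's code) of the control loop behind custom-metric autoscaling: read GPU utilization from the DCGM exporter via Prometheus, then patch the Deployment's replica count. The Prometheus URL, metric query, deployment name, and thresholds are assumptions; in a real OKE setup you would more likely expose the metric through Prometheus Adapter and let a HorizontalPodAutoscaler run this loop for you.

```python
# Hypothetical sketch: scale an inference Deployment on average GPU utilization
# reported by the DCGM exporter through Prometheus. The Prometheus URL, metric
# query, deployment name, and thresholds are assumptions -- adapt to your cluster.
import requests
from kubernetes import client, config

PROMETHEUS_URL = "http://prometheus.monitoring:9090"   # assumed in-cluster Prometheus service
METRIC_QUERY = "avg(DCGM_FI_DEV_GPU_UTIL)"             # DCGM exporter GPU utilization (0-100)
DEPLOYMENT, NAMESPACE = "llm-inference", "default"     # hypothetical workload
TARGET_UTIL, MIN_REPLICAS, MAX_REPLICAS = 70.0, 1, 8

def current_gpu_utilization() -> float:
    """Ask Prometheus for the current average GPU utilization."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": METRIC_QUERY})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

if __name__ == "__main__":
    config.load_incluster_config()  # use config.load_kube_config() when running outside the cluster
    apps = client.AppsV1Api()
    current = apps.read_namespaced_deployment_scale(DEPLOYMENT, NAMESPACE).spec.replicas
    util = current_gpu_utilization()
    # Proportional rule: keep per-replica GPU utilization near the target.
    desired = max(MIN_REPLICAS, min(MAX_REPLICAS, round(current * util / TARGET_UTIL)))
    if desired != current:
        apps.patch_namespaced_deployment_scale(
            DEPLOYMENT, NAMESPACE, {"spec": {"replicas": desired}}
        )
```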
-
Grove is now part of NVIDIA Dynamo! Thrilled to share that Grove, a Kubernetes API for orchestrating modern #AI inference workloads, is now part of Dynamo as a modular, open-source component. As inference systems grow from single models to complex, multicomponent pipelines, scaling and coordination have become harder than ever. Grove makes it simple, defining your entire inference stack as one #Kubernetes resource that automatically handles scheduling, scaling, and topology-aware placement across thousands of GPUs. Now integrated with Dynamo, Grove brings a faster, more declarative way to run next-generation inference systems at scale. Explore the full story and step-by-step guide in our latest blog post. Link in comments below 👇
-
Coordinated scaling is critical to performance when deploying a large-scale inference pipeline. To make that easier, we integrated NVIDIA Grove into NVIDIA Dynamo. Learn more in the Grove announcement above!
-
An article about an addition to Dynamo, NVIDIA's inference load balancer and optimizer (see the Grove announcement above). If you're interested in deploying agents at scale, or even just want to understand how LLM execution is sequenced across multiple GPUs, Dynamo is worth studying.
-
Unlock the power of accelerated computing 🌟 with Azure Container Instances supporting GPU workloads! 🚀 Azure Container Instances (ACI) with GPU support brings the perfect solution for high-performance computing and machine learning tasks. Imagine the ease of deploying containers with the horsepower of a GPU, all without managing complex infrastructure. 🎉 With ACI, you can seamlessly scale your GPU-accelerated tasks in containers, accessing NVIDIA GPUs to handle intensive computational workloads efficiently. Real-world uses are vast – from AI models that need rapid prototyping, to running simulations or visual processing at scale. Have you integrated Azure Container Instances with GPU Workloads in your projects yet? What was your experience and any challenges faced? 🤔 #AzureContainerInstances #GPUWorkloads #CloudComputing #HighPerformanceComputing #AzureTech
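If you want to try this, here is a hedged sketch using the azure-mgmt-containerinstance Python SDK. The subscription, resource group, region, image, and GPU SKU are placeholders rather than a definitive recipe, and GPU SKU availability varies by region.

```python
# Hedged sketch: create a GPU-backed Azure Container Instance with the Python SDK
# (azure-mgmt-containerinstance). Subscription, resource group, region, image, and
# GPU SKU below are placeholders -- check GPU availability in your target region.
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, GpuResource, ResourceRequests, ResourceRequirements,
)

SUBSCRIPTION_ID = "<subscription-id>"          # placeholder
RESOURCE_GROUP, REGION = "ml-rg", "eastus"     # assumed resource group / region

client = ContainerInstanceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

container = Container(
    name="gpu-inference",
    image="nvcr.io/nvidia/pytorch:24.08-py3",  # example CUDA-enabled image
    command=["python", "-c", "import torch; print(torch.cuda.is_available())"],
    resources=ResourceRequirements(
        requests=ResourceRequests(
            cpu=4.0,
            memory_in_gb=16.0,
            gpu=GpuResource(count=1, sku="V100"),  # GPU SKUs vary by region
        )
    ),
)

group = ContainerGroup(
    location=REGION,
    containers=[container],
    os_type="Linux",
    restart_policy="Never",
)

# Long-running operation: provision the container group and wait for completion.
poller = client.container_groups.begin_create_or_update(RESOURCE_GROUP, "gpu-demo", group)
print(poller.result().provisioning_state)
```

The same container group can equally be described declaratively and deployed with the Azure CLI if you prefer not to script it.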
-
Great news - Red Hat and NVIDIA formed an agreement to distribute the NVIDIA CUDA Toolkit across the Red Hat portfolio.
-
'Hewlett Packard Enterprise news from #Nvidia #GTC25 includes a new #PrivateCloud #AI developer kit, Nvidia AI blueprints, GPU optimization capabilities, and servers built with Nvidia Blackwell Ultra and Blackwell architecture.'
-
Most teams optimize GPU autoscaling for hyperscalers. But what about neoclouds? Private GPU clouds? Bare metal? The result: idle GPUs, complex ops, and infrastructure that doesn't scale across environments. Join Lukas Gentele at Cloud Native + Kubernetes AI Day for his keynote on autoscaling GPU clusters anywhere: hyperscalers, neoclouds, and bare metal.
📅 Monday, Nov 10 | 10:25am EST
📍 Building B | Level 4 | B401-402
Learn how vCluster integrated Karpenter with Terraform/OpenTofu, ClusterAPI, KubeVirt, and NVIDIA BCM to bring dynamic autoscaling to any environment. Reduce idle GPU time. Simplify operations. Run consistent AI infrastructure everywhere. See you in Atlanta! 🚀 #KubeCon #CloudNativeCon #CNK8sAIDay
-
I recently deployed several LLMs — Llama 3.2 (1B & 3B) and Mistral (7B) — on the NVIDIA Jetson AGX Thor, and I was genuinely amazed by its performance. The response speed was incredibly smooth — almost comparable to what you'd expect from cloud-hosted models like GPT-5 or Sonnet 4.5. If you're planning to run LLMs on the AGX Thor, I highly recommend using Ollama. I initially spent over 10 hours trying to get TensorRT-LLM running on the device and kept hitting one issue after another. With Ollama, everything just worked seamlessly.
My setup:
- Platform: NVIDIA AGX Thor
- JetPack: R38.2 (Aug 2025 release)
- Architecture: ARM64 (aarch64)
- CUDA: 13.0
- Driver: 580.00
- GPU: NVIDIA Thor (SM 11.0)
If you're deploying LLMs via Docker or Kubernetes, you can use the following CUDA-optimized image for Ollama: https://lnkd.in/gdmzCsRF
The AGX Thor is truly redefining edge AI — it's impressive to see LLMs running this fluidly on a local device.
--------
If you are looking for a managed solution to host on-prem GPU servers, Spectro Cloud's Palette has you covered:
- https://lnkd.in/gKAVTUFt
- https://lnkd.in/gvPQPD_D
#nvidia #thor #llm #inference #onprem #gpu #cloud
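If it helps anyone reproducing this setup, here's a small Python sketch of how you might call the Ollama server once it's running on the Thor. It uses Ollama's standard HTTP endpoint on port 11434; the model tag is just one of the models mentioned above (pull it first, e.g. with `ollama pull llama3.2:3b`).

```python
# Minimal sketch: query a locally served model through Ollama's HTTP API
# (Ollama listens on port 11434 by default). The model tag and prompt are
# examples -- pull the model first with `ollama pull llama3.2:3b`.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model: str, prompt: str) -> str:
    """Send a single non-streaming generation request and return the text."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("llama3.2:3b", "Summarize what makes edge inference attractive."))
```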
-
Learn how NVIDIA Run:ai's GPU optimization capabilities can be extended across #AWS hybrid & edge environments, from Local Zones to Outposts racks & #EKS Hybrid Nodes 🌐🤖💪 https://go.aws/4oKE6j3 This solution's support for dynamic GPU fractions, node-level scheduling & priority-based sharing is valuable in edge scenarios where resource optimization & latency requirements are paramount. Read the blog to learn how to leverage these features, which have been shown to improve GPU utilization from 25% to 75%.
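To illustrate what a "dynamic GPU fraction" request can look like in practice, here's a hedged Python sketch that creates a pod asking for half a GPU. The `gpu-fraction` annotation and `runai-scheduler` scheduler name follow how Run:ai typically documents fractional GPUs, but treat them as assumptions here rather than details from the linked blog, and verify against the Run:ai version in your cluster.

```python
# Hedged sketch of a fractional-GPU pod request in the Run:ai style. The
# "gpu-fraction" annotation and "runai-scheduler" scheduler name reflect how
# Run:ai commonly documents fractional GPUs -- treat both as assumptions and
# verify against the Run:ai version deployed in your cluster.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="fractional-inference",
        annotations={"gpu-fraction": "0.5"},  # assumed Run:ai fraction request
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",      # assumed Run:ai scheduler name
        containers=[
            client.V1Container(
                name="inference",
                image="nvcr.io/nvidia/tritonserver:24.08-py3",  # example image
                # Note: no nvidia.com/gpu resource request -- the fraction
                # annotation replaces it under fractional scheduling.
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```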