The ability to autoscale GPU workloads is critical to meet performance goals as well as optimize costs. This walkthrough shows one way to use the NVIDIA GPU Device Plugin add-on from OKE and common open-source telemetry tools to scale pods based on custom metrics relevant to AI/ML workloads. https://lnkd.in/eTiPiHX5
How to Autoscale GPU Workloads with NVIDIA Plugin
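For a concrete sense of the pattern the walkthrough describes (scaling pods on custom metrics rather than CPU), below is a minimal sketch that assumes the NVIDIA DCGM exporter and Prometheus Adapter are already exposing GPU metrics through the Kubernetes custom metrics API. The namespace, deployment name, metric name, and utilization target are illustrative assumptions, not values from the linked guide:

```python
# Sketch: an HPA that scales a GPU deployment on a custom per-pod metric
# (here, DCGM's GPU-utilization gauge surfaced via Prometheus Adapter).
# Names, namespace, and targets are assumptions for illustration only.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="triton-gpu-hpa", namespace="inference"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="triton-server"
        ),
        min_replicas=1,
        max_replicas=8,
        metrics=[
            client.V2MetricSpec(
                type="Pods",
                pods=client.V2PodsMetricSource(
                    metric=client.V2MetricIdentifier(name="DCGM_FI_DEV_GPU_UTIL"),
                    # Scale out when average GPU utilization across pods passes 60%.
                    target=client.V2MetricTarget(
                        type="AverageValue", average_value="60"
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="inference", body=hpa
)
```

The same HPA shape works for other AI/ML-relevant signals (queue depth, request latency) once the adapter exposes them; only the metric identifier and target change.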
More Relevant Posts
-
Baseten used NVIDIA Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes, helping us scale deployments while reducing costs. Read the full blog ⏬ https://lnkd.in/e2_K33Y7
-
🧠 The AI Infrastructure Paradox: owning GPUs ≠ operationalizing AI. While enterprises rush to build GPU clusters, many are discovering that the real bottleneck lies in platform complexity: configuring networks, managing resources, and providing self-service access to developers. At AI Infrastructure Field Day, Rafay presented a different approach:
✅ Secure multi-tenancy for shared GPU clusters
✅ Standardization across vSphere, Kubernetes, and hybrid clouds
✅ Application-first delivery via catalog (e.g., Jupyter, inference endpoints)
✅ Governance and cost tracking baked in
By providing the missing automation layer between raw hardware and AI services, Rafay aims to help enterprises bridge the 20x cost gap between owning GPUs and renting AI capacity. AI infrastructure isn’t just about compute; it’s about control, consistency, and consumption. #AIInfrastructure #Rafay #GPUs #HybridCloud #PlatformEngineering #AIEnablement #theCUBEResearch #EfficientlyConnected Read the full analysis by Jack Poller here: https://lnkd.in/eJQ_m3AD
-
NextComputing just launched the Nucleus 1U with Ampere®—a compact, short-depth 1U rackmount server built for AI inference, cloud-native apps, and edge deployments! With up to 192 cores, NVIDIA GPU/DPU support, and up to 252TB NVMe storage, it’s a powerhouse for organizations needing high-density, energy-efficient computing in space- and power-constrained environments. Perfect for:
• AI inference at the edge
• Cloud-native microservices
• 5G/telco infrastructure
• Arm developer workflows
Learn more: https://lnkd.in/eD3AzuD9 Ampere #AI #EdgeComputing #CloudNative #NextComputing
-
📈 🆙 Enterprise AI infrastructure just got an upgrade. We’ve integrated NVIDIA RTX PRO 6000 Server Edition GPUs with our BIG‑IP Next for Kubernetes to deliver enhanced performance, lower latency, and tighter security for AI workloads at scale. 👉 http://ms.spr.ly/6044tMpI6 via IT Tech Pulse
-
Grove is now part of NVIDIA Dynamo! Thrilled to share that Grove, a Kubernetes API for orchestrating modern #AI inference workloads, is now part of Dynamo as a modular, open-source component. As inference systems grow from single models to complex, multicomponent pipelines, scaling and coordination have become harder than ever. Grove makes it simple, defining your entire inference stack as one #Kubernetes resource that automatically handles scheduling, scaling, and topology-aware placement across thousands of GPUs. Now integrated with Dynamo, Grove brings a faster, more declarative way to run next-generation inference systems at scale. Explore the full story and step-by-step guide in our latest blog post. Link in comments below 👇
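The linked blog has the authoritative spec; purely to illustrate the "entire inference stack as one Kubernetes resource" idea, here is a hypothetical manifest applied with the Kubernetes Python client. The API group, kind, and every field name below are invented for illustration and will differ from Grove's actual CRD:

```python
# Hypothetical sketch of a "whole pipeline in one resource" manifest, in the
# spirit of Grove. The apiVersion, kind, and schema are invented; see the
# Grove/Dynamo blog for the real API.
from kubernetes import client, config

config.load_kube_config()

pipeline = {
    "apiVersion": "grove.example/v1alpha1",  # placeholder group/version
    "kind": "InferencePipeline",             # placeholder kind
    "metadata": {"name": "llm-serving", "namespace": "inference"},
    "spec": {
        # Multicomponent stack declared once; the controller handles gang
        # scheduling, scaling, and topology-aware placement across GPUs.
        "components": [
            {"name": "router", "replicas": 2, "gpusPerReplica": 0},
            {"name": "prefill", "replicas": 4, "gpusPerReplica": 8},
            {"name": "decode", "replicas": 8, "gpusPerReplica": 4},
        ],
        "placement": {"topologyKey": "nvidia.com/nvlink-domain"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="grove.example", version="v1alpha1",
    namespace="inference", plural="inferencepipelines", body=pipeline,
)
```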
-
Coordinated scaling is critical for performance when deploying a large-scale inference pipeline. To facilitate scalability, we integrated NVIDIA Grove into NVIDIA Dynamo! See the full announcement above.
-
Article about an addition to Dynamo, NVIDIA’s inference load balancer/optimizer. If you’re interested in deploying agents at scale, or even just want to understand the computational sequence of LLM execution across multiple GPUs, Dynamo is worth studying.
-
Unleashing the beast of computing 🚀 NVIDIA's ComputeDomains are set to revolutionize the landscape of Kubernetes, tearing down the complex walls of multi-node GPU orchestration.
🟠 This is your ticket to dynamic, elastic GPU connectivity, with NVLink domains expanding and contracting like the universe itself.
🟠 No more wrestling with static configurations—ComputeDomains smartly manage these interconnections autonomously.
🟠 Forget about manual assignments; your workloads will sail smoothly in their own secure NVLink domains.
🟠 Whether you're after scalable GPU-to-GPU communication or optimal pod placement, bringing AI workloads to scale has never been more seamless.
Elevate your Kubernetes experience and let your AI workloads soar. Do you think this will redefine how enterprises prioritize GPU architectures in their infrastructure strategies? 🤔 #Kubernetes #NVIDIA #GPUs #AIDevelopment #CloudComputing #TechTransformation
🔗https://lnkd.in/dTN2JznE
👉 Post of the day: https://lnkd.in/dACBEQnZ 👈
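To ground the idea, here is a hedged sketch of declaring such a domain with the Kubernetes Python client. The API group, version, kind, and fields are assumptions modeled on the article's description of dynamic NVLink domains, not a verified schema:

```python
# Hypothetical sketch: declare a multi-node NVLink compute domain; pods then
# join it via a ResourceClaim instead of static GPU-topology configuration.
# apiVersion, kind, and fields are assumptions, not a verified schema.
from kubernetes import client, config

config.load_kube_config()

domain = {
    "apiVersion": "resource.nvidia.com/v1beta1",  # assumed group/version
    "kind": "ComputeDomain",
    "metadata": {"name": "llm-training-domain", "namespace": "ml"},
    "spec": {
        "numNodes": 4,  # assumed field: span the NVLink domain over 4 nodes
        # Assumed field: a ResourceClaim template that member pods reference
        # to be placed inside the same secure NVLink domain.
        "channel": {"resourceClaimTemplate": {"name": "llm-domain-channel"}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.nvidia.com", version="v1beta1",
    namespace="ml", plural="computedomains", body=domain,
)
```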
-
In collaboration with Rohan, Saurabh, Anish, and Amr E. from NVIDIA - Sertaç, Rita and I published a blog on how the open-source Dynamo project handles the demands of scalable AI inference in production AKS apps! It highlights how Dynamo’s distributed inference on AKS with GB200 NVL72 GPUs delivers high-throughput, multi-node serving. In an e-commerce example, we show personalized recommendations at scale and optimized GPU utilization to meet unpredictable traffic efficiently. Check out the full blog here: https://lnkd.in/gDpke4-8
-
NVIDIA's new Blackwell Ultra-based GB300 NVL72 platform has dominated the latest MLPerf AI training benchmarks, securing first position in all seven tests and significantly widening the performance gap with rivals. The system demonstrated its industry-leading capability by setting a record time of only 10 minutes to train the massive Llama 405B large language model, delivering up to five times the performance of its preceding Hopper-based platform, thanks in part to the adoption of FP4 precision for LLM training. #NVIDIA #TechGiants #ChipMaker #Blackwell #AITraining #MLPerf #GenerativeAI #LLM #DataCenter #DataCenters #Technology #TechnologyNews https://lnkd.in/dMc7SCu4