The ability to autoscale GPU workloads is critical to meet performance goals as well as optimize costs. This walkthrough shows one way to use the NVIDIA GPU Device Plugin add-on from OKE and common open-source telemetry tools to scale pods based on custom metrics relevant to AI/ML workloads. https://lnkd.in/eTiPiHX5
How to Autoscale GPU Workloads with NVIDIA Plugin
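For a concrete sense of the pattern the walkthrough describes (scaling pods on custom metrics rather than CPU), below is a minimal sketch that assumes the NVIDIA DCGM exporter and Prometheus Adapter are already exposing GPU metrics through the Kubernetes custom metrics API. The namespace, deployment name, metric name, and utilization target are illustrative assumptions, not values from the linked guide:

```python
# Sketch: an HPA that scales a GPU deployment on a custom per-pod metric
# (here, DCGM's GPU-utilization gauge surfaced via Prometheus Adapter).
# Names, namespace, and targets are assumptions for illustration only.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="triton-gpu-hpa", namespace="inference"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="triton-server"
        ),
        min_replicas=1,
        max_replicas=8,
        metrics=[
            client.V2MetricSpec(
                type="Pods",
                pods=client.V2PodsMetricSource(
                    metric=client.V2MetricIdentifier(name="DCGM_FI_DEV_GPU_UTIL"),
                    # Scale out when average GPU utilization across pods passes 60%.
                    target=client.V2MetricTarget(
                        type="AverageValue", average_value="60"
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="inference", body=hpa
)
```

The same HPA shape works for other AI/ML-relevant signals (queue depth, request latency) once the adapter exposes them; only the metric identifier and target change.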
More Relevant Posts
-
Baseten used NVIDIA Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes, helping us scale deployments while reducing costs. Read the full blog ⏬ https://lnkd.in/e2_K33Y7
-
🧠 The AI Infrastructure Paradox: owning GPUs ≠ operationalizing AI. While enterprises rush to build GPU clusters, many are discovering that the real bottleneck lies in platform complexity: configuring networks, managing resources, and providing self-service access to developers. At AI Infrastructure Field Day, Rafay presented a different approach:
✅ Secure multi-tenancy for shared GPU clusters
✅ Standardization across vSphere, Kubernetes, and hybrid clouds
✅ Application-first delivery via catalog (e.g., Jupyter, inference endpoints)
✅ Governance and cost tracking baked in
By providing the missing automation layer between raw hardware and AI services, Rafay aims to help enterprises bridge the 20x cost gap between owning GPUs and renting AI capacity. AI infrastructure isn’t just about compute; it’s about control, consistency, and consumption. #AIInfrastructure #Rafay #GPUs #HybridCloud #PlatformEngineering #AIEnablement #theCUBEResearch #EfficientlyConnected Read the full analysis by Jack Poller here: https://lnkd.in/eJQ_m3AD
-
NextComputing just launched the Nucleus 1U with Ampere®—a compact, short-depth 1U rackmount server built for AI inference, cloud-native apps, and edge deployments! With up to 192 cores, NVIDIA GPU/DPU support, and up to 252TB NVMe storage, it’s a powerhouse for organizations needing high-density, energy-efficient computing in space- and power-constrained environments. Perfect for:
• AI inference at the edge
• Cloud-native microservices
• 5G/telco infrastructure
• Arm developer workflows
Learn more: https://lnkd.in/eD3AzuD9 Ampere #AI #EdgeComputing #CloudNative #NextComputing
-
📈 🆙 Enterprise AI infrastructure just got an upgrade. We’ve integrated NVIDIA RTX PRO 6000 Server Edition GPUs with our BIG‑IP Next for Kubernetes to deliver enhanced performance, lower latency, and tighter security for AI workloads at scale. 👉 http://ms.spr.ly/6044tMpI6 via IT Tech Pulse
-
Grove is now part of NVIDIA Dynamo! Thrilled to share that Grove, a Kubernetes API for orchestrating modern #AI inference workloads, is now part of Dynamo as a modular, open-source component. As inference systems grow from single models to complex, multicomponent pipelines, scaling and coordination have become harder than ever. Grove makes it simple, defining your entire inference stack as one #Kubernetes resource that automatically handles scheduling, scaling, and topology-aware placement across thousands of GPUs. Now integrated with Dynamo, Grove brings a faster, more declarative way to run next-generation inference systems at scale. Explore the full story and step-by-step guide in our latest blog post. Link in comments below 👇
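The linked blog has the authoritative spec; purely to illustrate the "entire inference stack as one Kubernetes resource" idea, here is a hypothetical manifest applied with the Kubernetes Python client. The API group, kind, and every field name below are invented for illustration and will differ from Grove's actual CRD:

```python
# Hypothetical sketch of a "whole pipeline in one resource" manifest, in the
# spirit of Grove. The apiVersion, kind, and schema are invented; see the
# Grove/Dynamo blog for the real API.
from kubernetes import client, config

config.load_kube_config()

pipeline = {
    "apiVersion": "grove.example/v1alpha1",  # placeholder group/version
    "kind": "InferencePipeline",             # placeholder kind
    "metadata": {"name": "llm-serving", "namespace": "inference"},
    "spec": {
        # Multicomponent stack declared once; the controller handles gang
        # scheduling, scaling, and topology-aware placement across GPUs.
        "components": [
            {"name": "router", "replicas": 2, "gpusPerReplica": 0},
            {"name": "prefill", "replicas": 4, "gpusPerReplica": 8},
            {"name": "decode", "replicas": 8, "gpusPerReplica": 4},
        ],
        "placement": {"topologyKey": "nvidia.com/nvlink-domain"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="grove.example", version="v1alpha1",
    namespace="inference", plural="inferencepipelines", body=pipeline,
)
```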
-
Coordinated scaling is critical for performance when deploying a large-scale inference pipeline. To facilitate scalability, we integrated NVIDIA Grove into NVIDIA Dynamo! See the full announcement above.
-
Article about an addition to Dynamo, NVIDIA’s inference load balancer/optimizer. If you’re interested in deploying agents at scale, or even just want to understand the computational sequence of LLM execution across multiple GPUs, Dynamo is worth studying.
-
Unleashing the beast of computing 🚀 NVIDIA's ComputeDomains are set to revolutionize the landscape of Kubernetes, tearing down the complex walls of multi-node GPU orchestration.
🟠 This is your ticket to dynamic, elastic GPU connectivity, with NVLink domains expanding and contracting like the universe itself.
🟠 No more wrestling with static configurations—ComputeDomains smartly manage these interconnections autonomously.
🟠 Forget about manual assignments; your workloads will sail smoothly in their own secure NVLink domains.
🟠 Whether you're after scalable GPU-to-GPU communication or optimal pod placement, bringing AI workloads to scale has never been more seamless.
Elevate your Kubernetes experience and let your AI workloads soar. Do you think this will redefine how enterprises prioritize GPU architectures in their infrastructure strategies? 🤔 #Kubernetes #NVIDIA #GPUs #AIDevelopment #CloudComputing #TechTransformation
🔗https://lnkd.in/dTN2JznE
👉 Post of the day: https://lnkd.in/dACBEQnZ 👈
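To ground the idea, here is a hedged sketch of declaring such a domain with the Kubernetes Python client. The API group, version, kind, and fields are assumptions modeled on the article's description of dynamic NVLink domains, not a verified schema:

```python
# Hypothetical sketch: declare a multi-node NVLink compute domain; pods then
# join it via a ResourceClaim instead of static GPU-topology configuration.
# apiVersion, kind, and fields are assumptions, not a verified schema.
from kubernetes import client, config

config.load_kube_config()

domain = {
    "apiVersion": "resource.nvidia.com/v1beta1",  # assumed group/version
    "kind": "ComputeDomain",
    "metadata": {"name": "llm-training-domain", "namespace": "ml"},
    "spec": {
        "numNodes": 4,  # assumed field: span the NVLink domain over 4 nodes
        # Assumed field: a ResourceClaim template that member pods reference
        # to be placed inside the same secure NVLink domain.
        "channel": {"resourceClaimTemplate": {"name": "llm-domain-channel"}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.nvidia.com", version="v1beta1",
    namespace="ml", plural="computedomains", body=domain,
)
```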
-
In collaboration with Rohan, Saurabh, Anish, and Amr E. from NVIDIA - Sertaç, Rita and I published a blog on how the open-source Dynamo project handles the demands of scalable AI inference in production AKS apps! It highlights how Dynamo’s distributed inference on AKS with GB200 NVL72 GPUs delivers high-throughput, multi-node serving. In an e-commerce example, we show personalized recommendations at scale and optimized GPU utilization to meet unpredictable traffic efficiently. Check out the full blog here: https://lnkd.in/gDpke4-8
-
NVIDIA's new Blackwell Ultra-based GB300 NVL72 platform has dominated the latest MLPerf AI training benchmarks, securing first position in all seven tests and significantly widening the performance gap with rivals. The system demonstrated its industry-leading capability by setting a record time of only 10 minutes to train the massive Llama 405B large language model, delivering up to five times the performance of its preceding Hopper-based platform, thanks in part to the adoption of FP4 precision for LLM training. #NVIDIA #TechGiants #ChipMaker #Blackwell #AITraining #MLPerf #GenerativeAI #LLM #DataCenter #DataCenters #Technology #TechnologyNews https://lnkd.in/dMc7SCu4