Perplexity is the first to develop custom Mixture-of-Experts (MoE) kernels that make trillion-parameter models deployable with portability across cloud platforms. Our team has published this work on arXiv as Perplexity's first research paper. Read more: https://lnkd.in/gStC_SzJ
Perplexity: Mad at Amazon
Perplexity: Happy at Amazon
Impressive milestone; MoE kernels at this scale redefine efficiency itself. Portability across cloud platforms is exactly what pushes trillion-parameter AI from theory to real-world impact.
Perplexity, keep crushing it!
Aloha 🌺 from Germany. I love working with your Assistant in Comet 😍. It's simply amazing!!! So helpful!!!
So basically... Perplexity has achieved a significant technical advance that enables running the largest AI models efficiently and with lower latency on the AWS Cloud infrastructure, right? Will this be only used internally?
Wow, custom MoE kernels at this scale! Honestly curious how this will play out across different clouds. Been helping teams automate similar workflows lately; everyone seems to hit a different bottleneck first.
Perplexity is doing a great job!
Impressive milestone. Making trillion-parameter models portable across cloud platforms is a huge step toward scalable AI accessibility. Excited to see how this shapes the future of open research.
Non-technical 3-sentence summary: Perplexity developed new technology that makes extremely large AI models (with trillions of parameters) run faster and more affordably in the cloud. This breakthrough allows these massive models to work across different servers, not just ultra-specialized hardware, making them much more accessible. In short, it means more powerful AI can be deployed in real products and services, not just research labs.