Luminal's Cloud Runs OpenAI's Open Source Model

This title was summarized by AI from the post below.

Luminal’s cloud can run OpenAI’s new open-source model. Our API was used to create the benchmarks below:

Zecheng Zhang
Cofounder & CTO @ TraceRoot.AI (YC S25) | Founding Engineer @ Kumo.AI | Stanford CS MS

🤯 Which one is better? o4-mini vs. openai/gpt-oss-120b on real-world debugging. We put the newly released openai/gpt-oss-120b head-to-head against o4-mini in real-world debugging scenarios on the TraceRoot.AI (YC S25) platform.

Setup: We tested each model on 10 selected real-world bugs requiring:
1️⃣ Identifying the bug
2️⃣ Writing the code fix
3️⃣ Using GitHub tools to create a PR

Results:
📊 o4-mini: 50% success, 20% PR tool failures, 30% PRs that didn’t fix the bug.
📊 gpt-oss-120b: 20% success, 10% PR tool failures, 70% PRs that didn’t fix the bug.

Insights:
🔹 Tool-usage performance is similar.
🔹 gpt-oss-120b hallucinates more and struggles to produce working fixes.

Thanks to Groq and Luminal (YC S25) for serving gpt-oss-120b and making it easy to benchmark! :)

💻 Join our open-source observability & debugging community for more details:
GitHub: https://lnkd.in/gxws7-sN
Discord: https://lnkd.in/gZnZVemq

#AI #OpenSource #Debugging #Benchmarking #DeveloperTools #GPT #OpenAI #Groq #Luminal #gptoss #ycombinator
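As a sanity check on the arithmetic, the reported percentages map directly onto per-bug outcomes over the 10-bug sample. A minimal sketch (outcome labels and the `summarize` helper are hypothetical, not part of the TraceRoot.AI platform):

```python
from collections import Counter

def summarize(outcomes):
    """Return each outcome's share of the total as a whole-number percentage."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: round(100 * n / total) for label, n in counts.items()}

# o4-mini on 10 bugs: 5 successes, 2 PR-tool failures, 3 PRs that didn't fix the bug
o4_mini = ["success"] * 5 + ["pr_tool_fail"] * 2 + ["bad_fix"] * 3
print(summarize(o4_mini))  # {'success': 50, 'pr_tool_fail': 20, 'bad_fix': 30}
```

The same tally with 2/1/7 outcomes reproduces the gpt-oss-120b figures of 20%/10%/70%.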


