Luminal’s cloud can run OpenAI’s new open-source model. Our API was used to create the benchmarks below:
🤯 Which one is better? 𝗼𝟰-𝗺𝗶𝗻𝗶 vs. 𝗼𝗽𝗲𝗻𝗮𝗶/𝗴𝗽𝘁-𝗼𝘀𝘀-𝟭𝟮𝟬𝗯 on real-world debugging.

We put the newly released 𝗼𝗽𝗲𝗻𝗮𝗶/𝗴𝗽𝘁-𝗼𝘀𝘀-𝟭𝟮𝟬𝗯 head-to-head against 𝗼𝟰-𝗺𝗶𝗻𝗶 in real-world debugging scenarios on the TraceRoot.AI (YC S25) platform.

𝗦𝗲𝘁𝘂𝗽: We tested each model on 10 selected real-world bugs, each requiring:
1️⃣ Identifying the bug
2️⃣ Writing the code fix
3️⃣ Using GitHub tools to create a PR

𝗥𝗲𝘀𝘂𝗹𝘁𝘀:
📊 𝗼𝟰-𝗺𝗶𝗻𝗶: 50% success, 20% PR tool failures, 30% PRs that didn’t fix the bug.
📊 𝗴𝗽𝘁-𝗼𝘀𝘀-𝟭𝟮𝟬𝗯: 20% success, 10% PR tool failures, 70% PRs that didn’t fix the bug.

𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
🔹 Tool-usage performance is similar.
🔹 gpt-oss-120b hallucinates more and struggles to produce working fixes.

Thanks to Groq and Luminal (YC S25) for serving 𝗴𝗽𝘁-𝗼𝘀𝘀-𝟭𝟮𝟬𝗯 and making it easy to benchmark! :)

💻 Join our open-source observability & debugging community for more details:
GitHub: https://lnkd.in/gxws7-sN
Discord: https://lnkd.in/gZnZVemq

#AI #OpenSource #Debugging #Benchmarking #DeveloperTools #GPT #OpenAI #Groq #Luminal #gptoss #ycombinator
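As a quick sanity check on the arithmetic, the reported percentages map cleanly onto per-bug counts over the 10-bug run. The sketch below is ours, not TraceRoot.AI's scoring code: the category names (`success`, `pr_tool_fail`, `bad_fix`) are hypothetical labels for the three outcomes described above.

```python
# Minimal sketch: tally per-bug outcomes into the percentages reported above.
# Counts are derived from the post's rates over 10 bugs; labels are ours.
from collections import Counter


def outcome_rates(outcomes):
    """Return each outcome's share of the run as an integer percentage."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: 100 * n // total for label, n in counts.items()}


# o4-mini: 5 successes, 2 PR-tool failures, 3 PRs that didn't fix the bug.
o4_mini = ["success"] * 5 + ["pr_tool_fail"] * 2 + ["bad_fix"] * 3
# gpt-oss-120b: 2 successes, 1 PR-tool failure, 7 PRs that didn't fix the bug.
gpt_oss_120b = ["success"] * 2 + ["pr_tool_fail"] * 1 + ["bad_fix"] * 7

print(outcome_rates(o4_mini))       # {'success': 50, 'pr_tool_fail': 20, 'bad_fix': 30}
print(outcome_rates(gpt_oss_120b))  # {'success': 20, 'pr_tool_fail': 10, 'bad_fix': 70}
```

With only 10 bugs per model, each bug swings a category by 10 percentage points, so the gap in success rate (50% vs. 20%) comes down to three bugs.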