OpenAI releases GPT-5.1 with adaptive reasoning, new tools and pricing
OpenAI released GPT-5.1 to the API with adaptive reasoning: it spends fewer tokens on simple tasks and thinks more deeply on complex ones. The release adds 24-hour prompt caching, faster responses for Priority Processing, and the option to disable reasoning entirely via reasoning_effort='none' (sketched below). Coding quality is improved based on feedback from partners. Two new tools ship with it: apply_patch for structured diffs and a shell interface. Pricing matches GPT-5, and gpt-5.1-codex variants target agentic coding.
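A minimal sketch of what disabling reasoning could look like from the Python SDK, assuming the Chat Completions parameter name matches the reasoning_effort setting described above; the model identifier and prompt are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Simple lookup-style task: skip reasoning entirely for lower latency and cost.
response = client.chat.completions.create(
    model="gpt-5.1",             # assumed API identifier for the announced model
    reasoning_effort="none",     # disable adaptive reasoning for this call
    messages=[
        {"role": "user", "content": "Convert 72 degrees Fahrenheit to Celsius."}
    ],
)

print(response.choices[0].message.content)
```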
More Relevant Posts
🚨 GPT-5.1 is now available in the API:
➜ Pricing remains the same as GPT-5
➜ New coding models released: gpt-5.1-codex and gpt-5.1-codex-mini (optimized for long, complex coding tasks)
➜ Prompt caching extended to 24 hours for faster, cheaper repeated calls (see the sketch below)
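As a rough illustration of why longer cache retention matters, the pattern below keeps a large, stable prefix (system prompt plus reference material) byte-for-byte identical across calls, which is what lets repeated requests reuse the cached prompt. Caching itself happens server-side, not in this code; the model name, "Acme API" scenario, and reference text are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative stand-in for a long block of reference documentation.
REFERENCE_DOCS = """\
(in practice this would be several thousand tokens of API documentation
loaded or pasted here, kept identical between calls)
"""

# A long, stable prefix that is identical on every request; repeated prefixes
# are what the prompt cache can reuse across calls.
STABLE_PREFIX = (
    "You are a support assistant for the Acme API. "
    "Answer strictly from the reference below.\n\n"
    "REFERENCE:\n" + REFERENCE_DOCS
)

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5.1",  # assumed API identifier
        messages=[
            {"role": "system", "content": STABLE_PREFIX},  # stable, cacheable part
            {"role": "user", "content": question},         # only this part changes
        ],
    )
    return response.choices[0].message.content

print(ask("How do I rotate an API key?"))
print(ask("What is the rate limit on the orders endpoint?"))
```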
Get 96%+ of GPT-5-Codex programming performance at 1/5 the cost. OpenAI's new GPT-5-Codex-Mini offers developers a high-value coding option.
1️⃣ Performance: Scores 71.3% on SWE-bench, only a 3.2-point gap from the full version (74.5%).
2️⃣ Cost: Significant savings (input: $1.50/1M tokens). Complete similar tasks at ~1/5 the cost of the full version.
3️⃣ Smart: Codex suggests switching to Mini at 90% API quota usage to prevent interruptions.
4️⃣ Use Cases: Ideal for low-to-medium complexity tasks, code completion, CLI, and IDE extensions.
Enable it in the Codex CLI with: codex --model gpt-5-codex-mini
Or set it as the default in config.toml. API access is "coming soon," but it's available now in the CLI and VS Code extension.
Pricing: https://lnkd.in/gaSk3ADc
Changelog: https://lnkd.in/gfuZEBkM
#GPT5 #Codex #OpenAI #AICoding #Productivity
GLM-4.6 is now available on Clarifai! 🚀 GLM-4.6 unifies reasoning, coding, and agentic capabilities into a single model. It comes with key upgrades that make it more capable across advanced reasoning, code generation, and multi-turn agent tasks.
Key Highlights:
• 200K token context window (up from 128K)
• Stronger coding performance on benchmarks and real-world tools like Claude Code, Cline, and Roo Code
• Improved reasoning with support for tool use during inference
• More capable agent behavior in tool-using and search-based systems
• Refined writing quality aligned with human style and readability
Benchmarks across eight public datasets show consistent gains over GLM-4.5, with GLM-4.6 performing competitively against models like DeepSeek-V3.1-Terminus and Claude Sonnet 4.
Try it now in the Playground, or access it through the API using Clarifai's OpenAI-compatible endpoint (a call sketch follows below). 👉 Try GLM-4.6 here: https://lnkd.in/eqksMNeZ
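A minimal sketch of calling the model through an OpenAI-compatible endpoint with the standard OpenAI Python client. The base URL, model identifier, and environment variable name below are placeholders, not confirmed values; take the real ones from Clarifai's documentation:

```python
import os
from openai import OpenAI

# Placeholder base URL and credentials; substitute the values from Clarifai's docs.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["CLARIFAI_PAT"],                     # assumed env var for a Clarifai access token
)

response = client.chat.completions.create(
    model="glm-4.6",  # illustrative model identifier
    messages=[
        {"role": "user", "content": "Write a Python function that merges two sorted lists."}
    ],
)

print(response.choices[0].message.content)
```

The point of an OpenAI-compatible endpoint is exactly this: existing OpenAI-client code only needs a different base_url and key to switch providers.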
Three new OpenAI models are now available in Cursor:
1. GPT-5.1: For everyday tasks like planning and debugging
2. GPT-5.1 Codex: For ambitious coding tasks
3. GPT-5.1 Codex Mini: For cost-efficient changes
Exploring Small Language Models (SLMs) by building the ability to switch easily back and forth between SLMs and LLMs. My gut feeling is that LLMs will implode under their complexity and cost, and SLMs will win out in the long run. For now, though, it is important to study the quality gap... and of course, with distillation from LLMs to SLMs, that gap is shrinking. See: https://lnkd.in/ekD4rX3k
Magic recipe for one-shotting features in Cursor
1️⃣ Ask Cursor to write an RFC for the feature you want to build. Example prompt: "Implement a folder repository integrated with the agent chat, follow the CRUD structure from @example.ts, research the codebase for existing patterns you can reuse, and write an RFC in a markdown file." RFC is a common format widely used both to present a design and to request comments, so it is great for the initial interaction. Happy with the RFC? Move to the next step.
2️⃣ Switch to Plan mode in the Chat Composer and ask it to create the plan. Example prompt: "Based on the RFC we've discussed, create the plan." Optional addition: "Plan just the first X parts of the RFC" -> useful for larger features. Iterate on the plan until you're confident the agent can one-shot your code.
3️⃣ Hit Build and review the code. Also trigger the Cursor review while you analyze the diff, for an extra layer of quality assurance.
🎬 Final thoughts
- Use reasoning models for steps 1 and 2: GPT-5 or Claude Sonnet 4.5 with thinking.
- Use a fast model for the build: Composer-1 or the regular Claude Sonnet 4.5.
Keep in mind that one RFC can map to many plans and builds. That's why it's a good idea to store the RFC in a markdown file: you can then easily spawn new agents to plan -> build different parts of the feature.
Minimax M2 is the Neo of the LLM matrix right now... let me explain.
Why exactly does M2 perform so well on agentic tasks? M2 essentially rethinks the "more is better" mindset of large-scale AI. More data, more parameters, more compute sounds great, but when it comes time to autonomously plan, act on, and verify tasks, you have to ask: how many iterations is it going to take to fix the light bulb?
It's like assembling a squad of the best and brightest scientists just to come and fix your hot water. I'm sure they'll do a great job, but you can bet there will be constant debate and reiteration along the way, when all you really needed was one skilled tradesman with a vast knowledge of everything and a keen ability to hone in on the task at hand.
The common agent workflow is plan → act → verify, and M2's sparse activation architecture is optimized exactly for this flow. Sparse activation is like Neo: it doesn’t load the whole Matrix — just what’s needed. “I know Python.” “I know how to debug.” “I know how to ship in production.”
The 230-billion-parameter model activates only about 10 billion expert parameters to carry out each task, meaning high responsiveness per component in the workflow and reduced compute overhead at each step (a toy routing sketch follows below).
And the ingenuity is reflected in the output. MiniMax-M2 scored an impressive 77.2 on Tau Bench (a benchmark for agentic tool use), significantly outperforming gpt-oss-120b, which scored only 67.8% on the same benchmark.
Would love to hear from anyone who has started implementing M2 in their agentic workflow — what are your thoughts?
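To make the sparse-activation idea concrete, here is a toy sketch of top-k expert routing in the style of a mixture-of-experts layer. The expert count, top-k value, sizes, and routing details are illustrative only, not MiniMax's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 16   # total experts in the layer (toy number, not M2's real count)
TOP_K = 2          # experts actually activated per token
D_MODEL = 64       # hidden size (toy)

# Toy parameters: a router matrix plus one small transform per expert.
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))
expert_w = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts; only those experts run."""
    logits = x @ router_w                            # (tokens, experts)
    top_idx = np.argsort(-logits, axis=-1)[:, :TOP_K]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top_idx[t]]
        gates = np.exp(chosen) / np.exp(chosen).sum()     # softmax over the chosen experts only
        for gate, e in zip(gates, top_idx[t]):
            out[t] += gate * np.tanh(x[t] @ expert_w[e])  # only TOP_K of NUM_EXPERTS do any work
    return out

tokens = rng.normal(size=(4, D_MODEL))
print(moe_layer(tokens).shape)  # (4, 64): full-width output, but only 2/16 experts ran per token
```

The same idea at scale is why a very large parameter count can coexist with a small per-token compute footprint: the router loads "just what's needed" for each step of the plan → act → verify loop.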
OpenAI just raised the limits for Codex and released GPT-5-Codex-Mini. The new model trails the full GPT-5-Codex by only 3 percentage points on the SWE-bench Verified benchmark, but it's 4× more cost-efficient. OpenAI recommends using it for lighter coding tasks to save requests to the main model, and once you hit 90% of your usage limit, Codex will politely suggest switching to the Mini version. On top of that, ChatGPT Plus, Business, and Edu users are getting 50% higher limits. Still not Anthropic-level generosity, but it's a solid upgrade.
Went through a MarkTechPost article and created this document. Source: https://lnkd.in/d4wGKSpK
Code-oriented large language models have moved from autocomplete to software engineering systems. In 2025, leading models must fix real GitHub issues, refactor multi-repo backends, write tests, and run as agents over long context windows. The main question for teams is not "can it code" but which model fits which constraints. Here are seven models (and the systems around them) that cover most real coding workloads today:
1. OpenAI: GPT-5 / GPT-5-Codex
2. Anthropic: Claude 3.5 Sonnet / Claude 4.x Sonnet with Claude Code
3. Google: Gemini 2.5 Pro
4. Meta: Llama 3.1 405B Instruct
5. DeepSeek: DeepSeek-V2.5-1210 (with DeepSeek-V3 as the successor)
6. Alibaba: Qwen2.5-Coder-32B-Instruct
7. Mistral: Codestral 25.01
The goal of this comparison is not to rank them on a single score, but to show which system to pick for a given benchmark target, deployment model, governance requirement, and IDE or agent stack.
MCP in Action — One of the Core Pieces of Agentic AI Workflows
A lightweight Model Context Protocol (MCP) setup can act as a clean interface between your LLM and your internal systems — allowing the model to execute real actions securely and locally. To demonstrate this, I implemented a local leave management interface where Claude Desktop (the MCP client) communicates directly with a Python-based MCP server built on the mcp package.
🧭 Implementation Overview
Built a minimal MCP server in Python with FastMCP and exposed structured tools for:
🧾 Checking employee leave balance
📝 Applying leave for specific dates
🕒 Viewing leave history
💬 Generating personalized greetings
Integrated the MCP server with Claude Desktop as the client, enabling natural language interaction with a local employee database. (A stripped-down sketch of such a server follows below.)
⚡ Why MCP Works Well Here
• No REST APIs or front-end layers required
• Full local execution — no data leaves the machine
• Natural language acts as the interface layer
• Extensible structure that can scale to more complex internal workflows
With this setup, I can query and modify my local DB using simple instructions like:
"How many leave days does E001 have left?"
"Apply leave for E002 on May 1st."
"Show me leave history for E001."
No UI. No external services. Just a structured interface between the model and the system. Agentic AI isn’t just about smarter models — it’s about giving them the right interfaces to act.
#MCP #AgenticAI #Claude #Python #LLM #DeveloperTools #Automation #AIEngineering #Innovation #LLMOps
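A stripped-down sketch of what such a server could look like using the Python mcp SDK's FastMCP helper. The tool names, in-memory employee data, and return formats are illustrative, not the author's actual implementation:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("leave-manager")

# Toy in-memory "database"; a real setup would back this with SQLite or similar.
EMPLOYEES = {
    "E001": {"balance": 12, "history": ["2025-01-02"]},
    "E002": {"balance": 8, "history": []},
}

@mcp.tool()
def get_leave_balance(employee_id: str) -> str:
    """Return how many leave days an employee has left."""
    emp = EMPLOYEES.get(employee_id)
    if emp is None:
        return f"No employee found with id {employee_id}."
    return f"{employee_id} has {emp['balance']} leave day(s) remaining."

@mcp.tool()
def apply_leave(employee_id: str, date: str) -> str:
    """Deduct one leave day and record the date (YYYY-MM-DD)."""
    emp = EMPLOYEES.get(employee_id)
    if emp is None:
        return f"No employee found with id {employee_id}."
    if emp["balance"] <= 0:
        return f"{employee_id} has no leave days left."
    emp["balance"] -= 1
    emp["history"].append(date)
    return f"Leave applied for {employee_id} on {date}. Remaining balance: {emp['balance']}."

@mcp.tool()
def get_leave_history(employee_id: str) -> str:
    """List the dates on which an employee has taken leave."""
    emp = EMPLOYEES.get(employee_id)
    if emp is None:
        return f"No employee found with id {employee_id}."
    return ", ".join(emp["history"]) or "No leave taken yet."

if __name__ == "__main__":
    # Claude Desktop launches this process and talks to it over stdio by default.
    mcp.run()
```

Registering the script in Claude Desktop's MCP server configuration is then enough for prompts like "How many leave days does E001 have left?" to be routed to these tools.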