Language Model

An Implementation of Fully Traced and Evaluated Local LLM Pipeline Using...

In this tutorial, we implement a complete workflow for building, tracing, and evaluating an LLM pipeline using Opik. We structure the system step-by-step, beginning...

Allen Institute for AI (AI2) Introduces Olmo 3: An Open Source...

Allen Institute for AI (AI2) is releasing Olmo 3 as a fully open model family that exposes the entire 'model flow', from raw data...

vLLM vs TensorRT-LLM vs HF TGI vs LMDeploy, A Deep Technical...

Production LLM serving is now a systems problem, not a generate() loop. For real workloads, the choice of inference stack drives your tokens per...

xAI’s Grok 4.1 Pushes Toward Higher Emotional Intelligence, Lower Hallucinations and...

How do you build an AI assistant that feels emotionally intelligent and reliable to humans, instead of just making a bigger model? Meet Grok...

Google’s Gemini 3 Pro turns sparse MoE and 1M token context...

How do we move from language models that only answer prompts to systems that can reason over million token contexts, understand real world signals,...

Uni-MoE-2.0-Omni: An Open Qwen2.5-7B Based Omnimodal MoE for Text, Image, Audio...

How do you build one open model that can reliably understand text, images, audio and video while still running efficiently? A team of researchers...

Cerebras Releases MiniMax-M2-REAP-162B-A10B: A Memory Efficient Version of MiniMax-M2 for Long...

Cerebras has released MiniMax-M2-REAP-162B-A10B, a compressed Sparse Mixture-of-Experts (SMoE) Causal Language Model derived from MiniMax-M2, using the new Router weighted Expert Activation Pruning (REAP)...

MBZUAI Researchers Introduce PAN: A General World Model For Interactable Long...

Most text to video models generate a single clip from a prompt and then stop. They do not keep an internal world state that...

NVIDIA AI Introduces TiDAR: A Hybrid Diffusion Autoregressive Architecture For High...

How far can we push large language model speed by reusing “free” GPU compute, without giving up autoregressive level output quality? NVIDIA researchers propose...

OpenAI Introduces GPT-5.1: Combining Adaptive Reasoning, Account Level Personalization, And Updated...

OpenAI has released GPT-5.1 as the next iteration in the GPT-5 family, with 2 core variants, GPT-5.1 Instant and GPT-5.1 Thinking. The update focuses...

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under...

How can we get large model level multimodal reasoning for documents, charts and videos while running only a 3B class model in production? Baidu...

Maya1: A New Open Source 3B Voice Model For Expressive Text...

Maya Research has released Maya1, a 3B parameter text to speech model that turns text plus a short description into controllable, expressive speech while...

Meta AI Releases Omnilingual ASR: A Suite of Open-Source Multilingual Speech...

How do you build a single speech recognition system that can understand thousands of languages, including many that never had working ASR (automatic speech...

Gelato-30B-A3B: A State-of-the-Art Grounding Model for GUI Computer-Use Tasks, Surpassing Computer...

How do we teach AI agents to reliably find and click the exact on screen element we mean when we give them a simple...

Meet Kosmos: An AI Scientist that Automates Data-Driven Discovery

Kosmos, built by Edison Scientific, is an autonomous discovery system that runs long research campaigns on a single goal. Given a dataset and an...

StepFun AI Releases Step-Audio-EditX: A New Open-Source 3B LLM-Grade Audio Editing...

How can speech editing become as direct and controllable as simply rewriting a line of text? StepFun AI has open sourced Step-Audio-EditX, a 3B...

Moonshot AI Releases Kimi K2 Thinking: An Impressive Thinking Model that...

How do we design AI systems that can plan, reason, and act over long sequences of decisions without constant human guidance? Moonshot AI has...

CMU Researchers Introduce PPP and UserVille To Train Proactive And Personalized...

Most LLM agents are tuned to maximize task success. They resolve GitHub issues or answer deep research queries, but they do not reason carefully...

Google AI Introduces Consistency Training for Safer Language Models Under Sycophantic...

How can consistency training help language models resist sycophantic prompts and jailbreak style attacks while keeping their capabilities intact? Large language models often answer...

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in...

Code-oriented large language models moved from autocomplete to software engineering systems. In 2025, leading models must fix real GitHub issues, refactor multi-repo backends, write...

Cache-to-Cache(C2C): Direct Semantic Communication Between Large Language Models via KV-Cache Fusion

Can large language models collaborate without sending a single token of text? A team of researchers from Tsinghua University, Infinigence AI, The Chinese University...

LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters with 27B...

How do you design a single model that can listen, see, read and respond in real time across text, image, video and audio without...

Comparing the Top 6 OCR (Optical Character Recognition) Models/Systems in 2025

Optical character recognition has moved from plain text extraction to document intelligence. Modern systems must read scanned and digital PDFs in one pass, preserve...

Anthropic’s New Research Shows Claude can Detect Injected Concepts, but only...

How do you tell whether a model is actually noticing its own internal state instead of just repeating what training data said about thinking?...

Google AI Unveils Supervised Reinforcement Learning (SRL): A Step Wise Framework...

How can a small model learn to solve tasks it currently fails at, without rote imitation or relying on a correct rollout? A team...

Ant Group Releases Ling 2.0: A Reasoning-First MoE Language Model Series...

How do you build a language model that grows in capacity but keeps the computation for each token almost unchanged? The Inclusion AI team from...

IBM AI Team Releases Granite 4.0 Nano Series: Compact and Open-Source...

Small models are often blocked by poor instruction tuning, weak tool use formats, and missing governance. IBM AI team released Granite 4.0 Nano, a...

Microsoft Releases Agent Lightning: A New AI Framework that Enables Reinforcement...

How do you convert real agent traces into reinforcement learning (RL) transitions to improve policy LLMs without changing your existing agent stack? Microsoft AI...

Liquid AI Releases LFM2-ColBERT-350M: A New Small Model that brings Late...

Can a compact late interaction retriever index once and deliver accurate cross lingual search with fast inference? Liquid AI released LFM2-ColBERT-350M, a compact late...

Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context...

Can we render long texts as images and use a VLM to achieve 3–4× token compression, preserving accuracy while scaling a 128K context toward...

Meet ‘kvcached’: A Machine Learning Library to Enable Virtualized, Elastic KV...

Large language model serving often wastes GPU memory because engines pre-reserve large static KV cache regions per model, even when requests are bursty or...

5 Common LLM Parameters Explained with Examples

Large language models (LLMs) offer several parameters that let you fine-tune their behavior and control how they generate responses. If a model isn’t producing...

Liquid AI’s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM)...

Liquid AI released LFM2-VL-3B, a 3B parameter vision language model for image text to text tasks. It extends the LFM2-VL family beyond the 450M...

Anthrogen Introduces Odyssey: A 102B Parameter Protein Language Model that Replaces...

Anthrogen has introduced Odyssey, a family of protein language models for sequence and structure generation, protein editing, and conditional design. The production models range...

Google AI Introduces VISTA: A Test Time Self Improving Agent for...

TLDR: VISTA is a multi-agent framework that improves text-to-video generation during inference. It plans structured prompts as scenes, runs a pairwise...

DeepSeek Just Released a 3B OCR Model: A 3B VLM Designed...

DeepSeek-AI released 3B DeepSeek-OCR, an end to end OCR and document parsing Vision-Language Model (VLM) system that compresses long text into a small set...

The Local AI Revolution: Expanding Generative AI with GPT-OSS-20B and the...

The landscape of AI is expanding. Today, many of the most powerful LLMs (large language models) reside primarily in the cloud, offering incredible capabilities...

Weak-for-Strong (W4S): A Novel Reinforcement Learning Algorithm that Trains a weak...

Researchers from Stanford, EPFL, and UNC introduce Weak-for-Strong Harnessing (W4S), a new reinforcement learning (RL) framework that trains a small meta-agent to design and...

Microsoft AI Proposes BitNet Distillation (BitDistill): A Lightweight Pipeline that Delivers...

Microsoft Research proposes BitNet Distillation, a pipeline that converts existing full precision LLMs into 1.58 bit BitNet students for specific tasks, while keeping accuracy...

Baidu’s PaddlePaddle Team Releases PaddleOCR-VL (0.9B): a NaViT-style + ERNIE-4.5-0.3B VLM...

How do you convert complex, multilingual documents—dense layouts, small scripts, formulas, charts, and handwriting—into faithful structured Markdown/JSON with state-of-the-art accuracy while keeping inference latency...

Google AI Releases C2S-Scale 27B Model that Translate Complex Single-Cell Gene...

A team of researchers from Google Research, Google DeepMind, and Yale released C2S-Scale 27B, a 27B parameter foundation model for single-cell analysis built on...

Anthropic Launches Claude Haiku 4.5: Small AI Model that Delivers Sonnet-4-Level Coding Performance at One-Third the Cost and more than Twice the Speed

Anthropic released Claude Haiku 4.5, a latency-optimized “small” model that delivers similar levels of coding performance to Claude Sonnet 4 while running more than...

Meta AI’s ‘Early Experience’ Trains Language Agents without Rewards—and Outperforms Imitation...

How would your agent stack change if a policy could train purely from its own outcome-grounded rollouts—no rewards, no demos—yet beat imitation learning across...

Alibaba’s Qwen AI Releases Compact Dense Qwen3-VL 4B/8B (Instruct & Thinking)...

Do you actually need a giant VLM when dense Qwen3-VL 4B/8B (Instruct/Thinking) with FP8 runs in low VRAM yet retains 256K→1M context and the...

Andrej Karpathy Releases ‘nanochat’: A Minimal, End-to-End ChatGPT-Style Pipeline You Can...

Andrej Karpathy has open-sourced nanochat, a compact, dependency-light codebase that implements a full ChatGPT-style stack—from tokenizer training to web UI inference—aimed at reproducible, hackable...

NVIDIA Researchers Propose Reinforcement Learning Pretraining (RLP): Reinforcement as a Pretraining...

NVIDIA AI has introduced Reinforcement Learning Pretraining (RLP), a training objective that injects reinforcement learning into the pretraining stage rather than deferring it to...

SwiReasoning: Entropy-Driven Alternation of Latent and Explicit Chain-of-Thought for Reasoning LLMs

SwiReasoning is a decoding-time framework that lets a reasoning LLM decide when to think in latent space and when to write explicit chain-of-thought, using...

Meet OpenTSLM: A Family of Time-Series Language Models (TSLMs) Revolutionizing Medical...

A significant development is set to transform AI in healthcare. Researchers at Stanford University, in collaboration with ETH Zurich and tech leaders including Google...

Liquid AI Releases LFM2-8B-A1B: An On-Device Mixture-of-Experts with 8.3B Params and...

How much capability can a sparse 8.3B-parameter MoE with a ~1.5B active path deliver on your phone without blowing latency or memory? Liquid AI...

Microsoft Research Releases Skala: a Deep-Learning Exchange–Correlation Functional Targeting Hybrid-Level Accuracy...

TL;DR: Skala is a deep-learning exchange–correlation functional for Kohn–Sham Density Functional Theory (DFT) that targets hybrid-level accuracy at semi-local cost, reporting MAE ≈ 1.06...

Tiny Recursive Model (TRM): A Tiny 7M Model that Surpasses DeepSeek-R1,...

Can an iterative draft–revise solver that repeatedly updates a latent scratchpad outperform far larger autoregressive LLMs on ARC-AGI? Samsung SAIT (Montreal) has released Tiny...

Meta AI Open-Sources OpenZL: A Format-Aware Compression Framework with a Universal...

How much compression ratio and throughput would you recover by training a format-aware graph compressor and shipping only a self-describing graph to a universal...

Salesforce AI Research Releases CoDA-1.7B: a Discrete-Diffusion Code Model with Bidirectional,...

Salesforce AI Research released CoDA-1.7B, a diffusion-based language model for code that generates by denoising whole sequences with bidirectional context, updating multiple tokens in...

This AI Paper Proposes a Novel Dual-Branch Encoder-Decoder Architecture for Unsupervised...

Can a speech enhancer trained only on real noisy recordings cleanly separate speech and noise—without ever seeing paired data? A team of researchers from...

A Coding Implementation to Build a Transformer-Based Regression Language Model to...

In this coding implementation, we build a Regression Language Model (RLM), a model that predicts continuous numerical values directly from text sequences. Instead...

Google Proposes TUMIX: Multi-Agent Test-Time Scaling With Tool-Use Mixture

What if, instead of re-sampling one agent, you could push Gemini-2.5 Pro to 34.1% on HLE by mixing 12–15 tool-using agents that share notes...

Can a Small Language Model Predict Kernel Latency, Memory, and Model...

Researchers from Cornell and Google introduce a unified Regression Language Model (RLM) that predicts numeric outcomes directly from code strings—covering GPU kernel latency, program...

Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with...

Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally in real time on CPUs. The Hugging Face...

IBM Released new Granite 4.0 Models with a Novel Hybrid Mamba-2/Transformer...

IBM just released Granite 4.0, an open-source LLM family that swaps monolithic Transformers for a hybrid Mamba-2/Transformer stack to cut serving memory while keeping...

ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits...

ServiceNow AI Research Lab has released Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model trained with a data-centric mid-training recipe—continual pretraining followed by supervised fine-tuning—without...

Liquid AI Released LFM2-Audio-1.5B: An End-to-End Audio Foundation Model with Sub-100...

Liquid AI has released LFM2-Audio-1.5B, a compact audio–language foundation model that both understands and generates speech and text through a single end-to-end stack. It...

Zhipu AI Releases GLM-4.6: Achieving Enhancements in Real-World Coding, Long-Context Processing,...

Zhipu AI has released GLM-4.6, a major update to its GLM series focused on agentic workflows, long-context reasoning, and practical coding tasks. The model...

DeepSeek V3.2-Exp Cuts Long-Context Costs with DeepSeek Sparse Attention (DSA) While...

DeepSeek released DeepSeek-V3.2-Exp, an “intermediate” update to V3.1...

Gemini Robotics 1.5: DeepMind’s ER↔VLA Stack Brings Agentic Robots to the...

Can a single AI stack plan like a researcher, reason over scenes, and transfer motions across different robots—without retraining from scratch? Google DeepMind’s Gemini...

Top 10 Local LLMs (2025): Context Windows, VRAM Targets, and Licenses...

Local LLMs matured fast in 2025: open-weight families like Llama 3.1 (128K context length), Qwen3 (Apache-2.0, dense + MoE), Gemma 2 (9B/27B, 8K...

Meet Qwen3Guard: The Qwen3-based Multilingual Safety Guardrail Models Built for Global,...

Can safety keep up with real-time LLMs? Alibaba’s Qwen team thinks so, and it just shipped Qwen3Guard—a multilingual guardrail model family built to moderate...

Sakana AI Released ShinkaEvolve: An Open-Source Framework that Evolves Programs for...

What problem is it actually solving? · Does the sample-efficiency claim hold beyond toy problems? · How does the evolutionary loop look in practice? · What are the...

Meta FAIR Released Code World Model (CWM): A 32-Billion-Parameter Open-Weights LLM,...

Meta FAIR released Code World Model (CWM), a 32-billion-parameter dense decoder-only LLM that injects world modeling into code generation by training on execution traces...

Alibaba’s Qwen3-Max: Production-Ready Thinking Mode, 1T+ Parameters, and Day-One Coding/Agentic Bench...

Alibaba has released Qwen3-Max, a trillion-parameter Mixture-of-Experts (MoE) model positioned as its most capable foundation model to date, with an immediate public on-ramp via...

Alibaba Qwen Team Just Released FP8 Builds of Qwen3-Next-80B-A3B (Instruct &...

Alibaba’s Qwen team has just released FP8-quantized checkpoints for its new Qwen3-Next-80B-A3B models in two post-training variants—Instruct and Thinking—aimed at high-throughput inference with ultra-long...

MIT Researchers Make Artificial Intelligence (AI) 64x Better at Planning, Achieving...

Can an 8B-parameter language model produce provably valid multi-step plans instead of plausible guesses? MIT CSAIL researchers introduce PDDL-INSTRUCT, an instruction-tuning framework that couples...

LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and...

What exactly is being measured when a judge LLM assigns a 1–5 (or pairwise) score? Most “correctness/faithfulness/completeness” rubrics are project-specific. Without task-grounded definitions, a scalar...

xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context...

xAI introduced Grok-4-Fast, a cost-optimized successor to Grok-4 that merges “reasoning” and “non-reasoning” behaviors into a single set of weights controllable via system prompts....

Xiaomi Released MiMo-Audio, a 7B Speech Language Model Trained on 100M+...

Xiaomi’s MiMo team released MiMo-Audio, a 7-billion-parameter audio-language model that runs a single next-token objective over interleaved text and discretized speech, scaling pretraining beyond...

Qwen3-ASR-Toolkit: An Advanced Open Source Python Command-Line Toolkit for Using the Qwen-ASR...

Qwen has released Qwen3-ASR-Toolkit, an MIT-licensed Python CLI that programmatically bypasses the Qwen3-ASR-Flash API’s 3-minute/10 MB per-request limit by performing VAD-aware chunking, parallel API...

Building AI agents is 5% AI and 100% software engineering

Production-grade agents live or die on data plumbing, controls, and observability—not on model choice. The doc-to-chat pipeline below maps the concrete layers and why...

H Company Releases Holo1.5: An Open-Weight Computer-Use VLMs Focused on GUI...

H Company (a French AI startup) releases Holo1.5, a family of open foundation vision models purpose-built for computer-use (CU) agents that act on real...

Alibaba Releases Tongyi DeepResearch: A 30B-Parameter Open-Source Agentic LLM Optimized for...

What the benchmarks show · Architecture and inference profile · Training pipeline: synthetic data + on-policy RL · Role in document and web research workflows · Key features of...

IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model

IBM has released Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction—tables, code, equations, lists, captions,...

Ai2 Researchers are Changing the Benchmarking Game by Introducing Fluid Benchmarking that...

A team of researchers from Allen Institute for Artificial Intelligence (Ai2), University of Washington and CMU introduce Fluid Benchmarking, an adaptive LLM evaluation method that...

Stanford Researchers Introduced MedAgentBench: A Real-World Benchmark for Healthcare AI Agents

A team of Stanford University researchers have released MedAgentBench, a new benchmark suite designed to evaluate large language model (LLM) agents in healthcare contexts....

Meta AI Released MobileLLM-R1: An Edge Reasoning Model with less than...

What architecture powers MobileLLM-R1? · How efficient is the training? · How does it perform against other open models? · Where does MobileLLM-R1 fall short? · How does MobileLLM-R1 compare...

UT Austin and ServiceNow Research Team Releases AU-Harness: An Open-Source Toolkit...

Voice AI is becoming one of the most important frontiers in multimodal AI. From intelligent assistants to interactive agents, the ability to understand and...

Google AI Releases VaultGemma: The Largest and Most Capable Open Model...

Google AI Research and DeepMind have released VaultGemma 1B, the largest open-weight large language model trained entirely with differential privacy (DP). This development is...

IBM AI Research Releases Two English Granite Embedding Models, Both Based...

IBM has quietly built a strong presence in the open-source AI ecosystem, and its latest release shows why it shouldn’t be overlooked. The company...

How to Build a Multilingual OCR AI Agent in Python with...

In this tutorial, we build an Advanced OCR AI Agent in Google Colab using EasyOCR, OpenCV, and Pillow, running fully offline with GPU acceleration....

BentoML Released llm-optimizer: An Open-Source AI Tool for Benchmarking and Optimizing...

BentoML has recently released llm-optimizer, an open-source framework designed to streamline the benchmarking and performance tuning of self-hosted large language models (LLMs). The tool...

Deepdub Introduces Lightning 2.5: A Real-Time AI Voice Model With 2.8x...

Deepdub, an Israeli Voice AI startup, has introduced Lightning 2.5, a real-time foundational voice model designed to power scalable, production-grade voice applications. The new...

TwinMind Introduces Ear-3 Model: A New Voice AI Model that Sets...

TwinMind, a California-based Voice AI startup, unveiled its Ear-3 speech-recognition model, claiming state-of-the-art performance on several key metrics and expanded multilingual support. The release positions...

What are Optical Character Recognition (OCR) Models? Top Open-Source OCR Models

Optical Character Recognition (OCR) is the process of turning images that contain text—such as scanned pages, receipts, or photographs—into machine-readable text. What began as...

Meet mmBERT: An Encoder-only Language Model Pretrained on 3T Tokens of...

Why was a new multilingual encoder needed? · Understanding the architecture of mmBERT · What training data and phases were used? · What new training strategies were introduced? · How...

Baidu Releases ERNIE-4.5-21B-A3B-Thinking: A Compact MoE Model for Deep Reasoning

Baidu AI Research team has just released ERNIE-4.5-21B-A3B-Thinking, a new reasoning-focused large language model designed around efficiency, long-context reasoning, and tool integration. Being part...

Building a Speech Enhancement and Automatic Speech Recognition (ASR) Pipeline in...

In this tutorial, we walk through an advanced yet practical workflow using SpeechBrain. We start by generating our own clean speech samples with gTTS,...

MBZUAI Researchers Release K2 Think: A 32B Open-Source System for Advanced...

A team of researchers from MBZUAI’s Institute of Foundation Models and G42 released K2 Think, a 32B-parameter open reasoning system for advanced AI...

Alibaba Qwen Team Releases Qwen3-ASR: A New Speech Recognition Model Built...

Alibaba Cloud’s Qwen team unveiled Qwen3-ASR Flash, an all-in-one automatic speech recognition (ASR) model (available as API service) built upon the strong intelligence of...

Meta Superintelligence Labs Introduces REFRAG: Scaling RAG with 16× Longer Contexts...

Why is long context such a bottleneck for LLMs? · How does REFRAG compress and shorten context? · How is acceleration achieved? · How does REFRAG preserve accuracy? · What...

Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with...

Latvian language-tech firm Tilde has released TildeOpen LLM, an open-source foundational large language model (LLM) purpose-built for European languages, with a sharp focus on...

From Pretraining to Post-Training: Why Language Models Hallucinate and How Evaluation...

Large language models (LLMs) very often generate “hallucinations”—confident yet incorrect outputs that appear plausible. Despite improvements in training methods and architectures, hallucinations persist. A...

Alibaba AI Unveils Qwen3-Max Preview: A Trillion-Parameter Qwen Model with Super...

Alibaba’s Qwen Team unveiled Qwen3-Max-Preview (Instruct), a new flagship large language model with over one trillion parameters—their largest to date. It is accessible through...

Biomni-R0: New Agentic LLMs Trained End-to-End with Multi-Turn Reinforcement Learning for...

The Growing Role of AI in Biomedical Research · The Core Challenge: Matching Expert-Level Reasoning · Why Traditional Approaches Fall Short · Biomni-R0: A New Paradigm Using Reinforcement...
