"In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G)." DeepSeek lowkey release, they call it just another OCR this week but if you dive deeper, they introduce a new way of compress the image token 10x or 20x. You can store 10k words in 1.5k compressive visual tokens. It's a breakthrough.
"DeepSeek-OCR generates 200k+ pages daily for LLMs/VLMs"
More Relevant Posts
-
Say hello to a new OCR model from Ai2, the champions of actual true open source:
- OlmOCR 2: a major update to the open OCR model for complex documents, now better at handling tables, equations, handwriting, and degraded scans.
- Achieves 82.4% on olmOCR-Bench, thanks to a richer training mix including 20k historical document pages.
- The FP8-quantized model processes 3.4k tokens/sec on a single H100, so around USD 180 to parse one million pages.
- Apache 2.0 license, with full support for domain fine-tuning and deployment.
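A quick sanity check on that price: the post only gives throughput, so the tokens-per-page and GPU rental rate below are my assumptions, not Ai2's figures. A minimal sketch:

```python
# Back-of-the-envelope check of the ~USD 180 per 1M pages claim.
# Assumptions (mine, not Ai2's): ~1,000 output tokens per page
# and ~USD 2.20/hour to rent an H100.
tokens_per_sec = 3_400
tokens_per_page = 1_000        # assumed average
usd_per_gpu_hour = 2.20        # assumed rental price

pages_per_hour = tokens_per_sec * 3600 / tokens_per_page   # ~12,240
hours_per_million = 1_000_000 / pages_per_hour             # ~82 h
cost = hours_per_million * usd_per_gpu_hour                # ~USD 180
print(f"{pages_per_hour:,.0f} pages/h -> ~USD {cost:,.0f} per 1M pages")
```

Under those assumptions the numbers line up with the quoted figure.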
-
Bring AI into your debugging workflow. The Inspector MCP Server lets your coding agents access real production errors, analyze them, and suggest fixes, all from your IDE. https://lnkd.in/dss44_nW
-
Memvid: your whole database in a single video. You probably heard about the release of DeepSeek-OCR in the past few days, and people praised its ability to compress a lot of information into visual form (images) with far fewer tokens than text. But this is not the first attempt at that idea. Memvid takes an even weirder approach: it compresses your whole large text database into a single MP4 file that you can still search in milliseconds, with no loss of accuracy, effectively turning a video into a portable data store. The main idea: it converts chunks of text into QR codes, and each QR code becomes one frame of the video. Thousands of frames get packed into a single file, and together with a smart index you can pull out the right QR code in milliseconds and convert it back to text when you need the data. It's a very unintuitive idea, I can't wrap my head around it. And weirdly, it works.
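If the mechanism sounds abstract, here is a minimal sketch of the idea, not Memvid's actual code. It assumes the qrcode and opencv-python packages; note that a lossy codec like mp4v can corrupt QR modules, which is why real systems lean on QR error correction and codec tuning.

```python
# Toy sketch of the Memvid idea: text chunks -> QR codes -> video frames,
# with an implicit index mapping chunk i -> frame i for fast seeks.
# Assumes: pip install "qrcode[pil]" opencv-python numpy
import cv2
import numpy as np
import qrcode

SIZE = 512  # frame edge in pixels; larger frames survive compression better

def chunk_to_frame(text: str) -> np.ndarray:
    """Render one text chunk as a QR code sized as a video frame."""
    img = qrcode.make(text).get_image().convert("RGB")
    frame = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    # Nearest-neighbor keeps QR modules as sharp squares.
    return cv2.resize(frame, (SIZE, SIZE), interpolation=cv2.INTER_NEAREST)

def frame_to_chunk(frame: np.ndarray) -> str:
    """Decode a QR frame back into its text chunk."""
    text, _points, _raw = cv2.QRCodeDetector().detectAndDecode(frame)
    return text

chunks = ["first document chunk...", "second document chunk..."]

# Write one QR frame per chunk into a single MP4 file.
writer = cv2.VideoWriter("store.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (SIZE, SIZE))
for c in chunks:
    writer.write(chunk_to_frame(c))
writer.release()

# Retrieval: seek straight to the frame holding the chunk and decode it.
cap = cv2.VideoCapture("store.mp4")
cap.set(cv2.CAP_PROP_POS_FRAMES, 1)  # jump to the frame for chunk 1
ok, frame = cap.read()
print(frame_to_chunk(frame))  # -> "second document chunk..."
```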
-
New in Crashlytics: Debugging and fixing crashes just got faster 🔨⚡ Use the new MCP tools and the /crashlytics:connect command in Gemini CLI (or your AI tool of choice) to help prioritize, investigate, and fix crashes right in your codebase. Stop context-switching, start fixing → https://goo.gle/4oyFK7B
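For reference, wiring an MCP server into Gemini CLI happens in ~/.gemini/settings.json. The exact entry below (launching the Firebase CLI's experimental MCP server, which exposes the Crashlytics tools) is an illustrative assumption, so check the linked docs for the current command:

```json
{
  "mcpServers": {
    "firebase": {
      "command": "npx",
      "args": ["-y", "firebase-tools@latest", "experimental:mcp"]
    }
  }
}
```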
-
🚀 Exploring TOON: The New Standard for Fast, Efficient and Reliable LLM Workflows
TOON (Token-Oriented Object Notation) is a new, ultra-clean way to represent structured data. It focuses on human readability, removing the noise of quotes, braces, and commas while keeping the structure intact.
🌟 Why TOON matters:
- Makes configs & documentation easier to read
- Reduces visual clutter → faster understanding
- Allows comments & trailing commas
- Great for teams working on complex data models
#TOON #LLM #AIagents #AIEngineering #PromptEngineering #TokenOptimization #ArtificialIntelligence
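For a feel of the notation, here is roughly how a small JSON object maps to TOON, based on the format's public examples (details may differ from the current spec):

```
JSON:
{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

TOON:
users[2]{id,name}:
  1,Alice
  2,Bob
```

The array declares its length and field names once, then each row is just values, which is where the token savings come from.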
-
Prompt engineering is giving way to prompt compilation. 🚀
With DSPy, you "write the program," and optimizers learn the prompts, turning brittle hacks into reproducible pipelines. DSPy compiles declarative LLM calls and, in minutes, can beat few-shot prompting, even with smaller open models.
🧠 Why now? 2025 papers show rising gains from automated prompt optimization and compression, cutting cost without hurting quality.
⚙️ If you build RAG/agents, start treating prompts like code: metrics, training data, and compile loops. Then ship.
🤔 Will "compiled prompts" become the fourth compute axis alongside pretraining, posttraining, and inference?
Sources:
[1] https://lnkd.in/d2qukbam
[2] arxiv.org/abs/2310.03714
[3] dspy.ai/roadmap
[4] arxiv.org/abs/2505.00019
#LLM #DSPy #MLOps
This post was generated by my custom-built personal agent, powered by LLMs and designed to operate my computer. If you're curious about how it works, feel free to ask!
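To make "compile, don't handcraft" concrete, here is a minimal DSPy sketch. The model id, metric, and tiny trainset are placeholder assumptions, not a recommended setup:

```python
# Minimal DSPy sketch: declare the program, let an optimizer learn the prompt.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model id

# Declarative "program": a signature instead of a handwritten prompt.
qa = dspy.ChainOfThought("question -> answer")

# Tiny illustrative trainset; real pipelines use real labeled data.
trainset = [
    dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    # Toy metric: did the gold answer appear in the prediction?
    return example.answer.lower() in pred.answer.lower()

# "Compile": the optimizer searches for demonstrations that maximize the metric.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
compiled_qa = optimizer.compile(qa, trainset=trainset)

print(compiled_qa(question="3 + 4?").answer)
```

The point is the division of labor: you own the program structure and the metric, and the optimizer owns the prompt text.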
-
Wondering if it's just me... I've observed a shift in the release patterns of open LLMs over the last few months. With the notable exception of Apertus, the general-purpose open models have dried up, and in their place a flurry of smaller specialists has started getting released. I'm certainly not going to complain about smaller, faster models, especially ones that can be just as good at their tasks at a fraction of the inference cost... but my observations when comparing them are all over the map. The syntaxes for declaring and invoking tools vary wildly from one model to another (see the examples below), which makes integrating and objectively evaluating them especially challenging. I'm also wondering if we're really seeing a slowdown in the development of general-domain models, or just the calm before the next storm. Photo by Johannes Plenio via Pexels
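To illustrate the tool-syntax divergence, here are two common but incompatible conventions, simplified and roughly sketched (exact schemas vary by model): OpenAI-style APIs return structured tool_calls, while many open models (e.g. Hermes-family chat templates) emit tagged JSON as plain text.

```
OpenAI-style declaration and response:
  tools: [{"type": "function", "function": {"name": "get_weather",
           "parameters": {"type": "object",
                          "properties": {"city": {"type": "string"}}}}}]
  response: {"tool_calls": [{"function": {"name": "get_weather",
             "arguments": "{\"city\": \"Zurich\"}"}}]}

Hermes-style tagged output (plain text generated by the model):
  <tool_call>
  {"name": "get_weather", "arguments": {"city": "Zurich"}}
  </tool_call>
```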
-
Scans that think for themselves. 🧠 Yes, you read that right: the Scandit SDK 8.0 understands what you want to capture and why, adapting to context, automating the boring bits, and accelerating everything else. Some really cool highlights of what users can now do:
▶️ Read unreadable barcodes with OCR
▶️ Capture expiry dates & VINs automatically
▶️ Group barcodes by item in one scan
It's not just scanning. It's understanding and acting. Dive in: https://okt.to/MR7l32
-
We've published a blog post about a new major version of the #rstats tune package! Two main changes: support for new parallel processing frameworks and the ability to tune postprocessors. https://lnkd.in/ei5zMSSf
-
Why is everyone building OCR models? Why does it matter, and how can you supercharge your OCR pipelines with Open Models? In this new blog post we walk through all of these questions and try to answer them.