"In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G)." DeepSeek lowkey release, they call it just another OCR this week but if you dive deeper, they introduce a new way of compress the image token 10x or 20x. You can store 10k words in 1.5k compressive visual tokens. It's a breakthrough.
"DeepSeek-OCR generates 200k+ pages daily for LLMs/VLMs"
More Relevant Posts
-
Say hello to a new OCR model from Ai2, the champions of actual true open source:
- OlmOCR 2: a major update to the open OCR model for complex documents, now better at handling tables, equations, handwriting, and degraded scans.
- Achieves 82.4% on olmOCR-Bench, thanks to a richer training mix including 20k historical document pages.
- The FP8-quantized model processes 3.4k tokens/sec on a single H100, so around USD 180 to parse one million pages.
- Apache 2.0 license, with full support for domain fine-tuning and deployment.
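A quick sanity check on that price: the post only gives throughput, so the tokens-per-page and GPU rental rate below are my assumptions, not Ai2's figures. A minimal sketch:

```python
# Back-of-the-envelope check of the ~USD 180 per 1M pages claim.
# Assumptions (mine, not Ai2's): ~1,000 output tokens per page
# and ~USD 2.20/hour to rent an H100.
tokens_per_sec = 3_400
tokens_per_page = 1_000        # assumed average
usd_per_gpu_hour = 2.20        # assumed rental price

pages_per_hour = tokens_per_sec * 3600 / tokens_per_page   # ~12,240
hours_per_million = 1_000_000 / pages_per_hour             # ~82 h
cost = hours_per_million * usd_per_gpu_hour                # ~USD 180
print(f"{pages_per_hour:,.0f} pages/h -> ~USD {cost:,.0f} per 1M pages")
```

Under those assumptions the numbers line up with the quoted figure.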
-
Bring AI into your debugging workflow. The Inspector MCP Server lets your coding agents access real production errors, analyze them, and suggest fixes, all from your IDE. https://lnkd.in/dss44_nW
-
Memvid: your whole database in a single video. You probably heard about the release of DeepSeek-OCR in the past few days, and people praised its ability to compress a lot of information into visual form (images) with far fewer tokens than text. But this is not the first attempt at that idea. Memvid takes an even weirder approach: it compresses your whole large text database into a single MP4 file that you can still search in milliseconds, with no loss of accuracy, effectively turning a video into a portable data store. The main idea: it converts chunks of text into QR codes, and each QR code becomes one frame of the video. Thousands of frames get packed into a single file, and together with a smart index you can pull out the right QR code in milliseconds and convert it back to text when you need the data. It's a very unintuitive idea, I can't wrap my head around it. And weirdly, it works.
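If the mechanism sounds abstract, here is a minimal sketch of the idea, not Memvid's actual code. It assumes the qrcode and opencv-python packages; note that a lossy codec like mp4v can corrupt QR modules, which is why real systems lean on QR error correction and codec tuning.

```python
# Toy sketch of the Memvid idea: text chunks -> QR codes -> video frames,
# with an implicit index mapping chunk i -> frame i for fast seeks.
# Assumes: pip install "qrcode[pil]" opencv-python numpy
import cv2
import numpy as np
import qrcode

SIZE = 512  # frame edge in pixels; larger frames survive compression better

def chunk_to_frame(text: str) -> np.ndarray:
    """Render one text chunk as a QR code sized as a video frame."""
    img = qrcode.make(text).get_image().convert("RGB")
    frame = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    # Nearest-neighbor keeps QR modules as sharp squares.
    return cv2.resize(frame, (SIZE, SIZE), interpolation=cv2.INTER_NEAREST)

def frame_to_chunk(frame: np.ndarray) -> str:
    """Decode a QR frame back into its text chunk."""
    text, _points, _raw = cv2.QRCodeDetector().detectAndDecode(frame)
    return text

chunks = ["first document chunk...", "second document chunk..."]

# Write one QR frame per chunk into a single MP4 file.
writer = cv2.VideoWriter("store.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (SIZE, SIZE))
for c in chunks:
    writer.write(chunk_to_frame(c))
writer.release()

# Retrieval: seek straight to the frame holding the chunk and decode it.
cap = cv2.VideoCapture("store.mp4")
cap.set(cv2.CAP_PROP_POS_FRAMES, 1)  # jump to the frame for chunk 1
ok, frame = cap.read()
print(frame_to_chunk(frame))  # -> "second document chunk..."
```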
-
New in Crashlytics: Debugging and fixing crashes just got faster 🔨⚡ Use the new MCP tools and the /crashlytics:connect command in Gemini CLI (or your AI tool of choice) to help prioritize, investigate, and fix crashes right in your codebase. Stop context-switching, start fixing → https://goo.gle/4oyFK7B
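For reference, wiring an MCP server into Gemini CLI happens in ~/.gemini/settings.json. The exact entry below (launching the Firebase CLI's experimental MCP server, which exposes the Crashlytics tools) is an illustrative assumption, so check the linked docs for the current command:

```json
{
  "mcpServers": {
    "firebase": {
      "command": "npx",
      "args": ["-y", "firebase-tools@latest", "experimental:mcp"]
    }
  }
}
```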
-
🚀 Exploring TOON: The New Standard for Fast, Efficient and Reliable LLM Workflows
TOON (Token-Oriented Object Notation) is a new, ultra-clean way to represent structured data. It focuses on human readability, removing the noise of quotes, braces, and commas while keeping the structure intact.
🌟 Why TOON matters:
- Makes configs & documentation easier to read
- Reduces visual clutter → faster understanding
- Allows comments & trailing commas
- Great for teams working on complex data models
#TOON #LLM #AIagents #AIEngineering #PromptEngineering #TokenOptimization #ArtificialIntelligence
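For a feel of the notation, here is roughly how a small JSON object maps to TOON, based on the format's public examples (details may differ from the current spec):

```
JSON:
{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

TOON:
users[2]{id,name}:
  1,Alice
  2,Bob
```

The array declares its length and field names once, then each row is just values, which is where the token savings come from.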
-
Prompt engineering is giving way to prompt compilation. 🚀
With DSPy, you "write the program," and optimizers learn the prompts, turning brittle hacks into reproducible pipelines. DSPy compiles declarative LLM calls and, in minutes, can beat few-shot prompting, even with smaller open models.
🧠 Why now? 2025 papers show rising gains from automated prompt optimization and compression, cutting cost without hurting quality.
⚙️ If you build RAG/agents, start treating prompts like code: metrics, training data, and compile loops. Then ship.
🤔 Will "compiled prompts" become the fourth compute axis alongside pretraining, posttraining, and inference?
Sources:
[1] https://lnkd.in/d2qukbam
[2] arxiv.org/abs/2310.03714
[3] dspy.ai/roadmap
[4] arxiv.org/abs/2505.00019
#LLM #DSPy #MLOps
This post was generated by my custom-built personal agent, powered by LLMs and designed to operate my computer. If you're curious about how it works, feel free to ask!
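To make "compile, don't handcraft" concrete, here is a minimal DSPy sketch. The model id, metric, and tiny trainset are placeholder assumptions, not a recommended setup:

```python
# Minimal DSPy sketch: declare the program, let an optimizer learn the prompt.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model id

# Declarative "program": a signature instead of a handwritten prompt.
qa = dspy.ChainOfThought("question -> answer")

# Tiny illustrative trainset; real pipelines use real labeled data.
trainset = [
    dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    # Toy metric: did the gold answer appear in the prediction?
    return example.answer.lower() in pred.answer.lower()

# "Compile": the optimizer searches for demonstrations that maximize the metric.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
compiled_qa = optimizer.compile(qa, trainset=trainset)

print(compiled_qa(question="3 + 4?").answer)
```

The point is the division of labor: you own the program structure and the metric, and the optimizer owns the prompt text.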
-
Wondering if it's just me... I've observed a shift in the release patterns of open LLMs over the last few months. With the notable exception of Apertus, the general-purpose open models have dried up, and in their place a flurry of smaller specialists has started getting released. I'm certainly not going to complain about smaller, faster models, especially ones that can be just as good at their tasks at a fraction of the inference cost... but my observations when comparing them are all over the map. The syntaxes for declaring and invoking tools vary wildly from one model to another (see the examples below), which makes integrating and objectively evaluating them especially challenging. I'm also wondering if we're really seeing a slowdown in the development of general-domain models, or just the calm before the next storm. Photo by Johannes Plenio via Pexels
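To illustrate the tool-syntax divergence, here are two common but incompatible conventions, simplified and roughly sketched (exact schemas vary by model): OpenAI-style APIs return structured tool_calls, while many open models (e.g. Hermes-family chat templates) emit tagged JSON as plain text.

```
OpenAI-style declaration and response:
  tools: [{"type": "function", "function": {"name": "get_weather",
           "parameters": {"type": "object",
                          "properties": {"city": {"type": "string"}}}}}]
  response: {"tool_calls": [{"function": {"name": "get_weather",
             "arguments": "{\"city\": \"Zurich\"}"}}]}

Hermes-style tagged output (plain text generated by the model):
  <tool_call>
  {"name": "get_weather", "arguments": {"city": "Zurich"}}
  </tool_call>
```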
-
Scans that think for themselves. 🧠 Yes, you read that right: the Scandit SDK 8.0 understands what you want to capture and why, adapting to context, automating the boring bits, and accelerating everything else. Some really cool highlights of what users can now do:
▶️ Read unreadable barcodes with OCR
▶️ Capture expiry dates & VINs automatically
▶️ Group barcodes by item in one scan
It's not just scanning. It's understanding and acting. Dive in: https://okt.to/MR7l32
-
We've published a blog post about a new major version of the #rstats tune package! Two main changes: support for new parallel processing frameworks and the ability to tune postprocessors. https://lnkd.in/ei5zMSSf
-
Why is everyone building OCR models? Why does it matter, and how can you supercharge your OCR pipelines with Open Models? In this new blog post we walk through all of these questions and try to answer them.