DeepSeek-OCR 2: Revolutionizing Document Intelligence

DeepSeek has officially launched DeepSeek-OCR 2, a groundbreaking optical character recognition system that replaces traditional linear scanning with a model that interprets images using "human visual logic".
DeepEncoder V2: Visual Causal Flow
At the core of this release is the DeepEncoder V2 architecture. Unlike standard OCR tools that process documents strictly line-by-line, DeepSeek-OCR 2 employs a "Visual Causal Flow" mechanism. It dynamically rearranges image components based on semantic meaning, mimicking how a human reads a complex page—first understanding the global layout, columns, and relationships before diving into specific details.
This approach significantly improves performance on complex layouts, such as mixed text/structure documents and tables, by enabling the AI to "see" the global context first.
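As a rough illustration of why reading order matters on multi-column pages, here is a toy Python sketch. It is not DeepSeek's Visual Causal Flow; it only contrasts a naive raster sort with a column-aware sort over hypothetical layout blocks. The `Block` type, the coordinates, and the midpoint column split are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical layout block: a text snippet with its top-left corner."""
    text: str
    x: float  # left edge
    y: float  # top edge


def raster_order(blocks):
    # Naive scan: top to bottom, then left to right.
    # On a two-column page this interleaves the columns.
    return sorted(blocks, key=lambda b: (b.y, b.x))


def column_aware_order(blocks, page_width):
    # Assumed heuristic: split the page at its midpoint, read the whole
    # left column first, then the right column, each top to bottom.
    mid = page_width / 2
    return sorted(blocks, key=lambda b: (b.x >= mid, b.y))


# A tiny two-column page: L1/L2 on the left, R1/R2 on the right.
blocks = [
    Block("L1", 0, 0), Block("R1", 60, 0),
    Block("L2", 0, 10), Block("R2", 60, 10),
]

raster = [b.text for b in raster_order(blocks)]
column = [b.text for b in column_aware_order(blocks, page_width=100)]

print(raster)  # ['L1', 'R1', 'L2', 'R2']
print(column)  # ['L1', 'L2', 'R1', 'R2']
```

The raster result jumps between columns on every line, while the column-aware result keeps each column's text together. A learned model would have to infer this kind of structure from the image itself rather than from a hard-coded midpoint.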
Unmatched Efficiency & Technical Specs
DeepSeek-OCR 2 introduces "Contexts Optical Compression," capable of representing content with up to 20 times fewer tokens compared to traditional models. This massive efficiency gain allows for:
- Dynamic Resolution: The model uses a flexible resolution strategy (defaulting to composite tokens like (0-6)×768×768 + 1×1024×1024) to balance detail and speed.
- Drastic reduction in computation time and memory usage.
- Scalability for large-scale document ingestion tasks that would overwhelm other models.
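To make the default resolution formula concrete, the sketch below estimates visual token counts for the (0-6)×768×768 + 1×1024×1024 configuration. It assumes 16×16-pixel patches and a further 16× token compression per view. Both values are assumptions for illustration and are not stated in this article, and the helper names are hypothetical.

```python
import math

PATCH_PX = 16      # assumed patch size in pixels (not stated in the article)
COMPRESSION = 16   # assumed token compression factor (not stated in the article)


def visual_tokens(side_px, patch_px=PATCH_PX, compression=COMPRESSION):
    """Tokens for one square view of side_px pixels, under the assumptions above."""
    patches = (side_px // patch_px) ** 2
    return patches // compression


def token_budget(num_local_crops):
    """Total tokens for N local 768x768 crops plus one 1024x1024 view."""
    if not 0 <= num_local_crops <= 6:
        raise ValueError("the default configuration uses 0 to 6 local crops")
    return num_local_crops * visual_tokens(768) + visual_tokens(1024)


def min_vision_tokens(text_tokens, max_ratio=20):
    """Lower bound on vision tokens if compression reaches at most max_ratio."""
    return math.ceil(text_tokens / max_ratio)


print(token_budget(0))          # 256
print(token_budget(6))          # 1120
print(min_vision_tokens(2000))  # 100
```

Under these assumptions a page costs between 256 and 1120 visual tokens, and a page holding about 2,000 text tokens would need at least 100 vision tokens at the "up to 20 times" ratio quoted above.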
Performance & Benchmarks
Evaluations on benchmarks like OmniDocBench v1.5 demonstrate a 3.73% improvement over previous baselines. More importantly, DeepSeek-OCR 2 matches or exceeds the capabilities of OCR services from major cloud providers (Google Cloud Vision, AWS Textract) while running significantly more efficiently on local hardware.
It excels particularly in:
- Preserving complex document structures.
- Handling over 100 languages.
- Processing specialized content like chemical formulas (SMILES).
Open Source & Availability
DeepSeek-OCR 2 is fully open-source under the MIT license, reinforcing DeepSeek's commitment to accessible AI.
Install via Git
git clone https://github.com/deepseek-ai/DeepSeek-OCR-2.git
Community Discussion
wtf are math geeks in China doing with LLM optimization? This is like LLM Ozempic 10x
Impressive advancements! The focus on human-like logical order in image scanning could revolutionize document processing. Excited to see the impact on OCR accuracy. Great job, DeepSeek team!
Oh great, now we have AI that does what humans do, skips the title and 1st paragraph and gets annoyed the 2nd paragraph isn't setting the scene. 😂 Brilliant.
finally, OCR that understands layout context instead of just grid scanning
Any idea how it compares with Florence?
@grok can I run this with MBP m3? Or on a vps and what speed can I expect



