mtmd: Add DeepSeekOCR Support #17400

sfallah · 2025-11-20T09:11:15Z

Feature Request: #16676



sfallah · 2025-11-21T20:20:33Z

@Acly
I thought this might interest you!
I have implemented add_rel_pos_inplace and get_rel_pos (for the DeepseekOCR SAM-ViT) without using the CPU-only ggml ops (ggml_get_rel_pos etc.), and it works on Metal.
I will also test on CUDA tomorrow; hopefully the CI checks will later show whether the implementation also works on other devices.

You will find the code here:
https://github.com/sfallah/llama.cpp/blob/sf/deepseek-ocr/tools/mtmd/clip.cpp#L2473
https://github.com/sfallah/llama.cpp/blob/sf/deepseek-ocr/tools/mtmd/clip.cpp#L2502

Disclaimer:
My implementation is very rudimentary, but @bluebread has reviewed it and already improved it a bit in a separate branch.
I used ChatGPT to translate the original PyTorch code, then debugged the result and got it to run.
The functions have also been tested against ggml/examples/sam (the SAM example from the ggml repo), and the results align well.
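
For readers unfamiliar with the two operators, here is a minimal pure-Python sketch of what they compute. It follows the relative-position logic of the reference SAM PyTorch code, not the code in this PR. The function names `rel_pos_indices`, `gather_rel_pos`, and `add_rel_pos` are hypothetical. The sketch also assumes that the embedding table already has `2 * max(q_size, k_size) - 1` rows, so no interpolation step is shown.

```python
# Illustrative sketch only: names are hypothetical and this is not the PR's code.

def rel_pos_indices(q_size: int, k_size: int) -> list[list[int]]:
    """For each (query, key) position pair, return the row of the
    relative-position table that holds its embedding."""
    q_scale = max(k_size / q_size, 1.0)  # stretch query coords if keys are longer
    k_scale = max(q_size / k_size, 1.0)  # stretch key coords if queries are longer
    offset = (k_size - 1) * k_scale      # shift so the smallest index is 0
    return [
        [int(q * q_scale - k * k_scale + offset) for k in range(k_size)]
        for q in range(q_size)
    ]


def gather_rel_pos(table: list[list[float]], q_size: int, k_size: int):
    """Gather table rows by index: result[q][k] is an embedding vector.
    Assumes len(table) == 2 * max(q_size, k_size) - 1."""
    idx = rel_pos_indices(q_size, k_size)
    return [[table[i] for i in row] for row in idx]


def add_rel_pos(attn, rel_h, rel_w):
    """Add decomposed relative-position terms to attention scores.

    attn[qh][qw][kh][kw] += rel_h[qh][qw][kh] + rel_w[qh][qw][kw]
    i.e. rel_h is broadcast over kw and rel_w is broadcast over kh."""
    return [
        [
            [
                [a + rel_h[qh][qw][kh] + rel_w[qh][qw][kw]
                 for kw, a in enumerate(row)]
                for kh, row in enumerate(attn[qh][qw])
            ]
            for qw in range(len(attn[qh]))
        ]
        for qh in range(len(attn))
    ]
```

Because the index table depends only on `q_size` and `k_size`, it can be computed once, and the lookup then reduces to a row gather followed by a broadcast add. That is one plausible reason such operators can be expressed with general-purpose ops instead of dedicated kernels.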


ngxson · 2025-11-22T16:41:16Z

> I have implemented add_rel_pos_inplace and get_rel_pos (for the DeepseekOCR SAM-ViT) without using the CPU-only ggml ops (ggml_get_rel_pos etc.), and it works on Metal.

Just wondering, which ops are currently CPU-only (apart from ggml_arange)?

Besides, it's better to introduce new ops in a follow-up PR, to avoid adding too much complexity to the review process.


sfallah and others added 22 commits November 14, 2025 12:40

mtmd: llama.cpp DeepSeekOCR support

43a130b

init commit

loading sam tensors

b6b9f02

mtmd: fix vision model processing

85c7cda

Merge pull request #1 from bluebread/sf/deepseek-ocr

578c8d7

mtmd: fix vision model processing

deepseek-ocr clip-vit model impl

2aab52e

mtmd: add DeepSeek-OCR LM support with standard attention

eab28ed

mtmd: successfully runs DeepSeek-OCR LM in llama-cli

7630587

mtmd: Fix RoPE type for DeepSeek-OCR LM.

2de3436

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

e8b2610

…f/deepseek-ocr

loading LM

97e0907

testing Vision model loading

Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr

13dc6fb

Merge pull request #2 from bluebread/sf/deepseek-ocr

b32bb5e

mtmd: DeepseekOCR Implement DeepSeek3B-MoE-A570M (LM component)

sam warmup working

790bbb9

sam erroneous return corrected

cec9a5c

clip-vit: corrected cls_embd concat

8b3d319

clip-vit: model convert qkv_proj split

1e08157

corrected combining of image encoders' results

331cea8

fix: update callback for ffn_moe_weighted and add callback for attn_o…

6c0715b

…ut in deepseek2 model

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

a65ddf5

…f/deepseek-ocr

concat image_newline and image_seperator tokens

63a042f

visual_model warmup (technically) works

89afda8

window partitioning using standard ggml ops

88032f4

sfallah requested review from CISC, ggerganov and ngxson as code owners November 20, 2025 09:11

github-actions bot added the model (Model specific), examples, and python (python script changes) labels Nov 20, 2025

sfallah marked this pull request as draft November 20, 2025 09:12

sfallah mentioned this pull request Nov 20, 2025

ggml : enhance rel-pos and window ops with CUDA support #17383

Open

bluebread and others added 8 commits November 20, 2025 13:36

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

1268dc3

…f/deepseek-ocr

sam implementation without using CPU only ops

68b206b

clip: fixed warnings

8bce66d

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

5e6cf3c

…f/deepseek-ocr

mtmd: fix get_rel_pos

7e9fbec

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

0f5587d

…f/deepseek-ocr

mtmd: fixed the wrong scaler for get_rel_pos

7b8d735

image encoding technically works but the output can't be checked sing…

86f111f

…e image decoding fails

sfallah mentioned this pull request Nov 21, 2025

Feature Request: Can llama.cpp add support for DeepSeek OCR? #16676

Open

4 tasks

bluebread and others added 5 commits November 22, 2025 02:09

mtmd: minor changed

effe669

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

f8f66a1

…f/deepseek-ocr

Merge pull request #3 from bluebread/sf/deepseek-ocr

3fcfc3a

Fixed get_rel_pos & add_rel_pos_inplace operator

mtmd: add native resolution support

ee8a148

- image encoding debugged

4cfa15f

- issues fixed mainly related wrong config like n_patches etc. - configs need to be corrected in the converter

bluebread and others added 5 commits November 23, 2025 09:22

mtmd: correct token order

3f71188

Merge pull request #5 from bluebread/dsocr-debug

a594990

debug: correct token order

Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr

6dfda99

Merge pull request #4 from bluebread/sf/deepseek-ocr

7941f5d

Add native resolution support

- dynamic resizing

206f8ab

- changes are concerning PR #4
