DeepSeek-OCR.Q8_0.gguf VLM Loading Fails with RuntimeError (Python)
I am encountering a recurring RuntimeError: Failed to create VLM, error message string: Model loading failed when attempting to load the DeepSeek-OCR.Q8_0.gguf model using the nexaai.vlm.VLM class.
Crucially, other models load and run fine in the same environment, both standard LLMs (e.g., Qwen3-1.7B-Q8_0.gguf) and other VLMs (e.g., with a different mmproj-F16.gguf combination). This strongly suggests the issue is specific to the DeepSeek-OCR.Q8_0.gguf model's architecture or its associated vision tower (mmproj.F16.nexa) in the current library/runtime.
Environment and Setup Details
Model Name: DeepSeek-OCR.Q8_0.gguf
Vision Tower Path: mmproj.F16.nexa
Library: nexaai (version not specified; the latest available release was used for the code below)
Plugin ID: cpu_gpu
The failure occurs specifically at the VLM.from_ call, which loads both the main model file (.gguf) and the vision projection file (.nexa).
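To rule out a truncated or corrupted download before blaming the loader, I use a small stdlib check of the GGUF container header (per the GGUF spec, a valid file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 format version; the helper name is mine, not part of nexaai):

```python
import struct

def check_gguf_header(path: str):
    """Return (magic, version) from a GGUF file, or raise if the magic is wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path}: bad magic {magic!r}, not a GGUF file")
        # The format version is a little-endian unsigned 32-bit integer
        (version,) = struct.unpack("<I", f.read(4))
    return magic.decode("ascii"), version
```

If this raises, the .gguf file itself is damaged and the RuntimeError would be unrelated to the DeepSeek-OCR architecture. (My file passes this check.)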
Full Code
import os
from nexaai.vlm import VLM
from nexaai.common import GenerationConfig, ModelConfig, MultiModalMessage, MultiModalMessageContent

def vlm_example():
    """VLM inference example."""
    print("=== VLM Inference Example ===")

    # Model configuration (absolute paths for clarity)
    model_name = "D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/DeepSeek-OCR.Q8_0.gguf"
    mmproj_path = "D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/mmproj.F16.nexa"
    plugin_id = "cpu_gpu"
    max_tokens = 100
    system_message = "ocr the text from the image."
    image_path = "D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/Screenshot_1.png"

    print(f"Loading model: {model_name}")
    print(f"Using plugin: {plugin_id}")

    # Warn if the image is missing (the failure happens before the image is used)
    if not (image_path and os.path.exists(image_path)):
        print(f"\033[93mWARNING: The specified image_path ('{image_path}') does not exist or was not provided. Multimodal prompts will not include image input.\033[0m")

    # The failure occurs here
    m_cfg = ModelConfig()
    vlm = VLM.from_(name_or_path=model_name, mmproj_path=mmproj_path, m_cfg=m_cfg, plugin_id=plugin_id)

    # Rest of the inference code follows (omitted for troubleshooting scope)
    # ...

vlm_example()
Full Error Log
(venv) PS D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf> python run_deepseek_ocr.py
=== VLM Inference Example ===
Loading model: D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/DeepSeek-OCR.Q8_0.gguf
Using plugin: cpu_gpu
Traceback (most recent call last):
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\run_deepseek_ocr.py", line 90, in <module>
vlm_example()
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\run_deepseek_ocr.py", line 32, in vlm_example
vlm = VLM.from_(name_or_path=model_name, mmproj_path=mmproj_path, m_cfg=m_cfg, plugin_id=plugin_id)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\utils\model_manager.py", line 1743, in wrapper
return func(*args, **kwargs)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\base.py", line 23, in from_
return cls._load_from(name_or_path, **kwargs)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\vlm.py", line 48, in _load_from
return PyBindVLMImpl._load_from(local_path, mmproj_path, model_name, m_cfg, plugin_id, device_id)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\vlm_impl\pybind_vlm_impl.py", line 80, in _load_from
handle = vlm_bind.create_vlm(
RuntimeError: Failed to create VLM, error message string: Model loading failed
Questions for the Community
Is the DeepSeek-OCR GGUF format (specifically for the multimodal elements) fully supported by the current VLM loader in the nexaai library?
Are there specific formatting or naming requirements for the mmproj.F16.nexa file that I might be missing, or is this error a generic failure during the vision tower's initialization?
Given that other LLMs and VLMs load successfully, what are the most common causes of a generic "Model loading failed" error for a specific VLM/mmproj combination like DeepSeek-OCR?
Any guidance on troubleshooting the VLM loading stage would be greatly appreciated!
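For completeness, here is the pre-flight sketch I use to confirm both files exist and are non-empty before they ever reach VLM.from_ (the helper name is mine, nothing nexaai-specific):

```python
import os

def preflight(paths):
    """Map each path to its size in bytes, or None if it does not exist."""
    return {p: (os.path.getsize(p) if os.path.exists(p) else None) for p in paths}

# Usage with the paths from the script above:
# preflight([model_name, mmproj_path])
```

Both files exist and are non-empty on my machine, so the failure appears to happen inside the native loader rather than at the filesystem level.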