DeepSeek-OCR.Q8_0.gguf VLM Loading Fails with RuntimeError (Python)
I am encountering a recurring RuntimeError: Failed to create VLM, error message string: Model loading failed when attempting to load the DeepSeek-OCR.Q8_0.gguf model using the nexaai.vlm.VLM class.
Crucially, other models load and run fine in the same environment, both standard LLMs (e.g., Qwen3-1.7B-Q8_0.gguf) and other VLMs (e.g., with a different mmproj-F16.gguf combination). This strongly suggests the issue is specific to the DeepSeek-OCR.Q8_0.gguf model's architecture or its associated vision tower (mmproj.F16.nexa) in the current library/runtime.
Environment and Setup Details
Model Name: DeepSeek-OCR.Q8_0.gguf
Vision Tower Path: mmproj.F16.nexa
Library: nexaai (version not specified; the latest available release was used for the code below)
Plugin ID: cpu_gpu
The failure occurs specifically at the VLM.from_ call, which loads both the main model file (.gguf) and the vision projection file (.nexa).
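To rule out a truncated or corrupted download before blaming the loader, I use a small stdlib check of the GGUF container header (per the GGUF spec, a valid file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 format version; the helper name is mine, not part of nexaai):

```python
import struct

def check_gguf_header(path: str):
    """Return (magic, version) from a GGUF file, or raise if the magic is wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path}: bad magic {magic!r}, not a GGUF file")
        # The format version is a little-endian unsigned 32-bit integer
        (version,) = struct.unpack("<I", f.read(4))
    return magic.decode("ascii"), version
```

If this raises, the .gguf file itself is damaged and the RuntimeError would be unrelated to the DeepSeek-OCR architecture. (My file passes this check.)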
Full Code
import os
from nexaai.vlm import VLM
from nexaai.common import GenerationConfig, ModelConfig, MultiModalMessage, MultiModalMessageContent

def vlm_example():
    """VLM inference example."""
    print("=== VLM Inference Example ===")

    # Model configuration (absolute paths for clarity)
    model_name = "D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/DeepSeek-OCR.Q8_0.gguf"
    mmproj_path = "D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/mmproj.F16.nexa"
    plugin_id = "cpu_gpu"
    max_tokens = 100
    system_message = "ocr the text from the image."
    image_path = "D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/Screenshot_1.png"

    print(f"Loading model: {model_name}")
    print(f"Using plugin: {plugin_id}")

    # Warn if the image is missing (the failure happens before the image is used)
    if not (image_path and os.path.exists(image_path)):
        print(f"\033[93mWARNING: The specified image_path ('{image_path}') does not exist or was not provided. Multimodal prompts will not include image input.\033[0m")

    # The failure occurs here
    m_cfg = ModelConfig()
    vlm = VLM.from_(name_or_path=model_name, mmproj_path=mmproj_path, m_cfg=m_cfg, plugin_id=plugin_id)

    # Rest of the inference code follows (omitted for troubleshooting scope)
    # ...

vlm_example()
Full Error Log
(venv) PS D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf> python run_deepseek_ocr.py
=== VLM Inference Example ===
Loading model: D:/29.AURAV_DEV_GITHUB/deepseek-ocr-gguf/DeepSeek-OCR.Q8_0.gguf
Using plugin: cpu_gpu
Traceback (most recent call last):
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\run_deepseek_ocr.py", line 90, in <module>
vlm_example()
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\run_deepseek_ocr.py", line 32, in vlm_example
vlm = VLM.from_(name_or_path=model_name, mmproj_path=mmproj_path, m_cfg=m_cfg, plugin_id=plugin_id)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\utils\model_manager.py", line 1743, in wrapper
return func(*args, **kwargs)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\base.py", line 23, in from_
return cls._load_from(name_or_path, **kwargs)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\vlm.py", line 48, in _load_from
return PyBindVLMImpl._load_from(local_path, mmproj_path, model_name, m_cfg, plugin_id, device_id)
File "D:\29.AURAV_DEV_GITHUB\deepseek-ocr-gguf\venv\lib\site-packages\nexaai\vlm_impl\pybind_vlm_impl.py", line 80, in _load_from
handle = vlm_bind.create_vlm(
RuntimeError: Failed to create VLM, error message string: Model loading failed
Questions for the Community
Is the DeepSeek-OCR GGUF format (specifically for the multimodal elements) fully supported by the current VLM loader in the nexaai library?
Are there specific formatting or naming requirements for the mmproj.F16.nexa file that I might be missing, or is this error a generic failure during the vision tower's initialization?
Given that other LLMs and VLMs load successfully, what are the most common causes of a generic "Model loading failed" error for a specific VLM/mmproj combination like DeepSeek-OCR?
Any guidance on troubleshooting the VLM loading stage would be greatly appreciated!
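For completeness, here is the pre-flight sketch I use to confirm both files exist and are non-empty before they ever reach VLM.from_ (the helper name is mine, nothing nexaai-specific):

```python
import os

def preflight(paths):
    """Map each path to its size in bytes, or None if it does not exist."""
    return {p: (os.path.getsize(p) if os.path.exists(p) else None) for p in paths}

# Usage with the paths from the script above:
# preflight([model_name, mmproj_path])
```

Both files exist and are non-empty on my machine, so the failure appears to happen inside the native loader rather than at the filesystem level.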