Incompatibility with transformers >= 4.41: DynamicCache.seen_tokens deprecated

#1
opened by rafaeltuelho

Issue Description

The model's custom code (modeling_deepseek.py) uses the deprecated DynamicCache.seen_tokens attribute, which causes errors with newer versions of the transformers library (4.41+).

Error Message

AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'

Environment

  • transformers version: 4.57.1 (also affects 4.49.0+)
  • PyTorch version: 2.7.0
  • Platform: macOS (Apple Silicon M4)
  • Python: 3.9

Root Cause

The seen_tokens attribute on DynamicCache was deprecated in transformers v4.41 and has been removed in later versions. The model's modeling_deepseek.py file references this attribute in the prepare_inputs_for_generation method.

Reference: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat/discussions/9
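For reference, the missing attribute can be reproduced outside the model entirely; the snippet below is a standalone illustration against DynamicCache on a recent transformers release, not an excerpt from modeling_deepseek.py:

```python
# Standalone reproduction of the failure mode on a recent transformers release
# (illustrative; not code from modeling_deepseek.py).
from transformers.cache_utils import DynamicCache

cache = DynamicCache()
print(cache.get_seq_length())  # supported API: returns 0 for an empty cache
print(cache.seen_tokens)       # AttributeError on versions where the attribute was removed
```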

Suggested Fix

Update the model code to use cache_position instead of seen_tokens, as recommended by the transformers deprecation warning:

The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
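As a rough sketch (assuming the file follows the usual Llama-derived prepare_inputs_for_generation structure; the helper and names below are illustrative, not the repo's actual code), the seen_tokens read can be replaced by deriving the same count from the cache_position input or the supported cache API:

```python
# Sketch of the replacement logic (assumed names; the actual file may differ).
# The old code read past_key_values.seen_tokens; the same value can be derived
# from the cache_position input or from the supported get_seq_length() API.
from typing import Optional

import torch
from transformers.cache_utils import Cache


def _get_past_length(past_key_values: Cache, cache_position: Optional[torch.Tensor]) -> int:
    # hypothetical helper for illustration, not part of the repo
    if cache_position is not None:
        return int(cache_position[0])        # index of the first new token == tokens already cached
    return past_key_values.get_seq_length()  # works on both old and new transformers
```

Inside prepare_inputs_for_generation, the old `past_length = past_key_values.seen_tokens` line would then become a call like this helper (or the equivalent inline expression).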

Workaround

Currently, users must downgrade to transformers==4.43.4 to use this model, which may conflict with other dependencies.

Context

We're integrating this MPS-compatible model into Docling to provide DeepSeek-OCR support on Apple Silicon. The transformers version constraint (>=4.46.0) in Docling's dependencies makes this incompatibility a blocker for MPS users.

Thank you for creating this MPS-compatible fork! It would be great to have it updated for newer transformers versions.

Thank you for the suggestion. I went ahead, found the PR that introduced that change, and applied the patch.

https://github.com/huggingface/transformers/pull/29467/files

I also tested with both the old and the new transformers versions. Let me know if there are any issues. I'm happy to hear you're adding this model to Docling.
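
For anyone who wants to double-check on their side, something along these lines is enough to confirm the updated remote code loads under a current transformers release (the repo id below is just a placeholder, not the actual Hub id):

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo = "<this-fork>"  # placeholder -- substitute this fork's Hub id
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True)
model = model.to("mps" if torch.backends.mps.is_available() else "cpu")
# Loading pulls in the patched modeling_deepseek.py; running the model's usual
# inference afterwards exercises prepare_inputs_for_generation, where the
# seen_tokens access used to fail.
```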

Awesome! Thank you for your prompt response.
I will test it here.
