Patch loading SparseEncoder from Hub
Hello!
Pull Request overview
- Fix SparseEncoder("naver/splade-code-8B", ...) failing on hub loads with Unrecognized configuration class ... for this kind of AutoModel: AutoModelForMaskedLM
- Move the LoRA adapter back into a lora/ subfolder so transformers' auto-PEFT path doesn't fire on this repo
- Pre-download the lora/ subfolder in Qwen3ForCausalLM.from_pretrained to dodge a Windows path-join bug in PEFT
Details
This bug ended up taking a moment to chase down. The integration that landed previously works for local paths but breaks on hub loads, which is rather frustrating. With adapter_config.json at the repo root and a custom auto_map, transformers' AutoModelForMaskedLM.from_pretrained triggers its auto-PEFT branch: it sees the adapter, redirects the model path to Qwen/Qwen3-8B, reloads the config from there (silently dropping our auto_map) and errors with Unrecognized configuration class <Qwen3Config> for this kind of AutoModel: AutoModelForMaskedLM. Even if you keep the auto_map alive (e.g. by overriding ST's _load_config), the dynamic class lookup in auto-factory also runs against the redirected path and tries to download splade.py from Qwen/Qwen3-8B, which obviously isn't there. Working around all that from a custom Sentence Transformers module ends up coupling tightly to ST's _load_config / _load_model internals, so I went with a layout fix instead.
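For reference, the redirect roughly plays out like this; treat it as a paraphrased sketch of the behaviour described above rather than the actual transformers source (find_adapter_config_file is the real helper, the rest is illustrative):

import json
from transformers.utils import find_adapter_config_file

repo = "naver/splade-code-8B"
adapter_config = find_adapter_config_file(repo)  # non-None when adapter_config.json sits at the repo root
if adapter_config is not None:
    with open(adapter_config) as f:
        repo = json.load(f)["base_model_name_or_path"]  # now "Qwen/Qwen3-8B"
# from_pretrained carries on with the redirected repo: the config is reloaded from
# Qwen/Qwen3-8B (which has no auto_map), and the dynamic lookup for splade.py also
# targets Qwen/Qwen3-8B, so the custom class is never reached.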
Moving the adapter files into lora/ short-circuits the whole problem: find_adapter_config_file returns None at the root, the auto-PEFT path doesn't fire, and our regular auto_map routing reaches splade.Qwen3ForCausalLM as expected. Qwen3ForCausalLM.from_pretrained picks up the adapter from the subfolder and assembles base + LoRA itself. No Sentence Transformers subclassing needed.
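Condensed, the loading path inside the custom class then amounts to something like this (a hypothetical sketch, not the exact code in splade.py):

from peft import PeftModel
from transformers import AutoModelForCausalLM

def load_base_plus_lora(base_id: str, repo_id: str, **kwargs):
    # The dense base weights come from the original Qwen checkpoint ...
    base = AutoModelForCausalLM.from_pretrained(base_id, **kwargs)
    # ... and the LoRA adapter is attached from the repo's lora/ subfolder.
    return PeftModel.from_pretrained(base, repo_id, subfolder="lora")

model = load_base_plus_lora("Qwen/Qwen3-8B", "tomaarsen/naver-splade-code-8B")

The subfolder= kwarg in that last call is also exactly what runs into the Windows issue described below.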
There's one extra wrinkle. PEFT's subfolder= kwarg on Windows builds the hub filename with os.path.join, which produces lora\adapter_model.safetensors; the hub's file_exists doesn't match that against the actual lora/adapter_model.safetensors, so PEFT falls back to a non-existent .bin and 404s. This is a real PEFT bug (it should use posixpath.join for hub paths), but I'd rather not block this PR on an upstream release. The workaround is to snapshot_download(repo, allow_patterns=["lora/*"]) first and then point PeftConfig.from_pretrained and PeftModel.from_pretrained at the local cached path, which sidesteps the buggy code path entirely. Local-path loads still work the same way: os.path.isdir short-circuits the download.
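In code, the workaround boils down to something like the following (a minimal sketch; the helper name is made up, but snapshot_download, PeftConfig and PeftModel are the actual APIs involved):

import os
from huggingface_hub import snapshot_download
from peft import PeftConfig, PeftModel

def resolve_adapter_dir(model_id: str) -> str:
    if os.path.isdir(model_id):
        # Local checkout: lora/ is already on disk, nothing to download.
        return os.path.join(model_id, "lora")
    # Hub repo: materialise only lora/* in the local cache, so PEFT is handed a plain
    # local directory and never builds a hub path with os.path.join.
    local_repo = snapshot_download(model_id, allow_patterns=["lora/*"])
    return os.path.join(local_repo, "lora")

adapter_dir = resolve_adapter_dir("tomaarsen/naver-splade-code-8B")
peft_config = PeftConfig.from_pretrained(adapter_dir)
# ... load the base model, then: model = PeftModel.from_pretrained(base, adapter_dir)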
I missed this previously because in my own tests I just used SparseEncoder("."), which always worked fine: local paths skip the redirect because they pass os.path.exists. In truth, this is probably a bit of a transformers bug too, as there isn't meant to be a discrepancy between local and hub loads. The fix also shouldn't be needed for the 0.6B model, as it doesn't use PEFT. I'm sorry about the issue; I wasn't expecting these discrepancies.
Reproduction
I also pushed this PR to https://huggingface.co/tomaarsen/naver-splade-code-8B so that you can test this nicely:
from sentence_transformers import SparseEncoder
model = SparseEncoder("tomaarsen/naver-splade-code-8B", trust_remote_code=True)
queries = [
"SELECT *\nFROM Student\nWHERE Age = (\nSELECT MAX(Age)\nFROM Student\nWHERE Group = 'specific_group'\n)\nAND Group = 'specific_group';"
]
query_embeddings = model.encode(queries)
print(query_embeddings.shape)
# torch.Size([1, 151936])
sparsity = model.sparsity(query_embeddings)
print(sparsity)
# {'active_dims': 1122.0, 'sparsity_ratio': 0.9926153117101938}
decoded = model.decode(query_embeddings, top_k=10)
print(decoded)
# [[('Ġgroup', 2.34375), ('Ġoldest', 2.28125), ('Ġage', 2.25), ('_group', 2.25), ('ĠGroup', 2.171875), ('ĠAge', 2.109375), ('ĠMAX', 2.0625), ('ĠStudent', 2.046875), ('Ġspecific', 2.03125), ('Ġstudent', 2.0)]]
and
from transformers import AutoModelForCausalLM
import torch
splade = AutoModelForCausalLM.from_pretrained("tomaarsen/naver-splade-code-8B", trust_remote_code=True)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
splade.to(device)
splade.eval()
queries = [
"SELECT *\nFROM Student\nWHERE Age = (\nSELECT MAX(Age)\nFROM Student\nWHERE Group = 'specific_group'\n)\nAND Group = 'specific_group';"
]
bow_dict = splade.encode(
queries, prompt_type="query", top_k_q=10, return_dict=True, print_dict=True
)
'''
+--------------------------------------------------------------------+
| TOP ACTIVATED WORDS |
+--------------------------------------------------------------------+
* INPUT: SELECT *
FROM Student
WHERE Age = (
SELECT MAX(Age)
FROM Student
WHERE Group = 'specific_group'
)
AND Group = 'specific_group';
Ġgroup    | ████████████████████ 2.34
Ġoldest   | ███████████████████ 2.28
Ġage      | ███████████████████ 2.25
_group    | ███████████████████ 2.25
ĠGroup    | ██████████████████ 2.17
ĠAge      | ██████████████████ 2.11
ĠMAX      | █████████████████ 2.06
ĠStudent  | █████████████████ 2.05
Ġspecific | █████████████████ 2.03
Ġstudent  | █████████████████ 2.00
'''
- Tom Aarsen
Hey Tom, thank you very much, it works well with this fix for me too, with both Sentence Transformers and transformers.
I think we can merge.
- Simon