Patch loading SparseEncoder from Hub
Hello!
Pull Request overview
- Fix SparseEncoder("naver/splade-code-8B", ...) failing on hub loads with Unrecognized configuration class ... for this kind of AutoModel: AutoModelForMaskedLM
- Move the LoRA adapter back into a lora/ subfolder so transformers' auto-PEFT path doesn't fire on this repo
- Pre-download the lora/ subfolder in Qwen3ForCausalLM.from_pretrained to dodge a Windows path-join bug in PEFT
Details
This bug ended up taking a moment to chase down. The integration that landed previously works for local paths but breaks on hub loads, which is rather frustrating. With adapter_config.json at the repo root and a custom auto_map, transformers' AutoModelForMaskedLM.from_pretrained triggers its auto-PEFT branch: it sees the adapter, redirects the model path to Qwen/Qwen3-8B, reloads the config from there (silently dropping our auto_map) and errors with Unrecognized configuration class <Qwen3Config> for this kind of AutoModel: AutoModelForMaskedLM. Even if you keep the auto_map alive (e.g. by overriding ST's _load_config), the dynamic class lookup in auto-factory also runs against the redirected path and tries to download splade.py from Qwen/Qwen3-8B, which obviously isn't there. Working around all that from a custom Sentence Transformers module ends up coupling tightly to ST's _load_config / _load_model internals, so I went with a layout fix instead.
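For reference, the redirect roughly plays out like this; treat it as a paraphrased sketch of the behaviour described above rather than the actual transformers source (find_adapter_config_file is the real helper, the rest is illustrative):

import json
from transformers.utils import find_adapter_config_file

repo = "naver/splade-code-8B"
adapter_config = find_adapter_config_file(repo)  # non-None when adapter_config.json sits at the repo root
if adapter_config is not None:
    with open(adapter_config) as f:
        repo = json.load(f)["base_model_name_or_path"]  # now "Qwen/Qwen3-8B"
# from_pretrained carries on with the redirected repo: the config is reloaded from
# Qwen/Qwen3-8B (which has no auto_map), and the dynamic lookup for splade.py also
# targets Qwen/Qwen3-8B, so the custom class is never reached.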
Moving the adapter files into lora/ short-circuits the whole problem: find_adapter_config_file returns None at the root, the auto-PEFT path doesn't fire, and our regular auto_map routing reaches splade.Qwen3ForCausalLM as expected. Qwen3ForCausalLM.from_pretrained picks up the adapter from the subfolder and assembles base + LoRA itself. No Sentence Transformers subclassing needed.
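Condensed, the loading path inside the custom class then amounts to something like this (a hypothetical sketch, not the exact code in splade.py):

from peft import PeftModel
from transformers import AutoModelForCausalLM

def load_base_plus_lora(base_id: str, repo_id: str, **kwargs):
    # The dense base weights come from the original Qwen checkpoint ...
    base = AutoModelForCausalLM.from_pretrained(base_id, **kwargs)
    # ... and the LoRA adapter is attached from the repo's lora/ subfolder.
    return PeftModel.from_pretrained(base, repo_id, subfolder="lora")

model = load_base_plus_lora("Qwen/Qwen3-8B", "tomaarsen/naver-splade-code-8B")

The subfolder= kwarg in that last call is also exactly what runs into the Windows issue described below.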
There's one extra wrinkle. PEFT's subfolder= kwarg on Windows builds the hub filename with os.path.join, which produces lora\adapter_model.safetensors; the hub's file_exists doesn't match that against the actual lora/adapter_model.safetensors, so PEFT falls back to a non-existent .bin and 404s. This is a real PEFT bug (it should use posixpath.join for hub paths), but I'd rather not block this PR on an upstream release. The workaround is to snapshot_download(repo, allow_patterns=["lora/*"]) first and then point PeftConfig.from_pretrained and PeftModel.from_pretrained at the local cached path, which sidesteps the buggy code path entirely. Local-path loads still work the same way: os.path.isdir short-circuits the download.
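In code, the workaround boils down to something like the following (a minimal sketch; the helper name is made up, but snapshot_download, PeftConfig and PeftModel are the actual APIs involved):

import os
from huggingface_hub import snapshot_download
from peft import PeftConfig, PeftModel

def resolve_adapter_dir(model_id: str) -> str:
    if os.path.isdir(model_id):
        # Local checkout: lora/ is already on disk, nothing to download.
        return os.path.join(model_id, "lora")
    # Hub repo: materialise only lora/* in the local cache, so PEFT is handed a plain
    # local directory and never builds a hub path with os.path.join.
    local_repo = snapshot_download(model_id, allow_patterns=["lora/*"])
    return os.path.join(local_repo, "lora")

adapter_dir = resolve_adapter_dir("tomaarsen/naver-splade-code-8B")
peft_config = PeftConfig.from_pretrained(adapter_dir)
# ... load the base model, then: model = PeftModel.from_pretrained(base, adapter_dir)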
I missed this previously because in my own tests I just used SparseEncoder("."), which always worked fine: local paths skip the redirect because they pass os.path.exists. In truth, this is probably a bit of a transformers bug too, as there isn't meant to be a discrepancy between local and hub loads. The fix also shouldn't be needed for the 0.6B model, as it doesn't use PEFT. I'm sorry about the issue; I wasn't expecting these discrepancies.
Reproduction
I also pushed this PR to https://huggingface.co/tomaarsen/naver-splade-code-8B so that you can test this nicely:
from sentence_transformers import SparseEncoder
model = SparseEncoder("tomaarsen/naver-splade-code-8B", trust_remote_code=True)
queries = [
"SELECT *\nFROM Student\nWHERE Age = (\nSELECT MAX(Age)\nFROM Student\nWHERE Group = 'specific_group'\n)\nAND Group = 'specific_group';"
]
query_embeddings = model.encode(queries)
print(query_embeddings.shape)
# torch.Size([1, 151936])
sparsity = model.sparsity(query_embeddings)
print(sparsity)
# {'active_dims': 1122.0, 'sparsity_ratio': 0.9926153117101938}
decoded = model.decode(query_embeddings, top_k=10)
print(decoded)
# [[('Ġgroup', 2.34375), ('Ġoldest', 2.28125), ('Ġage', 2.25), ('_group', 2.25), ('ĠGroup', 2.171875), ('ĠAge', 2.109375), ('ĠMAX', 2.0625), ('ĠStudent', 2.046875), ('Ġspecific', 2.03125), ('Ġstudent', 2.0)]]
and
from transformers import AutoModelForCausalLM
import torch
splade = AutoModelForCausalLM.from_pretrained("tomaarsen/naver-splade-code-8B", trust_remote_code=True)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
splade.to(device)
splade.eval()
queries = [
"SELECT *\nFROM Student\nWHERE Age = (\nSELECT MAX(Age)\nFROM Student\nWHERE Group = 'specific_group'\n)\nAND Group = 'specific_group';"
]
bow_dict = splade.encode(
queries, prompt_type="query", top_k_q=10, return_dict=True, print_dict=True
)
'''
+--------------------------------------------------------------------+
| TOP ACTIVATED WORDS |
+--------------------------------------------------------------------+
* INPUT: SELECT *
FROM Student
WHERE Age = (
SELECT MAX(Age)
FROM Student
WHERE Group = 'specific_group'
)
AND Group = 'specific_group';
Ġgroup    | ████████████████████ 2.34
Ġoldest   | ███████████████████ 2.28
Ġage      | ███████████████████ 2.25
_group    | ███████████████████ 2.25
ĠGroup    | ██████████████████ 2.17
ĠAge      | ██████████████████ 2.11
ĠMAX      | █████████████████ 2.06
ĠStudent  | █████████████████ 2.05
Ġspecific | █████████████████ 2.03
Ġstudent  | █████████████████ 2.00
'''
- Tom Aarsen
Hey Tom, thank you very much, it works well with this fix for me too, with both Sentence Transformers and transformers.
I think we can merge.
- Simon