SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased on the wiki1m-for-simcse dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: distilbert/distilbert-base-uncased
  • Maximum Sequence Length: 32 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: wiki1m-for-simcse

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32, 'do_lower_case': False, 'architecture': 'DistilBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
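The Pooling module above is configured for mean pooling (pooling_mode_mean_tokens: True): the token embeddings produced by the Transformer are averaged, with padding positions excluded via the attention mask. A minimal NumPy sketch of that operation (the embeddings below are illustrative and truncated to 3 dimensions for readability, not taken from the model):

```python
import numpy as np

# Illustrative token embeddings for one sentence: 4 token positions.
token_embeddings = np.array([
    [0.2, 0.4, 0.1],   # [CLS]
    [0.6, 0.0, 0.3],   # token
    [0.2, 0.2, 0.2],   # token
    [0.0, 0.0, 0.0],   # [PAD] -- must not count toward the mean
])
attention_mask = np.array([1, 1, 1, 0])  # 1 = real token, 0 = padding

# Mean pooling: sum the unmasked token embeddings, divide by the real-token count.
mask = attention_mask[:, None]                  # (4, 1), broadcasts over dimensions
summed = (token_embeddings * mask).sum(axis=0)  # sum over the 3 real tokens
sentence_embedding = summed / mask.sum()        # divide by 3, not 4

print(sentence_embedding)  # approximately [0.333, 0.2, 0.2]
```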

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilbert-base-uncased-stsb-simcse")
# Run inference
sentences = [
    'He attained the rank of rear admiral (shaojiang) in July 1999, and was promoted to the rank of vice admiral (zhongjiang) in July 2006.',
    'He attained the rank of rear admiral (shaojiang) in July 1999, and was promoted to the rank of vice admiral (zhongjiang) in July 2006.',
    'Kazakh playing cards differ from the classic French deck in that it has unique non-standard suits, i.e.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  1.0000, -0.0107],
#         [ 1.0000,  1.0000, -0.0107],
#         [-0.0107, -0.0107,  1.0000]])
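By default, model.similarity computes cosine similarity between the embeddings, which is why the two identical sentences above score 1.0000 against each other. A minimal NumPy sketch of the same computation on hand-made vectors (the values are illustrative, not model outputs):

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

embeddings = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],   # identical to the first row -> similarity 1.0
    [0.0, 1.0, 0.0],   # orthogonal to the others  -> similarity 0.0
])
print(cosine_similarity_matrix(embeddings, embeddings))
# diagonal is 1.0; the identical pair also scores 1.0, the orthogonal row 0.0
```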

Evaluation

Metrics

Semantic Similarity

Metric           sts-dev  sts-test
pearson_cosine   0.7683   0.7460
spearman_cosine  0.7736   0.7419
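spearman_cosine is the Spearman rank correlation between the model's cosine similarities and the human-annotated similarity scores: it measures whether the model ranks sentence pairs in the same order as the annotators, ignoring the absolute scale of the scores. A minimal NumPy sketch of Spearman's rho for tie-free data (the scores below are made up for illustration):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

human_scores = np.array([5.0, 3.2, 1.0, 4.4])   # illustrative gold similarity scores
model_cosines = np.array([0.9, 0.5, 0.1, 0.8])  # illustrative model cosine similarities

# The two lists rank the pairs identically, so rho is exactly 1.0.
print(spearman_rho(human_scores, model_cosines))
```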

Training Details

Training Dataset

wiki1m-for-simcse

  • Dataset: wiki1m-for-simcse at b20d549
  • Size: 985,723 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min 4, mean 21.73, max 32 tokens
    - sentence2: string; min 4, mean 21.73, max 32 tokens
  • Samples:
    In each sample, sentence1 and sentence2 are the same sentence, so each is listed once:
    - YMCA in South Australia
    - South Australia (SA)  has a unique position in Australia's history as, unlike the other states which were founded as colonies, South Australia began as a self governing province Many were attracted to this and Adelaide and SA developed as an independent and free thinking state.
    - The compound of philosophical radicalism, evangelical religion and self reliant ability typical of its founders had given an equalitarian flavour to South Australian thinking from the beginning.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
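MultipleNegativesRankingLoss uses the other in-batch pairs as negatives: for each anchor embedding, it applies cross-entropy over its scaled cosine similarities to every sentence2 embedding in the batch, with the anchor's own pair as the correct label. A minimal NumPy sketch of that objective under those assumptions (not the library's implementation; the 2-d vectors are illustrative):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives cross-entropy: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                   # (batch, batch) scaled cosine similarities
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i (its own positive).
    idx = np.arange(len(a))
    return float(-log_probs[idx, idx].mean())

anchors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
positives = anchors.copy()  # identical pairs, as in this SimCSE-style setup
print(mnr_loss(anchors, positives))  # near zero: each anchor best matches its own positive
```

In actual SimCSE training, each sentence is paired with itself but the two forward passes see different dropout masks, so the embeddings differ slightly and the loss stays informative.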
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • fp16: True

All Hyperparameters

  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss sts-dev_spearman_cosine sts-test_spearman_cosine
-1 -1 - 0.6728 0.5705
0.0101 78 0.0417 - -
0.0203 156 0.0023 - -
0.0304 234 0.0009 - -
0.0405 312 0.0003 - -
0.0506 390 0.0003 - -
0.0608 468 0.0001 - -
0.0709 546 0.0002 - -
0.0810 624 0.0001 - -
0.0912 702 0.0000 - -
0.1001 771 - 0.7898 -
0.1013 780 0.0000 - -
0.1114 858 0.0001 - -
0.1215 936 0.0000 - -
0.1317 1014 0.0001 - -
0.1418 1092 0.0002 - -
0.1519 1170 0.0000 - -
0.1621 1248 0.0000 - -
0.1722 1326 0.0000 - -
0.1823 1404 0.0000 - -
0.1924 1482 0.0000 - -
0.2002 1542 - 0.7828 -
0.2026 1560 0.0000 - -
0.2127 1638 0.0002 - -
0.2228 1716 0.0000 - -
0.2330 1794 0.0000 - -
0.2431 1872 0.0000 - -
0.2532 1950 0.0002 - -
0.2633 2028 0.0000 - -
0.2735 2106 0.0000 - -
0.2836 2184 0.0002 - -
0.2937 2262 0.0000 - -
0.3004 2313 - 0.7998 -
0.3039 2340 0.0000 - -
0.3140 2418 0.0000 - -
0.3241 2496 0.0000 - -
0.3342 2574 0.0000 - -
0.3444 2652 0.0002 - -
0.3545 2730 0.0000 - -
0.3646 2808 0.0001 - -
0.3748 2886 0.0001 - -
0.3849 2964 0.0001 - -
0.3950 3042 0.0000 - -
0.4005 3084 - 0.7713 -
0.4051 3120 0.0000 - -
0.4153 3198 0.0004 - -
0.4254 3276 0.0000 - -
0.4355 3354 0.0001 - -
0.4457 3432 0.0002 - -
0.4558 3510 0.0000 - -
0.4659 3588 0.0002 - -
0.4760 3666 0.0002 - -
0.4862 3744 0.0000 - -
0.4963 3822 0.0000 - -
0.5006 3855 - 0.7824 -
0.5064 3900 0.0006 - -
0.5166 3978 0.0000 - -
0.5267 4056 0.0000 - -
0.5368 4134 0.0002 - -
0.5469 4212 0.0000 - -
0.5571 4290 0.0000 - -
0.5672 4368 0.0000 - -
0.5773 4446 0.0000 - -
0.5875 4524 0.0000 - -
0.5976 4602 0.0001 - -
0.6007 4626 - 0.7806 -
0.6077 4680 0.0003 - -
0.6178 4758 0.0007 - -
0.6280 4836 0.0000 - -
0.6381 4914 0.0000 - -
0.6482 4992 0.0000 - -
0.6584 5070 0.0000 - -
0.6685 5148 0.0002 - -
0.6786 5226 0.0002 - -
0.6887 5304 0.0000 - -
0.6989 5382 0.0004 - -
0.7008 5397 - 0.7509 -
0.7090 5460 0.0000 - -
0.7191 5538 0.0000 - -
0.7293 5616 0.0000 - -
0.7394 5694 0.0000 - -
0.7495 5772 0.0002 - -
0.7596 5850 0.0000 - -
0.7698 5928 0.0003 - -
0.7799 6006 0.0000 - -
0.7900 6084 0.0003 - -
0.8002 6162 0.0000 - -
0.8009 6168 - 0.7714 -
0.8103 6240 0.0000 - -
0.8204 6318 0.0000 - -
0.8305 6396 0.0000 - -
0.8407 6474 0.0001 - -
0.8508 6552 0.0000 - -
0.8609 6630 0.0000 - -
0.8711 6708 0.0000 - -
0.8812 6786 0.0003 - -
0.8913 6864 0.0002 - -
0.9011 6939 - 0.7711 -
0.9014 6942 0.0000 - -
0.9116 7020 0.0000 - -
0.9217 7098 0.0000 - -
0.9318 7176 0.0000 - -
0.9420 7254 0.0000 - -
0.9521 7332 0.0000 - -
0.9622 7410 0.0000 - -
0.9723 7488 0.0000 - -
0.9825 7566 0.0000 - -
0.9926 7644 0.0000 - -
-1 -1 - 0.7736 0.7419

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.0.1.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}