Based on the paper *Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks* (Reimers & Gurevych, 2019, arXiv:1908.10084).
This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased on the wiki1m-for-simcse dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

    SentenceTransformer(
      (0): Transformer({'max_seq_length': 32, 'do_lower_case': False, 'architecture': 'DistilBertModel'})
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    )
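The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the average of the token embeddings, with padding positions excluded via the attention mask. A minimal numpy sketch of masked mean pooling on toy data (not the real 768-dimensional model outputs):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts

# Toy batch of one sequence: two real tokens and one padding token.
# The large padding vector must not influence the result.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(emb, mask))  # [[2. 3.]]
```

The padding token is zeroed out by the mask, so only the two real tokens are averaged.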
First install the Sentence Transformers library:

    pip install -U sentence-transformers
Then you can load this model and run inference.

    from sentence_transformers import SentenceTransformer

    # Download from the 🤗 Hub
    model = SentenceTransformer("tomaarsen/distilbert-base-uncased-stsb-simcse")
    # Run inference
    sentences = [
        'He attained the rank of rear admiral (shaojiang) in July 1999, and was promoted to the rank of vice admiral (zhongjiang) in July 2006.',
        'He attained the rank of rear admiral (shaojiang) in July 1999, and was promoted to the rank of vice admiral (zhongjiang) in July 2006.',
        'Kazakh playing cards differ from the classic French deck in that it has unique non-standard suits, i.e.',
    ]
    embeddings = model.encode(sentences)
    print(embeddings.shape)
    # (3, 768)

    # Get the similarity scores for the embeddings
    similarities = model.similarity(embeddings, embeddings)
    print(similarities)
    # tensor([[ 1.0000,  1.0000, -0.0107],
    #         [ 1.0000,  1.0000, -0.0107],
    #         [-0.0107, -0.0107,  1.0000]])
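`model.similarity` uses cosine similarity for this model (matching the loss's `similarity_fct`). A self-contained numpy sketch of the same pairwise computation, using toy 2-dimensional vectors in place of real embeddings:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # (len(a), len(b)) similarity matrix

# Toy embeddings: identical rows score 1.0, orthogonal rows score 0.0,
# mirroring the pattern in the tensor printed above (two duplicates, one outlier).
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(cos_sim(emb, emb))
```

As in the real output above, the two identical sentences score 1.0 with each other while the unrelated third sentence scores near zero.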
Semantic similarity, measured with `EmbeddingSimilarityEvaluator` on the sts-dev and sts-test sets:

| Metric | sts-dev | sts-test |
|---|---|---|
| pearson_cosine | 0.7683 | 0.7460 |
| spearman_cosine | 0.7736 | 0.7419 |
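The Spearman score above is the rank correlation between the model's cosine similarities and the gold STS annotations. A minimal sketch with made-up scores (a hand-rolled rank correlation via double `argsort`, which assumes no ties; the evaluator itself uses scipy):

```python
import numpy as np

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation: Pearson correlation of rank-transformed values.

    Double argsort turns each value into its rank; valid only without ties.
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy example: predicted cosine similarities vs. gold STS scores (0-5 scale).
# The rankings agree perfectly even though the scales differ.
pred = np.array([0.9, 0.1, 0.5, 0.7])
gold = np.array([4.8, 0.5, 2.0, 3.9])
print(round(spearman(pred, gold), 4))  # 1.0
```

Because Spearman only compares rankings, it is insensitive to the scale mismatch between cosine similarities and human scores, which is why STS leaderboards prefer it.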
Training dataset: wiki1m-for-simcse, with two string columns, `sentence1` and `sentence2`. Example rows (note that each pair consists of the same sentence twice):

| sentence1 | sentence2 |
|---|---|
| YMCA in South Australia | YMCA in South Australia |
| South Australia (SA) has a unique position in Australia's history as, unlike the other states which were founded as colonies, South Australia began as a self governing province Many were attracted to this and Adelaide and SA developed as an independent and free thinking state. | South Australia (SA) has a unique position in Australia's history as, unlike the other states which were founded as colonies, South Australia began as a self governing province Many were attracted to this and Adelaide and SA developed as an independent and free thinking state. |
| The compound of philosophical radicalism, evangelical religion and self reliant ability typical of its founders had given an equalitarian flavour to South Australian thinking from the beginning. | The compound of philosophical radicalism, evangelical religion and self reliant ability typical of its founders had given an equalitarian flavour to South Australian thinking from the beginning. |
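The identical sentence pairs are the unsupervised SimCSE setup: the positive pair is the same sentence encoded twice, and dropout inside the encoder makes the two embeddings differ slightly, acting as data augmentation. A toy numpy sketch of that dropout-as-augmentation idea (not the real encoder):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout_view(x: np.ndarray, p: float = 0.1) -> np.ndarray:
    """Inverted dropout: zero a fraction p of units, scale survivors by 1/(1-p)."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Passing the "same sentence" through dropout twice yields two distinct views,
# which form the positive pair; other sentences in the batch act as negatives.
sentence_embedding = np.ones(64)
view_a = dropout_view(sentence_embedding)
view_b = dropout_view(sentence_embedding)
print(np.array_equal(view_a, view_b))
```

The real model samples independent dropout masks on every forward pass, so no explicit augmentation of the text itself is needed.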
Loss: `MultipleNegativesRankingLoss` with these parameters:

    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
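`MultipleNegativesRankingLoss` treats every other pair in the batch as a negative: it computes the cosine similarity matrix between anchors and positives, scales it (here by 20.0), and applies cross-entropy with the diagonal as the target. A numpy sketch of that computation on toy 2-dimensional embeddings:

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's positive is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                   # (batch, batch) similarity logits
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())   # target class for row i is column i

# Toy batch of two pairs: each anchor is close to its own positive and far
# from the other, so the loss is near zero.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[1.0, 0.1], [0.1, 1.0]])
print(mnr_loss(anchors, positives))
```

The `scale` of 20.0 sharpens the softmax so that even moderate similarity gaps produce confident predictions; with `per_device_train_batch_size: 128`, each pair gets 127 in-batch negatives for free.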
Non-default training hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 128
- num_train_epochs: 1
- warmup_ratio: 0.1
- warmup_steps: 0.1
- fp16: True

All hyperparameters:

- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 8
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.1
- warmup_steps: 0.1
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: True
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|---|---|---|---|---|
| -1 | -1 | - | 0.6728 | 0.5705 |
| 0.0101 | 78 | 0.0417 | - | - |
| 0.0203 | 156 | 0.0023 | - | - |
| 0.0304 | 234 | 0.0009 | - | - |
| 0.0405 | 312 | 0.0003 | - | - |
| 0.0506 | 390 | 0.0003 | - | - |
| 0.0608 | 468 | 0.0001 | - | - |
| 0.0709 | 546 | 0.0002 | - | - |
| 0.0810 | 624 | 0.0001 | - | - |
| 0.0912 | 702 | 0.0000 | - | - |
| 0.1001 | 771 | - | 0.7898 | - |
| 0.1013 | 780 | 0.0000 | - | - |
| 0.1114 | 858 | 0.0001 | - | - |
| 0.1215 | 936 | 0.0000 | - | - |
| 0.1317 | 1014 | 0.0001 | - | - |
| 0.1418 | 1092 | 0.0002 | - | - |
| 0.1519 | 1170 | 0.0000 | - | - |
| 0.1621 | 1248 | 0.0000 | - | - |
| 0.1722 | 1326 | 0.0000 | - | - |
| 0.1823 | 1404 | 0.0000 | - | - |
| 0.1924 | 1482 | 0.0000 | - | - |
| 0.2002 | 1542 | - | 0.7828 | - |
| 0.2026 | 1560 | 0.0000 | - | - |
| 0.2127 | 1638 | 0.0002 | - | - |
| 0.2228 | 1716 | 0.0000 | - | - |
| 0.2330 | 1794 | 0.0000 | - | - |
| 0.2431 | 1872 | 0.0000 | - | - |
| 0.2532 | 1950 | 0.0002 | - | - |
| 0.2633 | 2028 | 0.0000 | - | - |
| 0.2735 | 2106 | 0.0000 | - | - |
| 0.2836 | 2184 | 0.0002 | - | - |
| 0.2937 | 2262 | 0.0000 | - | - |
| 0.3004 | 2313 | - | 0.7998 | - |
| 0.3039 | 2340 | 0.0000 | - | - |
| 0.3140 | 2418 | 0.0000 | - | - |
| 0.3241 | 2496 | 0.0000 | - | - |
| 0.3342 | 2574 | 0.0000 | - | - |
| 0.3444 | 2652 | 0.0002 | - | - |
| 0.3545 | 2730 | 0.0000 | - | - |
| 0.3646 | 2808 | 0.0001 | - | - |
| 0.3748 | 2886 | 0.0001 | - | - |
| 0.3849 | 2964 | 0.0001 | - | - |
| 0.3950 | 3042 | 0.0000 | - | - |
| 0.4005 | 3084 | - | 0.7713 | - |
| 0.4051 | 3120 | 0.0000 | - | - |
| 0.4153 | 3198 | 0.0004 | - | - |
| 0.4254 | 3276 | 0.0000 | - | - |
| 0.4355 | 3354 | 0.0001 | - | - |
| 0.4457 | 3432 | 0.0002 | - | - |
| 0.4558 | 3510 | 0.0000 | - | - |
| 0.4659 | 3588 | 0.0002 | - | - |
| 0.4760 | 3666 | 0.0002 | - | - |
| 0.4862 | 3744 | 0.0000 | - | - |
| 0.4963 | 3822 | 0.0000 | - | - |
| 0.5006 | 3855 | - | 0.7824 | - |
| 0.5064 | 3900 | 0.0006 | - | - |
| 0.5166 | 3978 | 0.0000 | - | - |
| 0.5267 | 4056 | 0.0000 | - | - |
| 0.5368 | 4134 | 0.0002 | - | - |
| 0.5469 | 4212 | 0.0000 | - | - |
| 0.5571 | 4290 | 0.0000 | - | - |
| 0.5672 | 4368 | 0.0000 | - | - |
| 0.5773 | 4446 | 0.0000 | - | - |
| 0.5875 | 4524 | 0.0000 | - | - |
| 0.5976 | 4602 | 0.0001 | - | - |
| 0.6007 | 4626 | - | 0.7806 | - |
| 0.6077 | 4680 | 0.0003 | - | - |
| 0.6178 | 4758 | 0.0007 | - | - |
| 0.6280 | 4836 | 0.0000 | - | - |
| 0.6381 | 4914 | 0.0000 | - | - |
| 0.6482 | 4992 | 0.0000 | - | - |
| 0.6584 | 5070 | 0.0000 | - | - |
| 0.6685 | 5148 | 0.0002 | - | - |
| 0.6786 | 5226 | 0.0002 | - | - |
| 0.6887 | 5304 | 0.0000 | - | - |
| 0.6989 | 5382 | 0.0004 | - | - |
| 0.7008 | 5397 | - | 0.7509 | - |
| 0.7090 | 5460 | 0.0000 | - | - |
| 0.7191 | 5538 | 0.0000 | - | - |
| 0.7293 | 5616 | 0.0000 | - | - |
| 0.7394 | 5694 | 0.0000 | - | - |
| 0.7495 | 5772 | 0.0002 | - | - |
| 0.7596 | 5850 | 0.0000 | - | - |
| 0.7698 | 5928 | 0.0003 | - | - |
| 0.7799 | 6006 | 0.0000 | - | - |
| 0.7900 | 6084 | 0.0003 | - | - |
| 0.8002 | 6162 | 0.0000 | - | - |
| 0.8009 | 6168 | - | 0.7714 | - |
| 0.8103 | 6240 | 0.0000 | - | - |
| 0.8204 | 6318 | 0.0000 | - | - |
| 0.8305 | 6396 | 0.0000 | - | - |
| 0.8407 | 6474 | 0.0001 | - | - |
| 0.8508 | 6552 | 0.0000 | - | - |
| 0.8609 | 6630 | 0.0000 | - | - |
| 0.8711 | 6708 | 0.0000 | - | - |
| 0.8812 | 6786 | 0.0003 | - | - |
| 0.8913 | 6864 | 0.0002 | - | - |
| 0.9011 | 6939 | - | 0.7711 | - |
| 0.9014 | 6942 | 0.0000 | - | - |
| 0.9116 | 7020 | 0.0000 | - | - |
| 0.9217 | 7098 | 0.0000 | - | - |
| 0.9318 | 7176 | 0.0000 | - | - |
| 0.9420 | 7254 | 0.0000 | - | - |
| 0.9521 | 7332 | 0.0000 | - | - |
| 0.9622 | 7410 | 0.0000 | - | - |
| 0.9723 | 7488 | 0.0000 | - | - |
| 0.9825 | 7566 | 0.0000 | - | - |
| 0.9926 | 7644 | 0.0000 | - | - |
| -1 | -1 | - | 0.7736 | 0.7419 |
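With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps linearly from 0 to `learning_rate: 5e-05` over the first 10% of steps, then decays linearly back to 0. A sketch of that schedule, using a total step count of 7700 (an approximation based on the training log above, which ends near step 7644):

```python
def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 5e-05,
                     warmup_ratio: float = 0.1) -> float:
    """Linear warmup for warmup_ratio of training, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # ramp up
    return base_lr * (total_steps - step) / max(1, total_steps - warmup_steps)  # decay

total = 7700  # approximate total steps for this run (see the log above)
for s in (0, 385, 770, 4235, 7700):
    print(f"step {s:>4}: lr = {linear_warmup_lr(s, total):.2e}")
```

The warmup keeps early gradient updates small while the Adam moment estimates are still noisy; the peak learning rate is hit at roughly step 770, matching where the dev Spearman scores in the log begin to fluctuate.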
BibTeX:

    @inproceedings{reimers-2019-sentence-bert,
        title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
        author = "Reimers, Nils and Gurevych, Iryna",
        booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
        month = "11",
        year = "2019",
        publisher = "Association for Computational Linguistics",
        url = "https://arxiv.org/abs/1908.10084",
    }

    @misc{henderson2017efficient,
        title = {Efficient Natural Language Response Suggestion for Smart Reply},
        author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
        year = {2017},
        eprint = {1705.00652},
        archivePrefix = {arXiv},
        primaryClass = {cs.CL}
    }
Base model: distilbert/distilbert-base-uncased