SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the trec dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
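The Pooling and Normalize modules above amount to a masked mean over token embeddings followed by L2 normalization. A minimal sketch of that pipeline with random placeholder vectors (not the model's actual internals):

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Masked mean pooling over tokens, then L2 normalization (sketch)."""
    mask = attention_mask[:, None].astype(float)            # (seq_len, 1)
    pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()
    return pooled / np.linalg.norm(pooled)                  # unit-length output

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 768))                          # fake token embeddings
mask = np.array([1, 1, 1, 1, 0, 0])                         # last two tokens are padding
embedding = mean_pool_and_normalize(tokens, mask)
print(embedding.shape)                                      # (768,)
```

Padding positions are masked out of the mean, and the final Normalize step is what makes cosine similarity equivalent to a dot product downstream.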

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/all-distilroberta-v1-trec-batch-sampler")
# Run inference
sentences = [
    "What country contains Africa 's northernmost point ?",
    'What is the difference between classical conditioning and operant conditioning ?',
    'Where can stocks be traded on-line ?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.2991,  0.9941],
#         [-0.2991,  1.0000, -0.2876],
#         [ 0.9941, -0.2876,  1.0000]])
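Because the model ends with a Normalize module, its embeddings are unit-length, so cosine similarity reduces to a plain dot product. A hypothetical semantic-search ranking over placeholder unit vectors (not real model outputs) could look like:

```python
import numpy as np

def rank_by_cosine(query_emb, corpus_embs):
    """Rank corpus rows by cosine similarity to the query.
    Assumes all vectors are already unit-length, so cosine == dot product."""
    scores = corpus_embs @ query_emb
    return np.argsort(-scores)

rng = np.random.default_rng(1)
corpus = rng.normal(size=(4, 768))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)     # unit-normalize rows
query = corpus[2].copy()                                    # query identical to corpus item 2
ranking = rank_by_cosine(query, corpus)
print(ranking[0])                                           # 2 — the matching document ranks first
```

In practice you would obtain `query` and `corpus` from `model.encode(...)` and rank real documents the same way.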

Evaluation

Metrics

Triplet

Metric           trec-dev  trec-test
cosine_accuracy  0.9207    0.9313
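The cosine_accuracy reported here is the fraction of (anchor, positive, negative) triplets where the anchor is closer to the positive than to the negative under cosine similarity. A minimal sketch of that computation with toy vectors (not the evaluator's actual code):

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets with cos(anchor, positive) > cos(anchor, negative)."""
    def cos(a, b):
        return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Two toy triplets in 2-D: the first is ranked correctly, the second is not.
anchors   = np.array([[1.0, 0.0], [1.0, 0.0]])
positives = np.array([[1.0, 0.1], [0.0, 1.0]])
negatives = np.array([[0.0, 1.0], [1.0, 0.1]])
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 0.5
```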

Training Details

Training Dataset

trec

  • Dataset: trec at a073d2e
  • Size: 4,952 training samples
  • Columns: text and label
  • Approximate statistics based on the first 1000 samples:
    • text: string; min 6, mean 13.29, max 39 tokens
    • label: int; class distribution:
    • 0: ~0.30%
    • 1: ~1.50%
    • 2: ~3.10%
    • 3: ~0.60%
    • 4: ~0.80%
    • 5: ~3.90%
    • 7: ~1.80%
    • 8: ~1.20%
    • 9: ~1.70%
    • 10: ~0.20%
    • 11: ~0.40%
    • 12: ~0.20%
    • 13: ~4.30%
    • 14: ~0.40%
    • 15: ~0.90%
    • 16: ~0.10%
    • 17: ~1.20%
    • 18: ~0.70%
    • 19: ~0.10%
    • 20: ~0.70%
    • 21: ~1.40%
    • 22: ~0.20%
    • 23: ~0.50%
    • 24: ~7.90%
    • 25: ~4.40%
    • 26: ~4.90%
    • 27: ~3.90%
    • 28: ~3.30%
    • 29: ~17.50%
    • 30: ~0.50%
    • 31: ~0.70%
    • 32: ~2.70%
    • 33: ~2.80%
    • 34: ~0.80%
    • 35: ~7.90%
    • 36: ~1.40%
    • 37: ~0.10%
    • 38: ~6.30%
    • 39: ~4.80%
    • 40: ~0.40%
    • 41: ~1.10%
    • 42: ~0.10%
    • 43: ~0.70%
    • 44: ~0.40%
    • 45: ~0.40%
    • 46: ~0.50%
    • 47: ~0.10%
    • 48: ~0.20%
  • Samples:
    • "How did serfdom develop in and then leave Russia ?" (label: 26)
    • "What films featured the character Popeye Doyle ?" (label: 5)
    • "How can I find a list of celebrities ' real names ?" (label: 26)
  • Loss: BatchAllTripletLoss
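BatchAllTripletLoss forms every valid (anchor, positive, negative) triplet inside a batch from the class labels and averages the margin-based triplet losses over the triplets that are not already zero. A rough reference sketch, assuming Euclidean distances and a margin of 5 (the library's default; the real implementation is fully vectorized):

```python
import numpy as np

def batch_all_triplet_loss(embeddings, labels, margin=5.0):
    """Mean triplet loss over all valid in-batch triplets with non-zero loss."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))          # pairwise Euclidean distances
    n = len(labels)
    losses = []
    for a in range(n):
        for p in range(n):
            for neg in range(n):
                # valid triplet: distinct anchor/positive sharing a label,
                # negative from a different label
                if a != p and labels[a] == labels[p] and labels[a] != labels[neg]:
                    losses.append(max(dist[a, p] - dist[a, neg] + margin, 0.0))
    losses = np.array(losses)
    nonzero = losses[losses > 1e-16]
    return float(nonzero.mean()) if len(nonzero) else 0.0

# Well-separated classes: every triplet already satisfies the margin, so loss is 0.
emb = np.array([[0.0, 0.0], [0.0, 0.0], [100.0, 0.0], [100.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(batch_all_triplet_loss(emb, labels))       # 0.0
```

This is why the loss benefits from batches containing several samples per label: without at least two same-label items in a batch, no valid triplet can be formed.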

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • warmup_steps: 0.1
  • eval_strategy: steps

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  trec-dev_cosine_accuracy  trec-test_cosine_accuracy
-1      -1    -              0.7683                    -
0.1032  16    4.8467         -                         -
0.2     31    -              0.9228                    -
0.2065  32    4.3216         -                         -
0.3097  48    3.9460         -                         -
0.4     62    -              0.9248                    -
0.4129  64    3.8844         -                         -
0.5161  80    3.9555         -                         -
0.6     93    -              0.9126                    -
0.6194  96    3.7524         -                         -
0.7226  112   3.7898         -                         -
0.8     124   -              0.9146                    -
0.8258  128   3.8515         -                         -
0.9290  144   3.9052         -                         -
1.0     155   -              0.9207                    -
-1      -1    -              -                         0.9313

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.3.0.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Downloads last month: 9
Model size: 82.1M parameters (Safetensors, F32)
