CrossEncoder based on microsoft/Multilingual-MiniLM-L12-H384

This is a Cross Encoder model fine-tuned from microsoft/Multilingual-MiniLM-L12-H384 on the msmarco dataset using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("hasankursun/reranker-Multilingual-MiniLM-L12-H384-msmarco-mse")
# Get scores for pairs of texts
pairs = [
    ['what who did dr nassar molest', 'Dr. Larry Nassar, who is 54 years old, has been in prison while awaiting sentencing. During that time, McKayla Maroney and Aly Raisman revealed that Dr. Nassar had molested them, sharing their heartbreaking #MeToo stories with the world. Like so many others, these Olympic superstars describe Dr. Nassar, at the time their team doctor, administering treatments to them under the guise of medical practice.'],
    ['average monthly temperature in vancouver', 'Vancouver: Annual Weather Averages. August is the hottest month in Vancouver with an average temperature of 18°C (64°F) and the coldest is January at 4°C (38°F) with the most daily sunshine hours at 13 in July.The wettest month is November with an average of 200mm of rain.The best month to swim in the sea is in July when the average sea temperature is 14°C (57°F).he wettest month is November with an average of 200mm of rain. The best month to swim in the sea is in July when the average sea temperature is 14°C (57°F).'],
    ['do i have to file a tax return for my business if i dont have any income', 'The responsibility for filing your child’s tax return rests with your child if he is capable of doing so. If he is not old enough to understand how to prepare a tax return, then it becomes your responsibility to file it for him or to include his income on your return.he responsibility for filing your child’s tax return rests with your child if he is capable of doing so. If he is not old enough to understand how to prepare a tax return, then it becomes your responsibility to file it for him or to include his income on your return.'],
    ['what happens to your growth plate when you finish growing', 'Over the past 70 million years, the combined processes of magma formation, volcano eruption and growth, and continued movement of the Pacific Plate over the stationary Hawaiian hot-spot have left a long trail of volcanoes across the Pacific Ocean floor.s the Pacific Plate continues to move west-northwest, the Island of Hawaii will be carried beyond the hotspot by plate motion, setting the stage for the formation of a new volcanic island in its place.'],
    ['average cost of holiday to new zealand', 'Definition: Average total cost is the sum of all the production costs divided by the number of units produced. Terms related to Average Total Cost: 1  The Cost Curve. 2  Average Propensity to Consume.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what who did dr nassar molest',
    [
        'Dr. Larry Nassar, who is 54 years old, has been in prison while awaiting sentencing. During that time, McKayla Maroney and Aly Raisman revealed that Dr. Nassar had molested them, sharing their heartbreaking #MeToo stories with the world. Like so many others, these Olympic superstars describe Dr. Nassar, at the time their team doctor, administering treatments to them under the guise of medical practice.',
        'Vancouver: Annual Weather Averages. August is the hottest month in Vancouver with an average temperature of 18°C (64°F) and the coldest is January at 4°C (38°F) with the most daily sunshine hours at 13 in July.The wettest month is November with an average of 200mm of rain.The best month to swim in the sea is in July when the average sea temperature is 14°C (57°F).he wettest month is November with an average of 200mm of rain. The best month to swim in the sea is in July when the average sea temperature is 14°C (57°F).',
        'The responsibility for filing your child’s tax return rests with your child if he is capable of doing so. If he is not old enough to understand how to prepare a tax return, then it becomes your responsibility to file it for him or to include his income on your return.he responsibility for filing your child’s tax return rests with your child if he is capable of doing so. If he is not old enough to understand how to prepare a tax return, then it becomes your responsibility to file it for him or to include his income on your return.',
        'Over the past 70 million years, the combined processes of magma formation, volcano eruption and growth, and continued movement of the Pacific Plate over the stationary Hawaiian hot-spot have left a long trail of volcanoes across the Pacific Ocean floor.s the Pacific Plate continues to move west-northwest, the Island of Hawaii will be carried beyond the hotspot by plate motion, setting the stage for the formation of a new volcanic island in its place.',
        'Definition: Average total cost is the sum of all the production costs divided by the number of units produced. Terms related to Average Total Cost: 1  The Cost Curve. 2  Average Propensity to Consume.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
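
The list returned by rank can also be rebuilt from raw predict scores, which is handy when scores are cached or computed in separate batches. The following is a minimal sketch, assuming the output shape shown in the comment above (corpus_id and score keys). The helper rank_from_scores is hypothetical and not part of the library, and the scores are made-up placeholders rather than real model output.

```python
def rank_from_scores(scores, top_k=None):
    """Sort passage indices by descending score.

    Returns dicts with the same keys as the rank output sketched above.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Placeholder scores standing in for model.predict(pairs); not real model output.
example_scores = [7.5, -3.1, -2.7, -6.0, -4.4]
ranked = rank_from_scores(example_scores, top_k=3)
print(ranked)
```

Here corpus_id is the position of the passage in the input list, so the original text can be looked up with that index.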

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    NanoMSMARCO_R100    NanoNFCorpus_R100    NanoNQ_R100
map       0.5969 (+0.1073)    0.3452 (+0.0842)     0.6475 (+0.2279)
mrr@10    0.5916 (+0.1141)    0.5892 (+0.0893)     0.6618 (+0.2351)
ndcg@10   0.6640 (+0.1236)    0.3925 (+0.0674)     0.7035 (+0.2028)
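
For readers unfamiliar with the three metrics, the sketch below computes them for a single query from a reranked list of binary relevance labels. It illustrates the textbook definitions only. The actual CrossEncoderRerankingEvaluator may differ in details (for example in how always_rerank_positives affects the candidate list), and the example labels are invented. The reported map, mrr@10 and ndcg@10 are averages of such per-query values.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """Binary-relevance nDCG@k: DCG of this ordering over DCG of the ideal one."""
    def dcg(labels):
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(labels, start=1))

    ideal_dcg = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(relevance):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Invented example: 1 = relevant, 0 = not relevant, listed in reranked order.
ranking = [0, 1, 0, 1, 0]
print(mrr_at_k(ranking), ndcg_at_k(ranking), average_precision(ranking))
```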

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric Value
map 0.5299 (+0.1398)
mrr@10 0.6142 (+0.1462)
ndcg@10 0.5867 (+0.1313)
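
The NanoBEIR_R100_mean values can be checked against the per-dataset table: each is the arithmetic mean of the three dataset scores. A small sketch of that check, using the rounded figures reported in this card (so agreement in the last digit is not guaranteed in general, although it holds here):

```python
# Per-dataset scores as reported above, in the order
# NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100.
per_dataset = {
    "map": [0.5969, 0.3452, 0.6475],
    "mrr@10": [0.5916, 0.5892, 0.6618],
    "ndcg@10": [0.6640, 0.3925, 0.7035],
}

# Unweighted mean over the three datasets, rounded like the table.
means = {name: round(sum(vals) / len(vals), 4) for name, vals in per_dataset.items()}
print(means)
```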

Training Details

Training Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 1,990,000 training samples
  • Columns: score, query, and passage
  • Approximate statistics based on the first 1000 samples:
               score     query               passage
    type       float     string              string
    min        -11.82    10 characters       57 characters
    mean       0.84      33.38 characters    346.19 characters
    max        10.86     91 characters       916 characters
  • Samples:
    • score: -1.2796382506688435
      query: what hormone is released after a meal
      passage: This conduction of bile is the main function of the common bile duct. The hormone cholecystokinin, when stimulated by a fatty meal, promotes bile secretion by increased production of hepatic bile, contraction of the gall bladder, and relaxation of the Sphincter of Oddi.
    • score: 5.993939399719238
      query: what can cause fluid around the heart and in the lungs
      passage: But in certain circumstances, the alveoli fill with fluid instead of air, preventing oxygen from being absorbed into your bloodstream. A number of things can cause fluid to accumulate in your lungs, but most have to do with your heart (cardiogenic pulmonary edema).
    • score: -3.6289062102635703
      query: which war did augustus underwood serve in?
      passage: Quick Answer. Caesar Augustus, also known as Octavian, was the first Roman emperor after the assassination of Julius Caesar in 43 B.C. Augustus was Caesar's grand nephew who shrewdly combined lawmaking, military might and institutional building to create the foundations of the 200-year Pax Romana, explains Biography.com.
  • Loss: MSELoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity"
    }
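
With an Identity activation, the model's raw output is compared directly to the float score column, with no sigmoid or other squashing in between. The sketch below shows that computation in plain Python as a stand-in for the batched tensor version. The targets are rounded from the sample scores above, and the logits are hypothetical values chosen for illustration.

```python
def identity(x):
    """Identity activation: the raw model output is used unchanged."""
    return x


def mse_loss(logits, targets, activation=identity):
    """Mean squared error between activated logits and target scores."""
    preds = [activation(z) for z in logits]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Targets rounded from the sample scores above; logits are hypothetical.
targets = [-1.28, 5.99, -3.63]
logits = [-0.28, 4.99, -1.63]
loss = mse_loss(logits, targets)
print(loss)
```

Because the targets are unbounded (roughly -12 to 11 in the sampled statistics), the scores this model returns are unbounded as well and should not be read as probabilities.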
    

Evaluation Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 10,000 evaluation samples
  • Columns: score, query, and passage
  • Approximate statistics based on the first 1000 samples:
               score     query               passage
    type       float     string              string
    min        -11.82    10 characters       50 characters
    mean       0.75      33.5 characters     350.19 characters
    max        11.1      173 characters      1175 characters
  • Samples:
    • score: 7.531805515289307
      query: what who did dr nassar molest
      passage: Dr. Larry Nassar, who is 54 years old, has been in prison while awaiting sentencing. During that time, McKayla Maroney and Aly Raisman revealed that Dr. Nassar had molested them, sharing their heartbreaking #MeToo stories with the world. Like so many others, these Olympic superstars describe Dr. Nassar, at the time their team doctor, administering treatments to them under the guise of medical practice.
    • score: 8.721138636271158
      query: average monthly temperature in vancouver
      passage: Vancouver: Annual Weather Averages. August is the hottest month in Vancouver with an average temperature of 18°C (64°F) and the coldest is January at 4°C (38°F) with the most daily sunshine hours at 13 in July.The wettest month is November with an average of 200mm of rain.The best month to swim in the sea is in July when the average sea temperature is 14°C (57°F).he wettest month is November with an average of 200mm of rain. The best month to swim in the sea is in July when the average sea temperature is 14°C (57°F).
    • score: -2.6914831002553306
      query: do i have to file a tax return for my business if i dont have any income
      passage: The responsibility for filing your child’s tax return rests with your child if he is capable of doing so. If he is not old enough to understand how to prepare a tax return, then it becomes your responsibility to file it for him or to include his income on your return.he responsibility for filing your child’s tax return rests with your child if he is capable of doing so. If he is not old enough to understand how to prepare a tax return, then it becomes your responsibility to file it for him or to include his income on your return.
  • Loss: MSELoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 8e-06
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
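
These settings imply a concrete step count and learning-rate curve. The sketch below works them out, assuming a single device (gradient_accumulation_steps is 1 in the full list) and the usual linear warmup then linear decay shape for lr_scheduler_type: linear. The function lr_at is a hypothetical helper written for illustration, not the trainer's own code.

```python
import math

train_samples = 1_990_000   # training set size reported above
batch_size = 16             # per_device_train_batch_size, single device assumed
peak_lr = 8e-06             # learning_rate
warmup_ratio = 0.1

total_steps = math.ceil(train_samples / batch_size)    # one epoch
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps)
print(round(4000 / total_steps, 4))  # epoch fraction reached at step 4000
```

Under these assumptions one epoch is 124,375 steps, which agrees with the Epoch column of the training logs (for example step 4000 at epoch 0.0322).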

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 8e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step    Training Loss  Validation Loss  NanoMSMARCO_R100_ndcg@10  NanoNFCorpus_R100_ndcg@10  NanoNQ_R100_ndcg@10  NanoBEIR_R100_mean_ndcg@10
-1      -1      -              -                0.0412 (-0.4992)          0.2383 (-0.0867)           0.0468 (-0.4538)     0.1088 (-0.3466)
0.0000  1       66.3043        -                -                         -                          -                    -
0.0322  4000    51.8043        -                -                         -                          -                    -
0.0643  8000    19.8202        -                -                         -                          -                    -
0.0965  12000   8.3776         -                -                         -                          -                    -
0.1286  16000   5.8256         -                -                         -                          -                    -
0.1608  20000   4.9058         4.1738           0.6666 (+0.1262)          0.3713 (+0.0463)           0.6888 (+0.1882)     0.5756 (+0.1202)
0.1930  24000   4.4046         -                -                         -                          -                    -
0.2251  28000   4.0191         -                -                         -                          -                    -
0.2573  32000   3.8165         -                -                         -                          -                    -
0.2894  36000   3.5786         -                -                         -                          -                    -
0.3216  40000   3.484          3.3411           0.6856 (+0.1451)          0.3848 (+0.0597)           0.6858 (+0.1852)     0.5854 (+0.1300)
0.3538  44000   3.3467         -                -                         -                          -                    -
0.3859  48000   3.1952         -                -                         -                          -                    -
0.4181  52000   3.1518         -                -                         -                          -                    -
0.4503  56000   3.0006         -                -                         -                          -                    -
0.4824  60000   3.0028         2.9331           0.6609 (+0.1204)          0.3894 (+0.0643)           0.7033 (+0.2027)     0.5845 (+0.1292)
0.5146  64000   2.8912         -                -                         -                          -                    -
0.5467  68000   2.8112         -                -                         -                          -                    -
0.5789  72000   2.7707         -                -                         -                          -                    -
0.6111  76000   2.7566         -                -                         -                          -                    -
0.6432  80000   2.667          2.6270           0.6603 (+0.1199)          0.3967 (+0.0717)           0.6790 (+0.1784)     0.5787 (+0.1233)
0.6754  84000   2.6895         -                -                         -                          -                    -
0.7075  88000   2.6151         -                -                         -                          -                    -
0.7397  92000   2.558          -                -                         -                          -                    -
0.7719  96000   2.5346         -                -                         -                          -                    -
0.804   100000  2.5255         2.3877           0.6640 (+0.1236)          0.3925 (+0.0674)           0.7035 (+0.2028)     0.5867 (+0.1313)
0.8362  104000  2.488          -                -                         -                          -                    -
0.8683  108000  2.5009         -                -                         -                          -                    -
0.9005  112000  2.4195         -                -                         -                          -                    -
0.9327  116000  2.4237         -                -                         -                          -                    -
0.9648  120000  2.4671         2.3669           0.6640 (+0.1236)          0.3955 (+0.0705)           0.6877 (+0.1871)     0.5824 (+0.1270)
0.9970  124000  2.3769         -                -                         -                          -                    -
-1      -1      -              -                0.6640 (+0.1236)          0.3925 (+0.0674)           0.7035 (+0.2028)     0.5867 (+0.1313)
  • The bold row denotes the saved checkpoint; the final row's metrics match those of the step 100000 row.
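
Since load_best_model_at_end is True, the final model is the best evaluated checkpoint rather than simply the last one. The card does not state which metric drove that choice, so the sketch below only compares two plausible candidates using the evaluation rows from the table. The tuple layout is an assumption made for this example.

```python
# (step, validation loss, NanoBEIR_R100_mean ndcg@10) from the evaluated rows.
eval_rows = [
    (20000, 4.1738, 0.5756),
    (40000, 3.3411, 0.5854),
    (60000, 2.9331, 0.5845),
    (80000, 2.6270, 0.5787),
    (100000, 2.3877, 0.5867),
    (120000, 2.3669, 0.5824),
]

best_by_ndcg = max(eval_rows, key=lambda row: row[2])  # higher is better
best_by_loss = min(eval_rows, key=lambda row: row[1])  # lower is better
print(best_by_ndcg[0], best_by_loss[0])
```

The final row of the table matches the step with the highest mean ndcg@10 rather than the one with the lowest validation loss, which is consistent with selection on the NanoBEIR mean, although this is an inference from the logs.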

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.11.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 0.1B params (Safetensors, tensor type F32)