Matryoshka Representation Learning
Paper: arXiv:2205.13147
This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
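The pooling module uses last-token pooling (`pooling_mode_lasttoken: True`): the sentence embedding is the hidden state at the final non-padding position, then L2-normalized by the `Normalize()` module. A minimal NumPy sketch of that operation on toy tensors (not the real model):

```python
import numpy as np

# Toy batch: 2 sequences, 5 token positions, hidden size 4.
rng = np.random.default_rng(0)
token_embeddings = rng.random((2, 5, 4))
# 1 = real token, 0 = padding; sequence 0 has 3 tokens, sequence 1 has 5.
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]])

# Index of the last non-padding token in each sequence.
last_token_idx = attention_mask.sum(axis=1) - 1  # -> [2, 4]

# Gather that token's hidden state as the sentence embedding.
sentence_embeddings = token_embeddings[np.arange(2), last_token_idx]

# L2-normalize, mirroring the model's final Normalize() module.
sentence_embeddings /= np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)
print(sentence_embeddings.shape)  # (2, 4)
```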
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("PhilipCisco/qwen3-base-financial_3")

# Run inference
queries = [
    "What is Kroger's employment size and its view on human capital as of early 2023?",
]
documents = [
    'As of January 28, 2023, Kroger employed nearly 430,000 full- and part-time employees. Our people are essential to our success, and we focus intentionally on attracting, developing and engaging a diverse workforce that represents the communities we serve.',
    'NCQA reviews our compliance based on standards for quality improvement, population health management, credentialing, utilization management, network management, and member experience.',
    'Operating profit increased to $2,560.9 million in 2023 from $2,260.8 million in 2022, a 13.3% rise. The increase was predominantly due to higher gross profit, though partially offset by higher SM&A expenses.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6724, 0.1635, 0.0766]])
```
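Because the model ends with a `Normalize()` module, the scores above are dot products of unit vectors, i.e. cosine similarity. A small self-contained check of that equivalence on random stand-in vectors (NumPy toy data, not the actual embeddings):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 1024))   # stand-in for query_embeddings
d = rng.normal(size=(3, 1024))   # stand-in for document_embeddings

scores = cosine_sim(q, d)
print(scores.shape)  # (1, 3): one score per (query, document) pair
```

For the already-normalized embeddings this model returns, `query_embeddings @ document_embeddings.T` yields the same numbers as `model.similarity(...)`.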
Evaluated with `InformationRetrievalEvaluator` on the `dim_1024` dataset with these parameters:

```json
{
    "truncate_dim": 1024
}
```
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.7036 |
| cosine_accuracy@3 | 0.8557 |
| cosine_accuracy@5 | 0.8871 |
| cosine_accuracy@10 | 0.9207 |
| cosine_precision@1 | 0.7036 |
| cosine_precision@3 | 0.2852 |
| cosine_precision@5 | 0.1774 |
| cosine_precision@10 | 0.0921 |
| cosine_recall@1 | 0.7036 |
| cosine_recall@3 | 0.8557 |
| cosine_recall@5 | 0.8871 |
| cosine_recall@10 | 0.9207 |
| cosine_ndcg@10 | 0.8177 |
| cosine_mrr@10 | 0.7841 |
| cosine_map@100 | 0.7876 |
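For reference, `accuracy@k` is the fraction of queries whose relevant document appears in the top-k results. With a single relevant document per query, `recall@k` equals `accuracy@k`, and `precision@k` equals `accuracy@k / k`, which is consistent with the table (e.g. 0.8557 / 3 ≈ 0.2852). A toy sketch of `accuracy@k`:

```python
def accuracy_at_k(ranked_ids, relevant_id, k):
    """1 if the relevant document is ranked in the top k, else 0."""
    return int(relevant_id in ranked_ids[:k])

# Toy run: 3 queries, each with a ranked result list and one relevant doc.
results = [
    (["d2", "d7", "d1"], "d2"),  # relevant doc ranked 1st
    (["d5", "d3", "d9"], "d3"),  # relevant doc ranked 2nd
    (["d4", "d8", "d6"], "d0"),  # relevant doc not retrieved
]

for k in (1, 3):
    acc = sum(accuracy_at_k(r, rel, k) for r, rel in results) / len(results)
    print(f"accuracy@{k} = {acc:.4f}")
# accuracy@1 = 0.3333
# accuracy@3 = 0.6667
```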
Training dataset columns: `anchor` and `positive`.

| | anchor | positive |
|:---|:---|:---|
| type | string | string |
| anchor | positive |
|---|---|
| What were the gains related to the fair value adjustments of investments in BVS during 2022? | We recorded a $13.3 million gain on the fair value adjustment of our investment in BVS in 2022. |
| What was the percentage of nonperforming consumer loans, leases and foreclosed properties as a percentage of outstanding consumer loans, leases and foreclosed properties at the end of 2023? | Nonperforming consumer loans, leases and foreclosed properties as a percentage of outstanding consumer loans, leases and foreclosed properties was 0.61% at the end of 2023. |
| How much did total net revenues for North America increase from fiscal 2022 to fiscal 2023? | North America's total net revenues increased by $3.2 billion, from $23,370.8 million in fiscal 2022 to $26,569.6 million in fiscal 2023. |
Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        1024
    ],
    "matryoshka_weights": [
        1
    ],
    "n_dims_per_step": -1
}
```
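`MatryoshkaLoss` applies its base loss to prefix-truncated embeddings at each configured dimensionality; with `matryoshka_dims: [1024]` (the model's full width) and weight 1, it reduces to plain `MultipleNegativesRankingLoss`. A rough NumPy sketch of the weighted in-batch-negatives objective it wraps (toy data; not the sentence-transformers implementation):

```python
import numpy as np

def mnr_loss(anchors, positives, dims, weights, scale=20.0):
    """Matryoshka-weighted multiple-negatives ranking loss (toy sketch).

    For each dim d, truncate embeddings to their first d components,
    re-normalize, and score every anchor against every positive; the
    matching positive (the diagonal) is the target class.
    """
    total = 0.0
    for d, w in zip(dims, weights):
        a = anchors[:, :d]
        p = positives[:, :d]
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        p = p / np.linalg.norm(p, axis=1, keepdims=True)
        logits = scale * (a @ p.T)  # (batch, batch) similarity matrix
        # Cross-entropy with the diagonal as the correct label.
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        total += w * (-np.mean(np.diag(log_probs)))
    return total

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 64))
positives = anchors + 0.1 * rng.normal(size=(4, 64))  # noisy matches
print(mnr_loss(anchors, positives, dims=[64, 32], weights=[1, 1]))
```

Training with a longer `matryoshka_dims` list (e.g. `[1024, 512, 256]`) would make the truncated prefixes of the embeddings usable on their own, at the cost of optimizing several loss terms per step.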
Non-default hyperparameters:

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `gradient_accumulation_steps`: 8
- `learning_rate`: 1e-05
- `num_train_epochs`: 10
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

All hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 8
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 1e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs:

| Epoch | Step | Training Loss | dim_1024_cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.7573 |
| 0.1143 | 10 | 0.0249 | - |
| 0.2286 | 20 | 0.0173 | - |
| 0.3429 | 30 | 0.017 | - |
| 0.4571 | 40 | 0.0273 | - |
| 0.5714 | 50 | 0.005 | - |
| 0.6857 | 60 | 0.0176 | - |
| 0.8 | 70 | 0.0184 | - |
| 0.9143 | 80 | 0.015 | - |
| 1.0 | 88 | - | 0.8178 |
| 1.0229 | 90 | 0.0145 | - |
| 1.1371 | 100 | 0.002 | - |
| 1.2514 | 110 | 0.0125 | - |
| 1.3657 | 120 | 0.0127 | - |
| 1.48 | 130 | 0.0063 | - |
| 1.5943 | 140 | 0.0087 | - |
| 1.7086 | 150 | 0.0043 | - |
| 1.8229 | 160 | 0.0099 | - |
| 1.9371 | 170 | 0.0154 | - |
| 2.0 | 176 | - | 0.8251 |
| 2.0457 | 180 | 0.0102 | - |
| 2.16 | 190 | 0.0036 | - |
| 2.2743 | 200 | 0.0229 | - |
| 2.3886 | 210 | 0.0068 | - |
| 2.5029 | 220 | 0.0071 | - |
| 2.6171 | 230 | 0.0055 | - |
| 2.7314 | 240 | 0.0043 | - |
| 2.8457 | 250 | 0.0136 | - |
| 2.96 | 260 | 0.0031 | - |
| 3.0 | 264 | - | 0.8137 |
| 3.0686 | 270 | 0.0076 | - |
| 3.1829 | 280 | 0.0047 | - |
| 3.2971 | 290 | 0.0028 | - |
| 3.4114 | 300 | 0.0037 | - |
| 3.5257 | 310 | 0.0038 | - |
| 3.64 | 320 | 0.0115 | - |
| 3.7543 | 330 | 0.0021 | - |
| 3.8686 | 340 | 0.0025 | - |
| 3.9829 | 350 | 0.0079 | - |
| 4.0 | 352 | - | 0.8177 |
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```