Qwen3 Base Financial

This is a sentence-transformers model fine-tuned from Qwen/Qwen3-Embedding-0.6B on the json dataset of financial question–passage pairs. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 32768 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
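
The Pooling module is configured for last-token pooling (pooling_mode_lasttoken): each text is represented by the final hidden state of its last non-padding token, which Normalize() then scales to unit length. A minimal sketch of the same computation with the plain transformers API, assuming right-padded batches (the example sentence is made up):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

batch = tokenizer(["Operating profit rose 13.3% in 2023."], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state            # [batch, seq_len, 1024]

# Last-token pooling: index of each sequence's final non-padding token
# (valid for right padding; left-padded batches would just use index -1).
last = batch["attention_mask"].sum(dim=1) - 1
emb = hidden[torch.arange(hidden.size(0)), last]         # [batch, 1024]
emb = torch.nn.functional.normalize(emb, p=2, dim=1)     # the Normalize() step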

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("PhilipCisco/qwen3-base-financial_3")
# Run inference
queries = [
    "What is Kroger\u0027s employment size and its view on human capital as of early 2023?",
]
documents = [
    'As of January 28, 2023, Kroger employed nearly 430,000 full- and part-time employees. Our people are essential to our success, and we focus intentionally on attracting, developing and engaging a diverse workforce that represents the communities we serve.',
    'NCQA reviews our compliance based on standards for quality improvement, population health management, credentialing, utilization management, network management, and member experience.',
    'Operating profit increased to $2,560.9 million in 2023 from $2,260.8 million in 2022, a 13.3% rise. The increase was predominantly due to higher gross profit, though partially offset by higher SM&A expenses.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6724, 0.1635, 0.0766]])
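
Beyond pairwise scores, the embeddings plug directly into the library's retrieval utilities. A small sketch that ranks the documents above for each query with util.semantic_search (in practice the corpus would be the full set of passages):

from sentence_transformers import util

corpus_embeddings = model.encode_document(documents, convert_to_tensor=True)
query_embeddings = model.encode_query(queries, convert_to_tensor=True)

# For each query, returns the top_k corpus entries ranked by cosine similarity.
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.4f}  {documents[hit['corpus_id']][:60]}...")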

Evaluation

Metrics

Information Retrieval

Metric                  Value
cosine_accuracy@1       0.7036
cosine_accuracy@3       0.8557
cosine_accuracy@5       0.8871
cosine_accuracy@10      0.9207
cosine_precision@1      0.7036
cosine_precision@3      0.2852
cosine_precision@5      0.1774
cosine_precision@10     0.0921
cosine_recall@1         0.7036
cosine_recall@3         0.8557
cosine_recall@5         0.8871
cosine_recall@10        0.9207
cosine_ndcg@10          0.8177
cosine_mrr@10           0.7841
cosine_map@100          0.7876
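
The metric names match the output of sentence-transformers' InformationRetrievalEvaluator run on the 1024-dimensional embeddings. A sketch of how such an evaluation can be assembled; the queries, corpus, and relevance mapping below are hypothetical stand-ins for the held-out split:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# id -> text for queries and corpus; query id -> set of relevant corpus ids
queries = {"q1": "How much did North America net revenues increase in fiscal 2023?"}
corpus = {"d1": "North America's total net revenues increased by $3.2 billion ..."}
relevant_docs = {"q1": {"d1"}}

ir_evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_1024",
)
results = ir_evaluator(model)  # cosine_accuracy@k, cosine_ndcg@10, cosine_mrr@10, ...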

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,600 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 8, mean 21.15, max 53 tokens
    • positive: string; min 4, mean 48.73, max 262 tokens
  • Samples:
    • anchor: What were the gains related to the fair value adjustments of investments in BVS during 2022?
      positive: We recorded a $13.3 million gain on the fair value adjustment of our investment in BVS in 2022.
    • anchor: What was the percentage of nonperforming consumer loans, leases and foreclosed properties as a percentage of outstanding consumer loans, leases and foreclosed properties at the end of 2023?
      positive: Nonperforming consumer loans, leases and foreclosed properties as a percentage of outstanding consumer loans, leases and foreclosed properties was 0.61% at the end of 2023.
    • anchor: How much did total net revenues for North America increase from fiscal 2022 to fiscal 2023?
      positive: North America's total net revenues increased by $3.2 billion, from $23,370.8 million in fiscal 2022 to $26,569.6 million in fiscal 2023.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024
        ],
        "matryoshka_weights": [
            1
        ],
        "n_dims_per_step": -1
    }
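
A minimal sketch of how this loss stack is built: MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor, and MatryoshkaLoss wraps it, here with the single dimension 1024 as configured above:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
inner_loss = MultipleNegativesRankingLoss(model)
# With a single entry in matryoshka_dims this reduces to the inner loss;
# additional (smaller) dims would also train truncated embeddings.
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[1024], matryoshka_weights=[1])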
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 8
  • learning_rate: 1e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
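
Expressed as code, these settings correspond roughly to the following training arguments; output_dir is a placeholder, and save_strategy is added because load_best_model_at_end needs the save and evaluation schedules to match:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="qwen3-base-financial",          # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",                      # implied by load_best_model_at_end
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,              # effective batch size of 32
    learning_rate=1e-5,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts within a batch
)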

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  dim_1024_cosine_ndcg@10
-1      -1    -              0.7573
0.1143  10    0.0249         -
0.2286  20    0.0173         -
0.3429  30    0.0170         -
0.4571  40    0.0273         -
0.5714  50    0.0050         -
0.6857  60    0.0176         -
0.8000  70    0.0184         -
0.9143  80    0.0150         -
1.0     88    -              0.8178
1.0229  90    0.0145         -
1.1371  100   0.0020         -
1.2514  110   0.0125         -
1.3657  120   0.0127         -
1.4800  130   0.0063         -
1.5943  140   0.0087         -
1.7086  150   0.0043         -
1.8229  160   0.0099         -
1.9371  170   0.0154         -
2.0     176   -              0.8251
2.0457  180   0.0102         -
2.1600  190   0.0036         -
2.2743  200   0.0229         -
2.3886  210   0.0068         -
2.5029  220   0.0071         -
2.6171  230   0.0055         -
2.7314  240   0.0043         -
2.8457  250   0.0136         -
2.9600  260   0.0031         -
3.0     264   -              0.8137
3.0686  270   0.0076         -
3.1829  280   0.0047         -
3.2971  290   0.0028         -
3.4114  300   0.0037         -
3.5257  310   0.0038         -
3.6400  320   0.0115         -
3.7543  330   0.0021         -
3.8686  340   0.0025         -
3.9829  350   0.0079         -
4.0     352   -              0.8177
  • The saved checkpoint corresponds to the epoch 4.0 row (dim_1024_cosine_ndcg@10 = 0.8177), which matches the evaluation results reported above.
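
Tying the pieces together, a sketch of the training run that would produce logs like these, reusing the model, args, and loss objects sketched above (train.json is a placeholder path for the 5,600 anchor/positive pairs):

from datasets import load_dataset
from sentence_transformers import SentenceTransformerTrainer

# Anchor/positive pairs in the format shown under Training Dataset.
train_dataset = load_dataset("json", data_files="train.json", split="train")

trainer = SentenceTransformerTrainer(
    model=model,            # from the loss sketch above
    args=args,              # from the hyperparameter sketch above
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()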

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 2.19.1
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}