- Bobo: Embedding Model (derived from mixedbread-ai/mxbai-embed-large-v1)
Bobo is a text-embedding model derived from mixedbread-ai/mxbai-embed-large-v1, packaged for drop-in use in semantic search, RAG (Retrieval-Augmented Generation), clustering, deduplication, and zero-shot classification. It produces dense vector representations suitable for ANN indexes (FAISS, ScaNN, Milvus, pgvector) and common retrieval stacks.
Key Features
- Derived from the proven mixedbread-ai/mxbai-embed-large-v1 embedding family.
- Strong performance on retrieval-style tasks (semantic search, RAG, clustering).
- Sentence-Transformers-compatible API for fast adoption.
- CLS (default) or mean pooling, with optional L2 normalization for cosine-similarity workflows.
- Production-friendly guidance on chunking, batching, and indexing.
Technical Specifications
| Property | Value / Guidance |
|---|---|
| Base model | mixedbread-ai/mxbai-embed-large-v1 (encoder-style transformer) |
| Architecture | Transformer encoder (Sentence-Transformers compatible) |
| Embedding dimension | Query programmatically at runtime; common builds use 1024 |
| Tokenization | Provided by upstream model tokenizer |
| Max input length | Depends on upstream config; chunk long docs (e.g., 256–512 tokens) |
| Pooling | CLS pooling (default) or mean pooling, then optional L2 normalization |
| Output | Dense float vectors (often normalized if using cosine similarity) |
| Intended backends | FAISS, Milvus, pgvector, Qdrant, Chroma, Weaviate |
Tip: Always detect the dimension in code (see Quickstart) and configure your index accordingly.
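For example, here is a minimal sketch of runtime dimension detection using the Sentence-Transformers API (the model id matches the Quickstart below; swap in your own build if it differs):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("leeroy-jankins/bobo-embed-large-v1")
dim = model.get_sentence_embedding_dimension()  # e.g., 1024, or 512 if truncated
print(f"embedding dimension: {dim}")  # size your FAISS/pgvector/Qdrant index with this value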
Quickstart
- Below, we show several ways to produce sentence embeddings. Note that for retrieval you must prefix each query with the prompt "Represent this sentence for searching relevant passages: "; documents do not need any prompt.
Vectorized Datasets
Vectorization is the process of converting textual data into numerical vectors, usually applied once the text has been cleaned; it can improve execution speed and reduce training time. BudgetPy provides the following vector stores on the OpenAI platform to support environmental data analysis with machine learning:
- Appropriations - Enacted appropriations from 1996-2024 available for fine-tuning learning models
- Regulations - Collection of federal regulations on the use of appropriated funds
- SF-133 - The Report on Budget Execution and Budgetary Resources
- Balances - U.S. federal agency Account Balances (File A) submitted as part of the DATA Act 2014.
- Outlays - The actual disbursements of funds by the U.S. federal government from 1962 to 2025
- Circular A11 - Guidance from OMB on the preparation, submission, and execution of the federal budget
- Fastbook - Treasury guidance on federal ledger accounts
- Title 31 CFR - Money & Finance
- Redbook - The Principles of Appropriations Law (Volumes I & II).
- US Standard General Ledger - Account Definitions
- Treasury Appropriation Fund Symbols (TAFSs) Dataset - Collection of TAFSs used by federal agencies
Sentence Transformers
python -m pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from sentence_transformers.quantization import quantize_embeddings
# 1. Specify preferred dimensions
dimensions = 512
# 2. Load the model, truncating embeddings to the chosen dimension (MRL)
model = SentenceTransformer("leeroy-jankins/bobo-embed-large-v1", truncate_dim=dimensions)
# The prompt used for query retrieval tasks:
# query_prompt = 'Represent this sentence for searching relevant passages: '
query = "A man is eating a piece of bread"
docs = [
"A man is eating food.",
"A man is eating pasta.",
"The girl is carrying a baby.",
"A man is riding a horse.",
]
# 3. Encode
query_embedding = model.encode(query, prompt_name="query")
# Equivalent Alternatives:
# query_embedding = model.encode(query_prompt + query)
# query_embedding = model.encode(query, prompt=query_prompt)
docs_embeddings = model.encode(docs)
# Optional: Quantize the embeddings
binary_query_embedding = quantize_embeddings(query_embedding, precision="ubinary")
binary_docs_embeddings = quantize_embeddings(docs_embeddings, precision="ubinary")
similarities = cos_sim(query_embedding, docs_embeddings)
print('similarities:', similarities)
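To index these embeddings in one of the backends listed above, here is a minimal FAISS sketch (assumes faiss-cpu is installed via pip; the flat inner-product index and k=2 are illustrative choices, not requirements):
import faiss

# With L2-normalized float32 vectors, inner product equals cosine similarity
dim = docs_embeddings.shape[1]
index = faiss.IndexFlatIP(dim)
faiss.normalize_L2(docs_embeddings)  # normalizes in place
index.add(docs_embeddings)

# Normalize the query the same way, then retrieve the top-2 documents
query_vec = query_embedding.reshape(1, -1)
faiss.normalize_L2(query_vec)
scores, ids = index.search(query_vec, k=2)
print('top docs:', ids, 'scores:', scores)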
Transformers
from typing import Dict
import torch
import numpy as np
from transformers import AutoModel, AutoTokenizer
from sentence_transformers.util import cos_sim
# For retrieval you need to add this prompt to queries (see the blog post for details).
def transform_query(query: str) -> str:
    """For retrieval, add the prompt to the query (not to documents)."""
    return f'Represent this sentence for searching relevant passages: {query}'

# The model works well with CLS pooling (default) but also with mean pooling.
def pooling(outputs: torch.Tensor, inputs: Dict, strategy: str = 'cls') -> np.ndarray:
    if strategy == 'cls':
        # Take the [CLS] token representation
        outputs = outputs[:, 0]
    elif strategy == 'mean':
        # Average token states, masking out padding
        outputs = torch.sum(
            outputs * inputs["attention_mask"][:, :, None], dim=1
        ) / torch.sum(inputs["attention_mask"], dim=1, keepdim=True)
    else:
        raise NotImplementedError
    return outputs.detach().cpu().numpy()
# 1. Load model and tokenizer
model_id = 'leeroy-jankins/bobo-embed-large-v1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).cuda()  # assumes a CUDA-capable GPU
docs = [
transform_query('A man is eating a piece of bread'),
"A man is eating food.",
"A man is eating pasta.",
"The girl is carrying a baby.",
"A man is riding a horse.",
]
# 2. encode
inputs = tokenizer(docs, padding=True, truncation=True, return_tensors='pt')
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model(**inputs).last_hidden_state
embeddings = pooling(outputs, inputs, 'cls')
similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)
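Switching the snippet above to mean pooling is a one-argument change, and you can L2-normalize the result for cosine or inner-product indexes (the NumPy normalization below is an illustrative sketch, not part of the model API):
# Mean pooling variant of the same forward pass
embeddings_mean = pooling(outputs, inputs, 'mean')

# Optional: L2-normalize so that a plain dot product equals cosine similarity
embeddings_mean = embeddings_mean / np.linalg.norm(embeddings_mean, axis=1, keepdims=True)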
Evaluation
As of March 2024, the model achieves SOTA performance for BERT-large-sized models on the MTEB benchmark. It outperforms commercial models like OpenAI's text-embedding-3-large and matches the performance of models 20x its size, such as echo-mistral-7b. The model was trained with no overlap with the MTEB data, which indicates that it generalizes well across domains, tasks, and text lengths. We know there are some limitations with this model, which will be fixed in v2.
| Model | Avg (56 datasets) | Classification (12 datasets) | Clustering (11 datasets) | PairClassification (3 datasets) | Reranking (4 datasets) | Retrieval (15 datasets) | STS (10 datasets) | Summarization (1 dataset) |
|---|---|---|---|---|---|---|---|---|
| bobo-embed-large-v1 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85.00 | 32.71 |
| bge-large-en-v1.5 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 |
| bobo-embed-2d-large-v1 | 63.25 | 74.14 | 46.07 | 85.89 | 58.94 | 51.42 | 84.9 | 31.55 |
| nomic-embed-text-v1 | 62.39 | 74.12 | 43.91 | 85.15 | 55.69 | 52.81 | 82.06 | 30.08 |
| jina-embeddings-v2-base-en | 60.38 | 73.45 | 41.73 | 85.38 | 56.98 | 47.87 | 80.7 | 31.6 |
| *Proprietary Models* | | | | | | | | |
| OpenAI text-embedding-3-large | 64.58 | 75.45 | 49.01 | 85.72 | 59.16 | 55.44 | 81.73 | 29.92 |
| Cohere embed-english-v3.0 | 64.47 | 76.49 | 47.43 | 85.84 | 58.01 | 55.00 | 82.62 | 30.18 |
| OpenAI text-embedding-ada-002 | 60.99 | 70.93 | 45.90 | 84.89 | 56.32 | 49.25 | 80.97 | 30.80 |
๐ป Matryoshka and Binary Quantization
Embeddings in their commonly used form (float arrays) have a high memory footprint when used at scale. Two approaches to solve this problem are Matryoshka Representation Learning (MRL) and (Binary) Quantization. While MRL reduces the number of dimensions of an embedding, binary quantization transforms the value of each dimension from a float32 into a lower precision (int8 or even binary). The model supports both approaches!
You can also take it one step further, and combine both MRL and quantization. This combination of binary quantization and MRL allows you to reduce the memory usage of your embeddings significantly. This leads to much lower costs when using a vector database in particular.
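A minimal sketch of the combination, reusing the model id and the 512-dimension truncation from the Quickstart (the sample sentences and memory arithmetic are illustrative):
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings
import numpy as np

# MRL: keep only the first 512 dimensions at load time
model = SentenceTransformer("leeroy-jankins/bobo-embed-large-v1", truncate_dim=512)
emb = model.encode(["A man is eating food.", "A man is riding a horse."])  # float32, shape (2, 512)

# Binary quantization: one sign bit per dimension, packed into bytes (512 bits -> 64 bytes/vector)
binary = quantize_embeddings(emb, precision="ubinary")  # uint8, shape (2, 64)
print(emb.nbytes, '->', binary.nbytes)  # 4096 -> 128 bytes in total, a 32x reduction

# Compare packed vectors with Hamming distance (XOR + popcount)
hamming = int(np.unpackbits(np.bitwise_xor(binary[0], binary[1])).sum())
print('hamming distance:', hamming)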
License
- Bobo is published under the MIT License