GliZNet: Zero-Shot Multi-Label Text Classification
GliZNet (Generalized Label-Integrated Zero-shot Network) is a novel architecture for zero-shot text classification that embeds labels directly in the input sequence alongside the text.
https://github.com/KameniAlexNea/gliznet-paper
Model Details
- Base Model: answerdotai/ModernBERT-base
- Architecture: GliZNet
- Task: Zero-shot multi-label text classification
- Training Objective: Supervised Contrastive Learning + Label Repulsion + Binary Cross-Entropy
Architecture Overview
Input: [CLS] text tokens [LAB] label1 [LAB] label2 ... [SEP]
                           ↓
                   Backbone Encoder
                           ↓
               ┌───────────┴───────────┐
               ↓                       ↓
        Text Projector          Label Projector
               ↓                       ↓
         CLS Embedding    Aggregated Label Embeddings
               └───────────┬───────────┘
                           ↓
                   Cosine Similarity
                           ↓
                     Label Scores
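For intuition, the sequence layout can be sketched as below. build_input is a hypothetical helper for illustration only; in practice GliZNETTokenizer constructs the real input (including [CLS]/[SEP]) internally.

# Hypothetical illustration of the input layout; the actual formatting
# is handled by the library's tokenizer.
def build_input(text: str, labels: list[str]) -> str:
    # Each candidate label is introduced by the special [LAB] token.
    label_block = " ".join(f"[LAB] {label}" for label in labels)
    return f"{text} {label_block}"

print(build_input(
    "Scientists discover new renewable energy breakthrough",
    ["science", "technology"],
))
# Scientists discover new renewable energy breakthrough [LAB] science [LAB] technology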
Key Features
- Label Integration: Labels are embedded in the input sequence using special [LAB] tokens
- Separate Projections: Independent projection layers for text (CLS) and labels
- Cosine Similarity: Normalized dot product with learnable temperature scaling (see the sketch after this list)
- Advanced Loss Function:
  - Supervised Contrastive Loss: Encourages positive labels to cluster together
  - Label Repulsion: Pushes different labels apart within the same sample
  - Decoupled BCE: Auxiliary loss with independent temperature
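A minimal PyTorch sketch of this scoring head. Dimensions and module names here are assumptions for illustration, not GliZNet's exact internals:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualProjectorHead(nn.Module):
    """Illustrative head: separate projections + temperature-scaled cosine similarity."""

    def __init__(self, hidden_dim: int = 768, proj_dim: int = 256):
        super().__init__()
        self.text_proj = nn.Linear(hidden_dim, proj_dim)   # projects the [CLS] embedding
        self.label_proj = nn.Linear(hidden_dim, proj_dim)  # projects aggregated [LAB] embeddings
        self.log_temperature = nn.Parameter(torch.zeros(1))  # learnable temperature

    def forward(self, cls_emb: torch.Tensor, label_embs: torch.Tensor) -> torch.Tensor:
        # cls_emb: (batch, hidden_dim); label_embs: (batch, num_labels, hidden_dim)
        t = F.normalize(self.text_proj(cls_emb), dim=-1)
        l = F.normalize(self.label_proj(label_embs), dim=-1)
        # Normalized dot product (cosine similarity), scaled by the temperature.
        return torch.einsum("bd,bnd->bn", t, l) * self.log_temperature.exp()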
Training Configuration
- Similarity Metric: Cosine similarity with learned temperature
- SupCon Loss Weight: 1.0
- Label Repulsion Weight: 0.1
- BCE Loss Weight: 1.0
- Repulsion Threshold: 0.3
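With these settings, the total training objective is a weighted sum of the three terms. A sketch using the values above (the function signature is hypothetical):

# Loss weights from the configuration above.
SUPCON_WEIGHT = 1.0
REPULSION_WEIGHT = 0.1
BCE_WEIGHT = 1.0

def combined_loss(supcon: float, repulsion: float, bce: float) -> float:
    # Weighted sum of the three training objectives.
    return SUPCON_WEIGHT * supcon + REPULSION_WEIGHT * repulsion + BCE_WEIGHT * bce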
Usage
Installation
pip install transformers torch
The gliznet package itself is provided by the GitHub repository linked above.
Basic Usage
from gliznet.model import GliZNetForSequenceClassification

# Load model and tokenizer
model, tokenizer = GliZNetForSequenceClassification.from_pretrained_with_tokenizer(
    "alexneakameni/gliznet-ModernBERT-base"
)

# Single prediction
text = "Scientists discover new renewable energy breakthrough"
labels = ["science", "technology", "environment", "business", "politics"]

output = model.predict_example(text, labels, tokenizer)
for label_score in output.labels:
    print(f"{label_score.label}: {label_score.score:.4f}")
Batch Prediction
texts = [
    "Stock markets rally on positive economic news",
    "New study reveals benefits of regular exercise",
]
labels_list = [
    ["business", "finance", "economy"],
    ["health", "science", "lifestyle"],
]

outputs = model.predict_batch(texts, labels_list, tokenizer)
for output in outputs:
    print(f"\nText: {output.text}")
    for label_score in output.labels:
        print(f"  {label_score.label}: {label_score.score:.4f}")
Advanced Usage with Custom Activation
# Use softmax for mutually exclusive labels
output = model.predict_example(
    text="This is a sports article",
    labels=["sports", "politics", "technology"],
    tokenizer=tokenizer,
    activation="softmax",  # or "sigmoid" for multi-label
)
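To see the difference between the two activations on the same raw scores (example values are made up):

import torch

scores = torch.tensor([2.0, 0.5, -1.0])
print(torch.sigmoid(scores))         # independent per-label probabilities (multi-label)
print(torch.softmax(scores, dim=0))  # probabilities summing to 1 (mutually exclusive)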
Technical Details
Token Embedding Strategy
The model uses a smart resizing strategy that only increases embedding capacity when necessary:
- Base model embeddings: Pre-allocated capacity (e.g., 50283 for ModernBERT-base)
- After adding the [LAB] token: Fits within existing capacity
- No unnecessary weight modifications during training
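The same pattern looks roughly like this with the generic 🤗 Transformers API (GliZNet performs the equivalent check internally):

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# Register the [LAB] marker as a special token.
tokenizer.add_special_tokens({"additional_special_tokens": ["[LAB]"]})

# Resize only if the vocabulary now exceeds the pre-allocated embedding rows.
if len(tokenizer) > model.get_input_embeddings().num_embeddings:
    model.resize_token_embeddings(len(tokenizer))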
Loss Functions
Supervised Contrastive (SupCon):
- Treats all positive labels as anchors
- Encourages high similarity for relevant labels
- Pushes irrelevant labels away
Label Repulsion:
- Prevents label embedding collapse
- Only applied to different labels within the same sample
- Respects contextual nature of label embeddings
Binary Cross-Entropy:
- Auxiliary classification loss
- Uses decoupled temperature from SupCon
- Helps with calibration
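A minimal sketch of the label-repulsion idea, assuming it penalizes pairwise cosine similarity above the 0.3 threshold between different label embeddings of the same sample (an illustration, not the exact GliZNet formulation):

import torch
import torch.nn.functional as F

def label_repulsion(label_embs: torch.Tensor, threshold: float = 0.3) -> torch.Tensor:
    # label_embs: (num_labels, dim) — the label embeddings of a single sample.
    normed = F.normalize(label_embs, dim=-1)
    sims = normed @ normed.T  # pairwise cosine similarities
    # Exclude self-similarity; penalize only pairs above the threshold.
    off_diag = ~torch.eye(label_embs.size(0), dtype=torch.bool)
    return F.relu(sims[off_diag] - threshold).mean()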
Model Card
- Developed by: Alex Nea Kameni
- Model type: Zero-shot text classification
- Language: English
Citation
If you use this model, please cite:
@misc{gliznet2025,
  author = {Kameni, Alex Nea},
  title = {GliZNet: Generalized Label-Integrated Zero-shot Network},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/alexneakameni/gliznet-ModernBERT-base}}
}
Limitations
- Trained primarily on English text
- Performance depends on label clarity and specificity
- May require label tuning for optimal results on specific domains
Training Data
The model was trained on [describe your training data here]
Evaluation
[Add evaluation metrics and results here]
Acknowledgments
Built with 🤗 Transformers and PyTorch.