GliZNet: Zero-Shot Multi-Label Text Classification

GliZNet (Generalized Label-Integrated Zero-shot Network) is a novel architecture for zero-shot text classification that embeds labels directly in the input sequence alongside the text.

Paper repository: https://github.com/KameniAlexNea/gliznet-paper

Model Details

  • Base Model: answerdotai/ModernBERT-base
  • Architecture: GliZNet
  • Task: Zero-shot multi-label text classification
  • Training Objective: Supervised Contrastive Learning + Label Repulsion + Binary Cross-Entropy

Architecture Overview

Input: [CLS] text tokens [LAB] label1 [LAB] label2 ... [SEP]
                ↓
        Backbone Encoder
                ↓
    ┌───────────┴───────────┐
    ↓                       ↓
Text Projector      Label Projector
    ↓                       ↓
CLS Embedding      Aggregated Label Embeddings
    └───────────┬───────────┘
                ↓
        Cosine Similarity
                ↓
          Label Scores
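
The bottom half of this diagram can be written as a short, self-contained sketch. This is an illustration under assumptions (simple linear projectors, illustrative names such as ScoringHeadSketch), not the actual GliZNet implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScoringHeadSketch(nn.Module):
    """Illustrative scoring head: separate projections + cosine similarity."""
    def __init__(self, hidden_size: int, proj_size: int = 256):
        super().__init__()
        self.text_projector = nn.Linear(hidden_size, proj_size)
        self.label_projector = nn.Linear(hidden_size, proj_size)
        # Learnable temperature, stored as a log so the scale stays positive
        self.log_temperature = nn.Parameter(torch.zeros(()))

    def forward(self, cls_embedding, label_embeddings):
        # cls_embedding: (batch, hidden); label_embeddings: (batch, num_labels, hidden)
        text = F.normalize(self.text_projector(cls_embedding), dim=-1)
        labels = F.normalize(self.label_projector(label_embeddings), dim=-1)
        # Normalized dot product per label, scaled by the learned temperature
        return torch.einsum("bd,bld->bl", text, labels) * self.log_temperature.exp()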

Key Features

  1. Label Integration: Labels are embedded in the input sequence using special [LAB] tokens (see the input-construction sketch after this list)
  2. Separate Projections: Independent projection layers for text (CLS) and labels
  3. Cosine Similarity: Normalized dot product with learnable temperature scaling
  4. Composite Loss Function:
    • Supervised Contrastive Loss: Encourages positive labels to cluster together
    • Label Repulsion: Pushes different labels apart within the same sample
    • Decoupled BCE: Auxiliary loss with independent temperature
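
As a rough illustration of feature 1, the input sequence interleaves the text with one [LAB] marker per candidate label. The string below only sketches the layout; the real GliZNETTokenizer builds this sequence (and the corresponding token ids) internally:

text = "Scientists discover new renewable energy breakthrough"
labels = ["science", "technology", "environment"]

# Sketch of the sequence layout shown in the architecture diagram
sequence = "[CLS] " + text + "".join(f" [LAB] {label}" for label in labels) + " [SEP]"
print(sequence)
# [CLS] Scientists discover new renewable energy breakthrough [LAB] science [LAB] technology [LAB] environment [SEP]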

Training Configuration

  • Similarity Metric: Cosine similarity with learned temperature
  • SupCon Loss Weight: 1.0
  • Label Repulsion Weight: 0.1
  • BCE Loss Weight: 1.0
  • Repulsion Threshold: 0.3
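
These weights suggest a weighted sum of the three terms. The snippet below is one plausible reading of how they combine (the loss values themselves are placeholders), not the actual training code:

supcon_weight, repulsion_weight, bce_weight = 1.0, 0.1, 1.0

def total_loss(supcon_loss, repulsion_loss, bce_loss):
    # Weighted sum of the three training objectives listed above
    return (supcon_weight * supcon_loss
            + repulsion_weight * repulsion_loss
            + bce_weight * bce_loss)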

Usage

Installation

pip install transformers torch

The examples below also require the gliznet package itself (providing gliznet.model and gliznet.tokenizer); see the paper repository linked above.

Basic Usage

from gliznet.model import GliZNetForSequenceClassification
from gliznet.tokenizer import GliZNETTokenizer

# Load model and tokenizer
model, tokenizer = GliZNetForSequenceClassification.from_pretrained_with_tokenizer(
    "alexneakameni/gliznet-ModernBERT-base"
)

# Single prediction
text = "Scientists discover new renewable energy breakthrough"
labels = ["science", "technology", "environment", "business", "politics"]

output = model.predict_example(text, labels, tokenizer)

for label_score in output.labels:
    print(f"{label_score.label}: {label_score.score:.4f}")

Batch Prediction

texts = [
    "Stock markets rally on positive economic news",
    "New study reveals benefits of regular exercise"
]

labels_list = [
    ["business", "finance", "economy"],
    ["health", "science", "lifestyle"]
]

outputs = model.predict_batch(texts, labels_list, tokenizer)

for output in outputs:
    print(f"\nText: {output.text}")
    for label_score in output.labels:
        print(f"  {label_score.label}: {label_score.score:.4f}")

Advanced Usage with Custom Activation

# Use softmax for mutually exclusive labels
output = model.predict_example(
    text="This is a sports article",
    labels=["sports", "politics", "technology"],
    tokenizer=tokenizer,
    activation="softmax"  # or "sigmoid" for multi-label
)
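
With the default sigmoid activation, a simple way to turn scores into a multi-label prediction is to keep every label above a cutoff, reusing an output produced as in the Basic Usage example. The 0.5 threshold here is an illustrative choice, not a tuned value:

# Multi-label selection: keep all labels whose score clears the cutoff
threshold = 0.5
predicted = [ls.label for ls in output.labels if ls.score >= threshold]
print(predicted)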

Technical Details

Token Embedding Strategy

The model resizes the token-embedding matrix only when an added special token exceeds the pre-allocated capacity:

  • Base model embeddings: pre-allocated capacity (e.g., 50283 slots for ModernBERT-base)
  • After adding the [LAB] token: the vocabulary still fits within the existing capacity
  • Result: no unnecessary weight modifications during training
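
With standard Transformers APIs, that capacity check can be sketched as follows; this is an assumption about the mechanism, not necessarily the exact code in gliznet:

# Register [LAB]; resize the embedding matrix only if the vocab no longer fits
num_added = tokenizer.add_special_tokens({"additional_special_tokens": ["[LAB]"]})
capacity = model.get_input_embeddings().num_embeddings
if len(tokenizer) > capacity:
    model.resize_token_embeddings(len(tokenizer))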

Loss Functions

  1. Supervised Contrastive (SupCon):
    • Treats all positive labels as anchors
    • Encourages high similarity for relevant labels
    • Pushes irrelevant labels away
  2. Label Repulsion:
    • Prevents label embedding collapse
    • Only applied to different labels within the same sample
    • Respects the contextual nature of label embeddings
  3. Binary Cross-Entropy:
    • Auxiliary classification loss
    • Uses a temperature decoupled from SupCon
    • Helps with calibration
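
The label repulsion term is the least standard of the three, so here is one plausible reading of its description, written as a sketch (a hinge on pairwise cosine similarity above the 0.3 threshold, off-diagonal pairs only); it is not the actual training code:

import torch
import torch.nn.functional as F

def label_repulsion_sketch(label_embeddings: torch.Tensor, threshold: float = 0.3):
    # label_embeddings: (num_labels, dim) for the labels of a single sample;
    # assumes at least two labels so that off-diagonal pairs exist
    z = F.normalize(label_embeddings, dim=-1)
    sim = z @ z.T  # pairwise cosine similarities
    off_diagonal = ~torch.eye(sim.size(0), dtype=torch.bool)
    # Penalize only label pairs that are more similar than the threshold
    return F.relu(sim[off_diagonal] - threshold).mean()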

Model Card

  • Developed by: Alex Nea Kameni
  • Model type: Zero-shot text classification
  • Language: English

Citation

If you use this model, please cite:

@misc{gliznet2025,
  author = {Kameni, Alex Nea},
  title = {GliZNet: Generalized Label-Integrated Zero-shot Network},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/alexneakameni/gliznet-ModernBERT-base}}
}

Limitations

  • Trained primarily on English text
  • Performance depends on label clarity and specificity
  • May require label tuning for optimal results on specific domains

Training Data

The model was trained on [describe your training data here]

Evaluation

[Add evaluation metrics and results here]

Acknowledgments

Built with 🤗 Transformers and PyTorch.
