GliZNet: Zero-Shot Multi-Label Text Classification

GliZNet (Generalized Label-Integrated Zero-shot Network) is a novel architecture for zero-shot text classification that embeds labels directly in the input sequence alongside the text.

Paper repository: https://github.com/KameniAlexNea/gliznet-paper

Model Details

  • Base Model: answerdotai/ModernBERT-base
  • Architecture: GliZNet
  • Task: Zero-shot multi-label text classification
  • Training Objective: Supervised Contrastive Learning + Label Repulsion + Binary Cross-Entropy

Architecture Overview

Input: [CLS] text tokens [LAB] label1 [LAB] label2 ... [SEP]
                ↓
        Backbone Encoder
                ↓
    ┌───────────┴───────────┐
    ↓                       ↓
Text Projector      Label Projector
    ↓                       ↓
CLS Embedding      Aggregated Label Embeddings
    └───────────┬───────────┘
                ↓
        Cosine Similarity
                ↓
          Label Scores
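
The bottom half of this diagram can be written as a short, self-contained sketch. This is an illustration under assumptions (simple linear projectors, illustrative names such as ScoringHeadSketch), not the actual GliZNet implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScoringHeadSketch(nn.Module):
    """Illustrative scoring head: separate projections + cosine similarity."""
    def __init__(self, hidden_size: int, proj_size: int = 256):
        super().__init__()
        self.text_projector = nn.Linear(hidden_size, proj_size)
        self.label_projector = nn.Linear(hidden_size, proj_size)
        # Learnable temperature, stored as a log so the scale stays positive
        self.log_temperature = nn.Parameter(torch.zeros(()))

    def forward(self, cls_embedding, label_embeddings):
        # cls_embedding: (batch, hidden); label_embeddings: (batch, num_labels, hidden)
        text = F.normalize(self.text_projector(cls_embedding), dim=-1)
        labels = F.normalize(self.label_projector(label_embeddings), dim=-1)
        # Normalized dot product per label, scaled by the learned temperature
        return torch.einsum("bd,bld->bl", text, labels) * self.log_temperature.exp()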

Key Features

  1. Label Integration: Labels are embedded in the input sequence using special [LAB] tokens (see the input-construction sketch after this list)
  2. Separate Projections: Independent projection layers for text (CLS) and labels
  3. Cosine Similarity: Normalized dot product with learnable temperature scaling
  4. Composite Loss Function:
    • Supervised Contrastive Loss: Encourages positive labels to cluster together
    • Label Repulsion: Pushes different labels apart within the same sample
    • Decoupled BCE: Auxiliary loss with independent temperature
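
As a rough illustration of feature 1, the input sequence interleaves the text with one [LAB] marker per candidate label. The string below only sketches the layout; the real GliZNETTokenizer builds this sequence (and the corresponding token ids) internally:

text = "Scientists discover new renewable energy breakthrough"
labels = ["science", "technology", "environment"]

# Sketch of the sequence layout shown in the architecture diagram
sequence = "[CLS] " + text + "".join(f" [LAB] {label}" for label in labels) + " [SEP]"
print(sequence)
# [CLS] Scientists discover new renewable energy breakthrough [LAB] science [LAB] technology [LAB] environment [SEP]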

Training Configuration

  • Similarity Metric: Cosine similarity with learned temperature
  • SupCon Loss Weight: 1.0
  • Label Repulsion Weight: 0.1
  • BCE Loss Weight: 1.0
  • Repulsion Threshold: 0.3
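
These weights suggest a weighted sum of the three terms. The snippet below is one plausible reading of how they combine (the loss values themselves are placeholders), not the actual training code:

supcon_weight, repulsion_weight, bce_weight = 1.0, 0.1, 1.0

def total_loss(supcon_loss, repulsion_loss, bce_loss):
    # Weighted sum of the three training objectives listed above
    return (supcon_weight * supcon_loss
            + repulsion_weight * repulsion_loss
            + bce_weight * bce_loss)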

Usage

Installation

pip install transformers torch

The examples below also require the gliznet package itself (providing gliznet.model and gliznet.tokenizer); see the paper repository linked above.

Basic Usage

from gliznet.model import GliZNetForSequenceClassification
from gliznet.tokenizer import GliZNETTokenizer

# Load model and tokenizer
model, tokenizer = GliZNetForSequenceClassification.from_pretrained_with_tokenizer(
    "alexneakameni/gliznet-ModernBERT-base"
)

# Single prediction
text = "Scientists discover new renewable energy breakthrough"
labels = ["science", "technology", "environment", "business", "politics"]

output = model.predict_example(text, labels, tokenizer)

for label_score in output.labels:
    print(f"{label_score.label}: {label_score.score:.4f}")

Batch Prediction

texts = [
    "Stock markets rally on positive economic news",
    "New study reveals benefits of regular exercise"
]

labels_list = [
    ["business", "finance", "economy"],
    ["health", "science", "lifestyle"]
]

outputs = model.predict_batch(texts, labels_list, tokenizer)

for output in outputs:
    print(f"\nText: {output.text}")
    for label_score in output.labels:
        print(f"  {label_score.label}: {label_score.score:.4f}")

Advanced Usage with Custom Activation

# Use softmax for mutually exclusive labels
output = model.predict_example(
    text="This is a sports article",
    labels=["sports", "politics", "technology"],
    tokenizer=tokenizer,
    activation="softmax"  # or "sigmoid" for multi-label
)
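
With the default sigmoid activation, a simple way to turn scores into a multi-label prediction is to keep every label above a cutoff, reusing an output produced as in the Basic Usage example. The 0.5 threshold here is an illustrative choice, not a tuned value:

# Multi-label selection: keep all labels whose score clears the cutoff
threshold = 0.5
predicted = [ls.label for ls in output.labels if ls.score >= threshold]
print(predicted)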

Technical Details

Token Embedding Strategy

The model resizes the token-embedding matrix only when an added special token exceeds the pre-allocated capacity:

  • Base model embeddings: pre-allocated capacity (e.g., 50283 slots for ModernBERT-base)
  • After adding the [LAB] token: the vocabulary still fits within the existing capacity
  • Result: no unnecessary weight modifications during training
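
With standard Transformers APIs, that capacity check can be sketched as follows; this is an assumption about the mechanism, not necessarily the exact code in gliznet:

# Register [LAB]; resize the embedding matrix only if the vocab no longer fits
num_added = tokenizer.add_special_tokens({"additional_special_tokens": ["[LAB]"]})
capacity = model.get_input_embeddings().num_embeddings
if len(tokenizer) > capacity:
    model.resize_token_embeddings(len(tokenizer))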

Loss Functions

  1. Supervised Contrastive (SupCon):
    • Treats all positive labels as anchors
    • Encourages high similarity for relevant labels
    • Pushes irrelevant labels away
  2. Label Repulsion:
    • Prevents label embedding collapse
    • Only applied to different labels within the same sample
    • Respects the contextual nature of label embeddings
  3. Binary Cross-Entropy:
    • Auxiliary classification loss
    • Uses a temperature decoupled from SupCon
    • Helps with calibration
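
The label repulsion term is the least standard of the three, so here is one plausible reading of its description, written as a sketch (a hinge on pairwise cosine similarity above the 0.3 threshold, off-diagonal pairs only); it is not the actual training code:

import torch
import torch.nn.functional as F

def label_repulsion_sketch(label_embeddings: torch.Tensor, threshold: float = 0.3):
    # label_embeddings: (num_labels, dim) for the labels of a single sample;
    # assumes at least two labels so that off-diagonal pairs exist
    z = F.normalize(label_embeddings, dim=-1)
    sim = z @ z.T  # pairwise cosine similarities
    off_diagonal = ~torch.eye(sim.size(0), dtype=torch.bool)
    # Penalize only label pairs that are more similar than the threshold
    return F.relu(sim[off_diagonal] - threshold).mean()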

Model Card

  • Developed by: Alex Nea Kameni
  • Model type: Zero-shot text classification
  • Language: English

Citation

If you use this model, please cite:

@misc{gliznet2025,
  author = {Kameni, Alex Nea},
  title = {GliZNet: Generalized Label-Integrated Zero-shot Network},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/alexneakameni/gliznet-ModernBERT-base}}
}

Limitations

  • Trained primarily on English text
  • Performance depends on label clarity and specificity
  • May require label tuning for optimal results on specific domains

Training Data

The model was trained on [describe your training data here]

Evaluation

[Add evaluation metrics and results here]

Acknowledgments

Built with 🤗 Transformers and PyTorch.
