---
license: apache-2.0
pipeline_tag: token-classification
library_name: gliner
tags:
- named-entity-recognition
- relation-extraction
- zero-shot
- gliner
- information-extraction
---

# 🔗 GLiNER-relex: Generalist and Lightweight Model for Joint Zero-Shot NER and Relation Extraction

GLiNER-relex is a unified model for **zero-shot Named Entity Recognition (NER)** and **Relation Extraction (RE)** that performs both tasks in a single forward pass. Built on the GLiNER architecture, it extends the span-based approach to jointly identify entities and extract the relationships between them.

## ✨ Key Features

- **Joint Extraction**: Extracts entities and relations together in one forward pass
- **Zero-Shot**: No fine-tuning required; specify entity types and relation types at inference time
- **Efficient**: Single encoder architecture processes both tasks together
- **Flexible**: Supports custom entity and relation schemas per inference call
- **Production-Ready**: ONNX export support for deployment

## 📦 Installation

First, install the GLiNER library:

```bash
pip install gliner -U
```

## 🚀 Quick Start

### Basic Usage

```python
from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("knowledgator/gliner-relex-large-v0.5")

# Define your entity types and relation types
entity_labels = ["location", "person", "date", "structure"]
relation_labels = ["located in", "designed by", "completed in"]

# Input text
text = "The Eiffel Tower, located in Paris, France, was designed by engineer Gustave Eiffel and completed in 1889."

# Run inference - returns both entities and relations
entities, relations = model.inference(
    texts=[text],
    labels=entity_labels,
    relations=relation_labels,
    threshold=0.5,
    adjacency_threshold=0.55,
    relation_threshold=0.8,
    return_relations=True,
    flat_ner=False
)

# Print entities
print("Entities:")
for entity in entities[0]:
    print(f" {entity['text']} -> {entity['label']} (score: {entity['score']:.3f})")

# Print relations
print("\nRelations:")
for relation in relations[0]:
    head = relation['head']['text']
    tail = relation['tail']['text']
    rel_type = relation['relation']
    score = relation['score']
    print(f" {head} --[{rel_type}]--> {tail} (score: {score:.3f})")
```

**Expected Output:**

```
Entities:
 Eiffel Tower -> structure (score: 0.912)
 Paris -> location (score: 0.934)
 France -> location (score: 0.891)
 Gustave Eiffel -> person (score: 0.923)
 1889 -> date (score: 0.856)

Relations:
 Eiffel Tower --[located in]--> Paris (score: 0.823)
 Eiffel Tower --[designed by]--> Gustave Eiffel (score: 0.847)
 Eiffel Tower --[completed in]--> 1889 (score: 0.789)
```

### Batch Processing

```python
texts = [
    "Elon Musk founded SpaceX in Hawthorne, California.",
    "Microsoft, led by Satya Nadella, acquired GitHub in 2018.",
    "The Louvre Museum in Paris houses the Mona Lisa."
]

entity_labels = ["person", "organization", "location", "artwork"]
relation_labels = ["founder of", "CEO of", "located in", "acquired", "houses"]

entities, relations = model.inference(
    texts=texts,
    labels=entity_labels,
    relations=relation_labels,
    threshold=0.5,
    relation_threshold=0.5,
    batch_size=8,
    return_relations=True,
    flat_ner=False
)

# Each input text gets its own list of entities and its own list of relations
for i, (text_entities, text_relations) in enumerate(zip(entities, relations)):
    print(f"\nText {i + 1}:")
    print(f" Entities: {[e['text'] for e in text_entities]}")
    print(f" Relations: {[(r['head']['text'], r['relation'], r['tail']['text']) for r in text_relations]}")
```

### Entity-Only Extraction

If you only need entities without relations:

```python
entities = model.inference(
    texts=[text],
    labels=entity_labels,
    relations=[],            # Empty list for relations
    threshold=0.5,
    return_relations=False,  # Skip relation extraction
    flat_ner=False
)
```

## ⚙️ Advanced Configuration

### Adjusting Thresholds

You can fine-tune extraction sensitivity with separate thresholds:

```python
entities, relations = model.inference(
    texts=texts,
    labels=entity_labels,
    relations=relation_labels,
    threshold=0.5,            # Entity confidence threshold
    adjacency_threshold=0.6,  # Threshold for entity pair candidates
    relation_threshold=0.7,   # Relation classification threshold
    flat_ner=True,            # Enforce non-overlapping entities
    multi_label=False,        # Single label per entity span
    return_relations=True
)
```

We recommend lowering `threshold` (the entity extraction threshold) and keeping it in the range of 0.3–0.5. For `adjacency_threshold`, the model gives good results in the 0.5–0.65 range. For `relation_threshold`, use larger values such as 0.7–0.9. Adjust all of these values to fit your project requirements.
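To get a feel for how strict a relation cutoff should be, one option is to run inference once with a permissive `relation_threshold` and then filter the returned relations by their `score` field. The sketch below assumes that post-filtering on `score` is a reasonable stand-in for raising `relation_threshold`; the helper `filter_relations` and the sample data are hypothetical and not part of the GLiNER library.

```python
from typing import Dict, List


def filter_relations(relations: List[Dict], min_score: float) -> List[Dict]:
    """Keep only relations whose confidence score is at least min_score.

    Hypothetical helper: it works on the relation dictionaries returned for
    one text and does not call the model.
    """
    return [rel for rel in relations if rel["score"] >= min_score]


# Hand-written sample in the documented relation format (not real model output).
sample_relations = [
    {"head": {"text": "Eiffel Tower"}, "tail": {"text": "Paris"},
     "relation": "located in", "score": 0.82},
    {"head": {"text": "Eiffel Tower"}, "tail": {"text": "France"},
     "relation": "located in", "score": 0.61},
    {"head": {"text": "Gustave Eiffel"}, "tail": {"text": "Paris"},
     "relation": "located in", "score": 0.55},
]

# Compare how many relations survive at increasingly strict cutoffs.
for cutoff in (0.5, 0.7, 0.9):
    kept = filter_relations(sample_relations, cutoff)
    print(f"score >= {cutoff}: {len(kept)} relation(s) kept")
```

Sweeping the cutoff this way avoids re-running the model for every candidate value; once a cutoff looks right, it can be passed as `relation_threshold` directly.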
## 📊 Output Format

### Entity Format

```python
{
    "start": int,    # Start character position
    "end": int,      # End character position
    "text": str,     # Entity text span
    "label": str,    # Entity type
    "score": float   # Confidence score (0-1)
}
```

### Relation Format

```python
{
    "head": {
        "start": int,
        "end": int,
        "text": str,
        "type": str,
        "entity_idx": int   # Index in entities list
    },
    "tail": {
        "start": int,
        "end": int,
        "text": str,
        "type": str,
        "entity_idx": int
    },
    "relation": str,        # Relation type
    "score": float          # Confidence score (0-1)
}
```

## 🏗️ Architecture

GLiNER-relex uses a unified encoder architecture that:

1. **Encodes text and labels jointly** using a transformer backbone.
2. **Identifies entity spans** using span-based classification.
3. **Constructs an adjacency matrix** to identify potential entity pairs using graph convolutional networks.
4. **Classifies relations** between selected entity pairs.

This joint approach allows the model to leverage entity information when extracting relations, leading to more coherent predictions.

## 📚 Use Cases

- **Knowledge Graph Construction**: Extract structured facts from unstructured text
- **Information Extraction Pipelines**: Build end-to-end IE systems
- **Document Understanding**: Extract entities and their relationships from documents
- **Question Answering**: Power QA systems with structured knowledge
- **Data Enrichment**: Automatically annotate text corpora
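
As a minimal sketch of the knowledge graph construction use case, relation dictionaries in the format described under Output Format can be collapsed into `(head, relation, tail)` triples and grouped by head entity. The helpers `relations_to_triples` and `build_graph` are hypothetical names introduced for illustration, and the sample relations are hand-written rather than real model output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def relations_to_triples(relations: List[Dict]) -> List[Triple]:
    """Convert relation dictionaries into (head, relation, tail) text triples."""
    return [(r["head"]["text"], r["relation"], r["tail"]["text"]) for r in relations]


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triples by head entity into a simple adjacency mapping."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


# Hand-written sample in the documented relation format (not real model output).
sample_relations = [
    {"head": {"text": "Eiffel Tower", "type": "structure"},
     "tail": {"text": "Paris", "type": "location"},
     "relation": "located in", "score": 0.82},
    {"head": {"text": "Eiffel Tower", "type": "structure"},
     "tail": {"text": "Gustave Eiffel", "type": "person"},
     "relation": "designed by", "score": 0.85},
]

triples = relations_to_triples(sample_relations)
graph = build_graph(triples)

for head, edges in graph.items():
    for relation, tail in edges:
        print(f"{head} --[{relation}]--> {tail}")
```

Keying the graph on entity text keeps the sketch short; a real pipeline would likely need entity normalization or linking so that different mentions of the same entity map to one node.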