ALIA Legal-Administrative 7B Instruct
Model Description
ALIA Legal-Administrative 7B Instruct is an instruction-tuned language model specialized in the legal and administrative domain for Spanish. This model is derived from SINAI/ALIA-legal-administrative-7B-Base, which itself is based on the Salamandra-7B model family.
The model has been instruction-tuned using the ALIA-legal-administrative-synthetic-instructions dataset, enabling it to assist users with legal and administrative queries in Spanish.
- Developed by: SINAI Research Group – Universidad de Jaén (CEATIC)
- Funded by: Ministerio para la Transformación Digital y de la Función Pública — EU NextGenerationEU, within the project Desarrollo de Modelos ALIA
- Language (NLP): Spanish
- License: CC BY-SA 4.0
Model Lineage
```
Salamandra-7B (BSC-LT)
          ↓
ALIA-legal-administrative-7B-Base (SINAI)
          ↓
ALIA-legal-administrative-7B-Instruct (SINAI)   ← This model
```
Key Features
- Specialized Domain: Legal and administrative Spanish language
- Instruction-Following: Fine-tuned to respond to user queries and instructions
- Foundation: Built upon Salamandra-7B's multilingual capabilities, focused on Spanish
- Open License: Released under the CC BY-SA 4.0 license
Model Details
Architecture
This model maintains the same architecture as its base model ALIA-legal-administrative-7B-Base, which is derived from Salamandra-7B:
| Parameter | Value |
|---|---|
| Total Parameters | 7,768,117,248 |
| Embedding Parameters | 1,048,576,000 |
| Layers | 32 |
| Hidden size | 4,096 |
| Attention heads | 32 |
| Context length | 8,192 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Embedding type | RoPE |
| Activation Function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | ✅ |
| Grouped Query Attention | ✅ |
| Num. query groups | 8 |
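As a quick sanity check, the embedding parameter count and the grouped-query-attention shapes in the table above can be derived directly from the listed dimensions (a minimal sketch; variable names are illustrative):

```python
# Architecture figures from the table above
vocab_size = 256_000
hidden_size = 4_096
num_heads = 32
num_query_groups = 8  # number of KV heads under grouped query attention

# Embedding parameters: one hidden-size vector per vocabulary entry
embedding_params = vocab_size * hidden_size
print(embedding_params)  # 1048576000 — matches the table

# Each attention head operates on a slice of the hidden dimension
head_dim = hidden_size // num_heads
print(head_dim)  # 128

# With 8 query groups, every 4 query heads share one KV head
queries_per_kv_head = num_heads // num_query_groups
print(queries_per_kv_head)  # 4
```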
Training Details
Instruction Tuning:
- Training was conducted by Barcelona Supercomputing Center (BSC)
- Dataset: ALIA-legal-administrative-synthetic-instructions
- Language: Spanish
- Domain: Legal and administrative
- Number of samples: 7,411,809
Training Infrastructure:
The model was trained on MareNostrum 5, a pre-exascale EuroHPC supercomputer hosted and operated by the Barcelona Supercomputing Center.
The accelerated partition is composed of 1,120 nodes with the following specifications:
- 4× NVIDIA Hopper GPUs with 64 GB HBM2 memory
- 2× Intel Sapphire Rapids 8460Y+ CPUs at 2.3 GHz, 32 cores each (64 cores total)
- 4× NDR200 links (800 Gb/s bandwidth per node)
- 512 GB of main memory (DDR5)
- 460 GB of NVMe storage
The table below specifies the number of nodes and GPUs employed for the supervised fine-tuning:
| Phase | Nodes | GPUs | Training Time |
|---|---|---|---|
| SFT | 16 | 64 | 17h 1m 42s |
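From the table above, the total GPU time of the SFT phase can be estimated with a back-of-the-envelope computation (figures taken directly from the table):

```python
# SFT run: 64 GPUs for 17h 1m 42s of wall-clock time
gpus = 64
wall_clock_hours = 17 + 1 / 60 + 42 / 3600  # 17h 1m 42s as hours

gpu_hours = gpus * wall_clock_hours
print(round(gpu_hours, 1))  # 1089.8 GPU-hours
```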
Training Hyperparameters:
Supervised Fine-Tuning was conducted using an internal fork of the FastChat codebase, adapted to BSC's infrastructure and optimized for stability and efficiency.
| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-5 |
| Batch size | 1024 |
| Epochs | 1 |
| LR Scheduler | Cosine |
| Warmup Ratio | 0.03 |
| NEFTune Noise Alpha | 5 |
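The schedule above can be made concrete: with 7,411,809 samples, a batch size of 1024, and one epoch, the run covers roughly 7,239 optimizer steps, of which about 3% are linear warmup before the cosine decay. A minimal sketch of such a cosine-with-warmup schedule (the function and rounding are illustrative, not BSC's actual implementation):

```python
import math

samples = 7_411_809
batch_size = 1024
total_steps = math.ceil(samples / batch_size)  # 7239 steps for one epoch
warmup_steps = int(0.03 * total_steps)         # 217 warmup steps

def lr_at_step(step, peak_lr=1e-5):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(total_steps)               # 7239
print(lr_at_step(warmup_steps))  # peak learning rate: 1e-05
```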
Intended Use
Direct Use
This model is designed to assist users with questions and tasks related to legal and administrative matters in Spanish. It can be used for:
- Answering legal and administrative queries
- Providing information about legal procedures
- Assisting with understanding administrative documentation
- General legal consultation and guidance
How to Use
Inference
Basic Usage with Transformers
```shell
pip install transformers torch accelerate sentencepiece protobuf
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "SINAI/ALIA-legal-administrative-7B-Instruct"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Example legal query
messages = [
    {"role": "user", "content": "¿Cuál es el plazo para presentar un recurso de alzada?"}
]

# Format the input with the model's chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
Inference with vLLM
```shell
pip install vllm
```

```python
from vllm import LLM, SamplingParams

model_id = "SINAI/ALIA-legal-administrative-7B-Instruct"

# Create sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=512
)

# Initialize the model
llm = LLM(model=model_id)

# Example prompt
prompt = "¿Cuáles son los requisitos para solicitar una licencia de actividad?"

# Generate response
outputs = llm.generate([prompt], sampling_params)
for output in outputs:
    print(f"Generated text: {output.outputs[0].text}")
```
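vLLM can also expose the model through its OpenAI-compatible HTTP server. A deployment sketch (the port and sampling values are illustrative):

```shell
# Launch an OpenAI-compatible server for the model
vllm serve SINAI/ALIA-legal-administrative-7B-Instruct --port 8000

# Query it via the standard chat completions endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SINAI/ALIA-legal-administrative-7B-Instruct",
    "messages": [{"role": "user", "content": "¿Qué es un recurso de alzada?"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```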
Training Data
The model was instruction-tuned using the ALIA-legal-administrative-synthetic-instructions dataset created by SINAI research group.
This dataset contains synthetic instructions specifically designed for the legal and administrative domain in Spanish, enabling the model to understand and respond to domain-specific queries.
Evaluation
The model was evaluated using the Galtea evaluation platform, which employs LLM-as-judge methodology to assess model performance across various dimensions of quality and accuracy.
Evaluation Methodology
Framework: Galtea Platform
LLM Judges: The evaluation uses multiple state-of-the-art language models as judges to assess response quality:
- Gemini 2.5 Flash
- GPT-5.2
- Claude Sonnet 4.5
Metrics: The model was evaluated using Galtea's custom metrics, specifically:
- Answer Relevancy: Measures how relevant and appropriate the model's responses are to the input questions
- Factual Accuracy: Assesses the correctness and truthfulness of the information provided in responses
Evaluation Datasets
Justicio RAG QA Dataset
- Source: dariolopez/justicio-rag-embedding-qa
- Description: A Spanish legal domain question-answering dataset containing 815 instances designed for RAG (Retrieval-Augmented Generation) evaluation
- Domain: Legal and administrative Spanish content
Oposiciones Dataset
- Description: A collection of questions from Spanish public examination tests (oposiciones)
- Domain: Administrative and legal knowledge assessment for public sector positions
Performance Metrics
The following table shows evaluation results comparing ALIA Legal-Administrative 7B Instruct against the baseline Salamandra-7B-instruct model:
| Metric | ALIA Legal-Administrative 7B Instruct | Salamandra-7B-instruct |
|---|---|---|
| Answer Relevancy | 93.5 | 90.6 |
| Factual Accuracy | 29.3 | 19.3 |
| Justicio LLM Judging | 0.610273 | 0.514706 |
| Oposiciones LLM Judging | 0.340541 | 0.268284 |
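For readability, the gaps in the table above can also be expressed as relative improvements over the baseline (a simple computation on the reported scores):

```python
# Scores from the table: (ALIA Instruct, Salamandra-7B-instruct)
scores = {
    "Answer Relevancy": (93.5, 90.6),
    "Factual Accuracy": (29.3, 19.3),
    "Justicio LLM Judging": (0.610273, 0.514706),
    "Oposiciones LLM Judging": (0.340541, 0.268284),
}

for metric, (alia, baseline) in scores.items():
    rel = (alia - baseline) / baseline * 100
    print(f"{metric}: +{rel:.1f}% relative")
```

The largest relative gain is on Factual Accuracy (about +52% over the baseline), consistent with the domain specialization of the instruction tuning.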
Note: The model outperforms Salamandra-7B-instruct on all four metrics, reflecting improved precision and correctness on legal-domain questions.
Overall Performance Summary
- Average Input Tokens: 114
- Average Output Tokens: 85.9
- Overall Average Score: 51%
Limitations and Biases
Known Limitations
- Domain Specificity: While specialized in legal and administrative Spanish, the model may not perform optimally on general-purpose tasks
- Language: Optimized for Spanish only
- Not Legal Advice: Outputs should not be considered as professional legal advice
- Training Data Constraints: Performance is limited by the scope and quality of the training data
- Potential Hallucinations: Like all language models, may generate plausible-sounding but incorrect information
Bias Considerations
- The model inherits potential biases from its base model (Salamandra-7B) and training data
- Legal and administrative language may reflect existing biases in legal systems and documentation
- Users should be aware of potential biases when using the model for sensitive applications
- We recommend additional bias testing and mitigation for specific use cases
Safety and Responsible Use
- Human Oversight: Always verify model outputs, especially for critical legal matters
- Professional Consultation: Consult with qualified legal professionals for important decisions
- Compliance: Ensure use complies with applicable laws and regulations regarding AI systems
- Privacy: Do not input sensitive personal or confidential information
Additional Information
Authors
SINAI Research Group, Universidad de Jaén (Spain)
Training Infrastructure: Barcelona Supercomputing Center (BSC)
Acknowledgments
- Base Model: Built upon Salamandra-7B by Barcelona Supercomputing Center
- Intermediate Model: Derived from ALIA-legal-administrative-7B-Base
- Training Infrastructure: Barcelona Supercomputing Center (BSC) for conducting the instruction tuning
Citation
```bibtex
@misc{alia-legal-administrative-7b-instruct,
  title={ALIA Legal-Administrative 7B Instruct},
  author={SINAI Research Group, Universidad de Jaén},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/SINAI/ALIA-legal-administrative-7B-Instruct}}
}
```
Please also cite the base models:
```bibtex
@misc{gonzalezagirre2025salamandratechnicalreport,
  title={Salamandra Technical Report},
  author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
  year={2025},
  eprint={2502.08489},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.08489},
}
```
License
This model is released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
Disclaimer
This model is provided for research and educational purposes. It should not be used as a substitute for professional legal advice. Users are responsible for ensuring their use of the model complies with applicable laws and regulations.
The SINAI Research Group and Barcelona Supercomputing Center shall not be held liable for any outcomes resulting from the use of this model.
Model Family
| Model | Type | Link |
|---|---|---|
| ALIA Legal-Administrative 7B | Base | SINAI/ALIA-legal-administrative-7B-Base |
| ALIA Legal-Administrative 7B | Instruct | SINAI/ALIA-legal-administrative-7B-Instruct |