
LexVA Overview

LexVA is a Retrieval-Augmented Generation (RAG) pipeline designed to answer questions grounded in the Code of Virginia. This project addresses a core challenge in legal AI: modern LLMs often hallucinate legal requirements, misinterpret statutes, or fail to retrieve the specific text needed for accurate reasoning.

By combining Qwen3-14B for generation with statute-level embeddings from the Qwen3 Embedding 0.6B model, LexVA provides statute-grounded legal responses with improved citation reliability.

Uses

LexVA is intended for:

  • Statute-grounded Virginia legal question answering
  • Directing users to relevant Virginia statute sections to speed up their research
  • Research on legal retrieval behavior
  • Citation-aware LLM evaluation
  • Benchmarking RAG effectiveness on statutes

It is not intended for production legal advice or case outcome prediction.

Datasets

LexVA relies on two public datasets:

1. Virginia Code Sections (retrieval corpus)

Source statutes were collected via the official Virginia LIS (Legislative Information System) API (October 2025).

Dataset link: https://huggingface.co/datasets/dcrodriguez/Code-Of-Virginia

2. Virginia Statute Synthetic QA (evaluation set)

This dataset was synthetically generated by prompting Qwen3-14B to create real-world legal questions and grounded answers. Each QA example is tied to a single statute. The dataset is used for pipeline benchmarking.

3,000 statute sections were randomly sampled, graded by Mistral-7B, and 277 were retained for QA pair generation (this flow is sketched after the list below).

  • Mistral-7B Instruct acted as an LLM-as-a-judge to score statute relevance (1–5).
    • A large share of statute sections were highly technical, had little or no applicability for ordinary people, or had been repealed.
  • Only the highest-rated statutes (score >= 4) were used to generate QA pairs.
  • Qwen3-14B generated a natural-language Q&A pair for each retained statute.
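
Roughly, that curation flow could look like the following (a minimal sketch; the prompt wording, helper names, model revisions, and dataset field names are illustrative assumptions, not the exact scripts used):

import random
from datasets import load_dataset
from transformers import pipeline

# Hypothetical judge and generator pipelines; model IDs follow the write-up above.
judge = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
generator = pipeline("text-generation", model="Qwen/Qwen3-14B")

def grade_statute(section_text):
    # Ask the judge model for a 1-5 usefulness score; prompt wording is an assumption.
    prompt = ("On a scale of 1-5, how useful is this Virginia statute section for "
              "everyday legal questions? Reply with a single digit.\n\n" + section_text + "\n\nScore:")
    reply = judge(prompt, max_new_tokens=4)[0]["generated_text"][len(prompt):]
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 1

def generate_qa(section_text):
    # Ask the generator model for one grounded question/answer pair citing the section.
    prompt = ("Write one realistic legal question a Virginian might ask, then answer it "
              "using only this statute and cite its section number:\n\n" + section_text)
    return generator(prompt, max_new_tokens=512)[0]["generated_text"][len(prompt):]

# 3,000 random sections -> grade with the judge -> keep score >= 4 -> generate QA pairs.
sections = load_dataset("dcrodriguez/Code-Of-Virginia", split="train")  # split/column names assumed
sampled = random.sample([row["text"] for row in sections], 3000)
kept = [s for s in sampled if grade_statute(s) >= 4]
qa_pairs = [generate_qa(s) for s in kept]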

Dataset link:
https://huggingface.co/datasets/dcrodriguez/Virginia-Statute-QA

No train/validation/test split was required for the RAG pipeline; the 277-question QA set served as a held-out evaluation set.

Pipeline Architecture and Objective

LexVA uses a retrieval-augmented generation (RAG) architecture designed for Virginia-specific legal question answering.

Core Components

  • Embedding model: Qwen3 Embedding 0.6B
  • Vector store: SQLite
  • RAG pipeline:
    • User question → embeddings → top-k relevant sections
    • LLM answer generated based on retrieved legal excerpts
    • Inline citations to sources

Objective

To provide a consistent, explainable, citation-grounded legal information system that supports Virginians navigating legal processes.

Methodology

LexVA follows a pure RAG approach without model finetuning:

Generation Model: Qwen3-14B

Chosen for its strong reasoning, thinking-mode, and citation capabilities, as well as its multilingual and long-context strengths.

Retriever: Section-Level

  • Each statute section was embedded using the Qwen3 Embedding 0.6B model.
  • No internal chunking: the complete statute text is the single retrieval unit.
  • Vectors are stored in a SQLite database (see the indexing sketch below).
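
A minimal sketch of this indexing step, assuming sentence-transformers for the embedding model and a simple table layout (the table name, columns, and example row are assumptions; the project's actual schema lives in setup.py):

import sqlite3
import numpy as np
from sentence_transformers import SentenceTransformer

# Qwen3 Embedding 0.6B loaded via sentence-transformers (loading method assumed).
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

conn = sqlite3.connect("va_code.db")
conn.execute("CREATE TABLE IF NOT EXISTS sections (id TEXT PRIMARY KEY, text TEXT, embedding BLOB)")

def index_section(section_id, section_text):
    # Embed one complete statute section (no chunking) and store the vector as a float32 blob.
    vec = embedder.encode(section_text, normalize_embeddings=True)
    conn.execute(
        "INSERT OR REPLACE INTO sections VALUES (?, ?, ?)",
        (section_id, section_text, np.asarray(vec, dtype=np.float32).tobytes()),
    )

index_section("24.2-1007", "No person shall solicit or accept ...")  # illustrative row
conn.commit()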

RAG Flow

  1. User question → embed using Qwen3 Embedding 0.6B
  2. Perform top-k retrieval (k=5) on statute corpus
  3. Pass retrieved statutes + question to Qwen3-14B
  4. Generate a grounded answer with statutory citations (see the sketch below)
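
Schematically, the query path looks roughly like this (a minimal sketch; the prompt template and the sections table layout carry over from the indexing sketch above and are assumptions, not the repository's exact code):

import sqlite3
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
generator = pipeline("text-generation", model="Qwen/Qwen3-14B")
conn = sqlite3.connect("va_code.db")

def retrieve(question, k=5):
    # Embed the question and return the top-k statute sections by cosine similarity.
    q = embedder.encode(question, normalize_embeddings=True)
    rows = conn.execute("SELECT id, text, embedding FROM sections").fetchall()
    scored = [
        (float(np.dot(q, np.frombuffer(blob, dtype=np.float32))), sid, text)
        for sid, text, blob in rows
    ]
    return [(sid, text) for _, sid, text in sorted(scored, reverse=True)[:k]]

def answer(question):
    # Build a grounded prompt from the retrieved sections and generate a cited answer.
    context = "\n\n".join("Va. Code § " + sid + ":\n" + text for sid, text in retrieve(question))
    prompt = ("Answer the question using only the Virginia Code excerpts below and cite "
              "the section numbers you rely on.\n\n" + context + "\n\nQuestion: " + question + "\nAnswer:")
    return generator(prompt, max_new_tokens=512)[0]["generated_text"][len(prompt):]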

No finetuning was performed; the evaluation tests the effectiveness of retrieval + prompting.

Please note that prompts and answers are cached in the SQLite DB to speed up benchmarking tasks.

Evaluation

| Model / Pipeline | Recall@3 | Recall@5 | Faithfulness (DeepEval) | GEval (DeepEval) | Cosine Similarity |
| --- | --- | --- | --- | --- | --- |
| Qwen3 Embedding 0.6B | 84.32% | 89.73% | NA | NA | NA |
| Qwen3-14B (no RAG) | NA | NA | 0.923 | 0.257 | 0.716 |
| Qwen3-14B (with RAG) | NA | NA | 0.950 | 0.705 | 0.719 |
| Mistral 7B (no RAG) | NA | NA | 0.855 | 0.224 | 0.717 |
| Mistral 7B (with RAG) | NA | NA | 0.946 | 0.664 | 0.790 |

For information retrieval, Qwen3 Embedding 0.6B performs extremely well. Part of this may be due to the higher-quality synthetic QA pairs generated from a curated set of graded statutes.
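
Recall@k here is the fraction of evaluation questions whose source statute appears among the top-k retrieved sections; a minimal sketch, assuming the retrieve helper from the RAG flow sketch above and hypothetical field names:

def recall_at_k(qa_examples, k):
    # Fraction of questions whose gold statute section shows up in the top-k retrieved sections.
    hits = 0
    for ex in qa_examples:  # ex: {"question": ..., "section_id": ...}; field names are assumptions
        top_ids = [sid for sid, _ in retrieve(ex["question"], k=k)]
        hits += ex["section_id"] in top_ids
    return hits / len(qa_examples)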

Cosine similarity measures how semantically similar two answers are by comparing their embedding vectors. While it is quick to compute, it turned out to be a poor comparison metric for legal Q&A: all of the similarity values in the table fall within about 8 percentage points of one another. Answers generated without RAG were semantically similar to the references but contained many hallucinations and false statements.
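
Concretely, the metric embeds both answers and takes the dot product of the unit-normalized vectors (reusing the embedder from the earlier sketches; whether the project computes this with the same embedding model is an assumption):

import numpy as np

def answer_similarity(generated, reference):
    # Cosine similarity between the two answers' (unit-normalized) embedding vectors.
    a, b = embedder.encode([generated, reference], normalize_embeddings=True)
    return float(np.dot(a, b))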

The DeepEval benchmarks both use an LLM as a judge. I used GPT-5 Mini as the judge model for these benchmarks; smaller local models struggled with the required reasoning, and their scores seemed inflated.

Faithfulness measures how correct the pipeline output is given the relevant statute. Its inputs are the actual output, the expected output, and the retrieved context documents. It is a more standardized metric than GEval, but it struggled because RAG and non-RAG answers read similarly even when the non-RAG citations were fabricated. Medium-size models like Qwen3-14B do a surprisingly good job answering these questions without RAG, but they make up most of their references, which makes them far less useful as a legal research tool; I believe this is why the Faithfulness scores are as close as they are. As expected, Qwen3-14B scored higher than Mistral 7B.
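
In DeepEval terms, the check looks roughly like this (a minimal sketch; the judge-model identifier, threshold, and placeholder strings are assumptions):

from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Judge model per the write-up above; the exact identifier string is an assumption.
metric = FaithfulnessMetric(model="gpt-5-mini", threshold=0.5)

case = LLMTestCase(
    input="Is it illegal to offer someone money to vote a certain way in an election?",
    actual_output="...",    # answer produced by the pipeline
    expected_output="...",  # reference answer from the Virginia-Statute-QA set
    retrieval_context=["§ 24.2-1007: No person shall solicit or accept ..."],
)
metric.measure(case)
print(metric.score, metric.reason)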

GEval was by far the most useful metric. Its inputs are the actual output and the expected output. GEval supports custom grading criteria, which was necessary to benchmark this pipeline properly. The evaluation steps are below.

evaluation_steps=[
    "Check whether the facts in 'actual output' contradict any facts in 'expected output'.",
    "Heavily penalize omission of important details from the expected output.",
    "If a specific statute is referenced in 'expected output' but not 'actual output' lower the score, but numerically close references are ok. For example, 55.1-1243.2 is close to 55.1-1244.1, but is far from 19.2-308.",
    "Additional statute references in actual output are ok",
    "Minor stylistic differences or changes in wording are acceptable.",
],
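
These steps plug into DeepEval's GEval metric roughly as follows (a minimal sketch; the metric name and evaluation params are assumptions based on the description above):

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# evaluation_steps is the list shown directly above; the judge-model identifier is assumed.
statute_correctness = GEval(
    name="Statute-grounded correctness",
    model="gpt-5-mini",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
    evaluation_steps=evaluation_steps,
)
statute_correctness.measure(case)  # same LLMTestCase shape as in the Faithfulness sketch
print(statute_correctness.score)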

This benchmark was significantly more successful at identifying hallucinated citations. It evaluates both the claims made in the answer and whether the expected statute citations were used. The largest delta between RAG and non-RAG answers appears under this benchmark.

How to Use

  1. Clone this model repository: git clone https://huggingface.co/dcrodriguez/virginia-legal-rag-lexva
  2. Optionally, use a Python 3.12 virtual environment.
  3. Install the pip packages listed in install.sh.
  4. Run python setup.py. This downloads the dataset, loads it into a SQLite database, and generates all the document embeddings.
  5. Run python example.py.

Notes:

  • The chat_engine tracks message history. If you want to start fresh, call the .clear_history() method.
  • Outputs are in Markdown format.
  • If you're running in an ephemeral environment like Google Colab, store the SQLite database in your Google Drive. Change the relative path in setup.py to point to your Drive, then instantiate the chat engine like this:
sqlite_path = "/content/drive/MyDrive/LexVA/va_code.db"
chat_engine = LLMChatEngine(sqlite_path)

response = chat_engine.chat("<Question>", rag=True)
print(response)
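
For example, to compare grounded and ungrounded answers and reset the conversation between questions (assuming rag=False selects the non-RAG mode used in the benchmarks):

# Grounded answer with retrieved statutes and inline citations
print(chat_engine.chat("Can my landlord keep my security deposit?", rag=True))

# Start a fresh conversation, then answer from the model's parametric knowledge only
chat_engine.clear_history()
print(chat_engine.chat("Can my landlord keep my security deposit?", rag=False))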

Example Generation

Question:

"Is it illegal to offer someone money to vote a certain way in an election?"

Answer with RAG:

In Virginia, it is illegal to offer someone money or anything of value to influence how they vote in an election. This is explicitly prohibited by Virginia Code § 24.2-1007, which states:

"No person shall solicit or accept directly or indirectly any money or any thing of value to influence his or another's vote in any election."

Violating this law is a Class 1 misdemeanor, which carries potential penalties such as fines or imprisonment. The statute applies to all elections, including those for political party nominations or delegate selection.

This law is designed to prevent bribery or corruption in the electoral process. Offering money (or anything of value) to sway a vote is a criminal offense under Virginia law.

Note: This is general information and does not constitute legal advice. For specific cases, consult a qualified attorney.


Bias, Risks, and Limitations

LexVA is a retrieval-augmented legal question-answering system designed to help users better understand Virginia statutes, court forms, procedures, and publicly available legal information. It is not a lawyer, does not provide legal advice, and cannot replace professional legal counsel.

Risks

  • Legal misinformation: The model may misinterpret statutes, misapply exceptions, or provide outdated references if retrieval fails or sources are ambiguous.
  • Oversimplification: Legal rules in Virginia often include jurisdiction-specific exceptions, time limits, and conditions. The model may output simplified or incomplete explanations.
  • Hallucination risks: Although RAG minimizes hallucinations, the underlying LLM may still produce fabricated statutes, cases, or procedural steps without explicit citations.
  • Equity and access concerns: While intended for access-to-justice use, the model may reflect biases present in legal language, historical case law, and datasets—potentially reinforcing systemic inequalities (e.g., disparate treatment across race, income, or geography).
  • Non-generalizability: LexVA is built for Virginia law only. Its outputs may be incorrect or misleading for other states.

Recommendations

Users:

  • Treat outputs as informational summaries, not authoritative legal advice.
  • Always verify answers by reading the referenced statute, form, or retrieved passage.
  • Consult a licensed Virginia attorney for case-specific guidance.

Compute Infrastructure

At least 16 GB of VRAM is recommended.

  • Developed and tested using:
    • Google Colab GPUs (L4/A100)
    • Local development with RTX 5060 Ti 16GB

Citation

If you use LexVA in research or development, please cite:

BibTeX:

@misc{cuevasrodriguez2025lexva,
  title        = {LexVA: Retrieval-Augmented Legal Question Answering for Virginia},
  author       = {Cuevas Rodriguez, Dalila},
  year         = {2025},
  howpublished = {Hugging Face Hub},
  note         = {An access-to-justice RAG system grounded in Virginia legal texts}
}

APA:

Cuevas Rodriguez, D. (2025). LexVA: Retrieval-augmented legal question answering for Virginia. Hugging Face. https://huggingface.co/dcrodriguez/virginia-legal-rag-lexva
