PLDR-LLM-v52-81M-FT-QA-1

Model Description

PLDR-LLM-v52-81M-FT-QA-1 is a finetuned PLDR-LLM (Large Language Model from Power Law Decoder Representations) with KV-cache and G-cache support for question answering. The model has 81M parameters. It was finetuned on the SQuAD dataset from the PLDR-LLM base model PLDR-LLM-v52-110M-1.

More details about the PLDR-LLM architecture can be found in the research paper titled PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference.

Training data

PLDR-LLM-v52-81M-FT-QA-1 was finetuned on the SQuAD dataset, a reading comprehension dataset comprising 87.6k training samples and 10.6k test samples. The base model was pretrained on ~8B tokens from RefinedWeb, a publicly available English web dataset with extensive filtering and deduplication.

Training procedure

This model was trained with the custom model implementation of PLDR-LLM for the Hugging Face Transformers library. Inputs were prepared in the [CLS] question [SEP] context [SEP] format; a minimal sketch of this format appears after the parameter table below. The following parameters were used for finetuning; all other parameters were kept the same as in the research paper detailing the PLDR-LLM architecture.

Parameter           Value
Learning rate       1.5x10^-4
Warm-up steps       100
Grad clip by norm   1.0
Epochs              2
Padding side        "right"
Add EOS token       False
min_lr_rate         0.01
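
As a minimal illustration of the [CLS] question [SEP] context [SEP] data preparation, the sketch below builds such an input string. The literal "[CLS]" and "[SEP]" strings and the helper name format_qa_input are assumptions for illustration; the actual special tokens are defined by the model's tokenizer.

# Minimal sketch of the "[CLS] question [SEP] context [SEP]" input layout.
# The default token strings below are placeholders; substitute the special
# tokens defined by the model's tokenizer.
def format_qa_input(question: str, context: str,
                    cls_token: str = "[CLS]", sep_token: str = "[SEP]") -> str:
    return f"{cls_token} {question} {sep_token} {context} {sep_token}"

text = format_qa_input("What was the capital of the Hittite Empire?",
                       "The Hittite Empire was centered on its capital, Hattusa.")
print(text)
# [CLS] What was the capital of the Hittite Empire? [SEP] The Hittite Empire was centered on its capital, Hattusa. [SEP]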

Intended Use and Limitations

This model is intended to be used for research purposes. Given a question and a context as the input prompt, it returns the predicted answer extracted from the context. The context length for this model is 1024 tokens; a sketch for checking input length against this limit is shown below.
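
As a rough way to respect the 1024-token limit, the sketch below checks the combined token count of a question and context before querying the model. Loading the tokenizer with AutoTokenizer and trust_remote_code=True, and the small allowance for special tokens, are assumptions rather than part of the model's own preprocessing.

from transformers import AutoTokenizer

# Sketch: verify that question + context fit within the 1024-token context length.
# Assumes the tokenizer for this model loads via AutoTokenizer with remote code.
tokenizer = AutoTokenizer.from_pretrained(
    "fromthesky/PLDR-LLM-v52-81M-FT-QA-1", trust_remote_code=True
)

def fits_context(question: str, context: str, max_len: int = 1024) -> bool:
    n_tokens = (len(tokenizer(question)["input_ids"])
                + len(tokenizer(context)["input_ids"]))
    return n_tokens + 3 <= max_len  # reserve a few slots for [CLS]/[SEP] tokens

print(fits_context("Where did the Hittites settle?",
                   "The Hittites settled in modern-day Turkey."))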

How to Use

Via Hugging Face Transformers Library

PLDR-LLM has custom model support for the Hugging Face Transformers library. The custom model support was evaluated on the Transformers 4.56.1 release available at the time.

from transformers import pipeline

question_answerer = pipeline(
    task="question-answering",
    model="fromthesky/PLDR-LLM-v52-81M-FT-QA-1",
    align_to_words=False,
    device="cuda", # or "cpu" 
    trust_remote_code=True
    )

context="""
The Hittites were an Anatolian Indo-European people who formed \
one of the first major civilizations of the Bronze Age in West Asia. \
Possibly originating from beyond the Black Sea, they settled in modern-day \
Turkey in the early 2nd millennium BC. The Hittites formed a series of \
polities in north-central Anatolia, including the kingdom of Kussara \
(before 1750 BC), the Kanesh or Nesha Kingdom (c. 1750–1650 BC), \
and an empire centered on their capital, Hattusa (around 1650 BC). \
Known in modern times as the Hittite Empire, it reached its peak \
during the mid-14th century BC under Šuppiluliuma I, when it \
encompassed most of Anatolia and parts of the northern Levant \
and Upper Mesopotamia, bordering the rival empires of \
the Hurri-Mitanni and Assyrians. 
"""

question1="When did the Hittite Empire reach its peak?"
question2="Under which ruler did the Hittite Empire reach its peak?"
question3="Where did the Hittites settle?"
question4="What was the capital of the Hittite Empire?"
question5="When was Hattusa established as the capital?"
question6="Where did the Hittites come from?"
question7="When did the Hittites settle in northern part of central Anatolia?"

questions=[question1, question2, question3, question4, 
           question5, question6, question7]


answers=question_answerer(question=questions, context=context)

for q, a in zip(questions, answers):
    print(f"Question: {q}\nAnswer: {a['answer'].strip()} "
          f"(score: {a['score']}, start: {a['start']}, end: {a['end']})\n")

Question: When did the Hittite Empire reach its peak?
Answer: mid-14th century BC (score: 0.7698925137519836, start: 555, end: 575)

Question: Under which ruler did the Hittite Empire reach its peak?
Answer: Šuppiluliuma I (score: 0.9744353294372559, start: 582, end: 596)

Question: Where did the Hittites settle?
Answer: modern-day Turkey (score: 0.9418576955795288, start: 196, end: 214)

Question: What was the capital of the Hittite Empire?
Answer: Hattusa (score: 0.9977174401283264, start: 453, end: 461)

Question: When was Hattusa established as the capital?
Answer: around 1650 BC (score: 0.9929400682449341, start: 463, end: 477)

Question: Where did the Hittites come from?
Answer: beyond the Black Sea (score: 0.8200137615203857, start: 158, end: 179)

Question: When did the Hittites settle in northern part of central Anatolia?
Answer: early 2nd millennium BC (score: 0.31195276975631714, start: 221, end: 245)

Notes:

  • This implementation of the PLDR-LLM custom code was evaluated on Transformers 4.56.1 and PyTorch 2.6.0.
  • The context string in the example above is from Wikipedia.

Limitations and Biases

This model was finetuned on a pretrained large language model. Large language models may generate text that is profane, lewd, socially unacceptable, or offensive depending on the contents of the dataset they were pretrained on. RefinedWeb has been reported to be as toxic and biased as the Pile; please see the RefinedWeb and Pile papers for more information. Moreover, large language models are susceptible to hallucinations and may generate text that contains incorrect, irrelevant, or misleading information. Since the contents of generated text are very hard to anticipate, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.

Eval results

  • Evaluation was done on the test split, which was used for validation; a sketch of the metric computation follows the table.

Metric        Value (%)
exact_match   59.04
F1            71.04
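
For reference, SQuAD exact match and F1 can be computed with the Hugging Face evaluate library as sketched below. This is a generic example of the metric computation, not the exact evaluation script used for this model, and the prediction/reference entries are placeholders.

import evaluate

# Generic sketch of computing SQuAD exact_match and F1 with the evaluate library.
# The prediction and reference below are placeholders, not outputs of this model.
squad_metric = evaluate.load("squad")

predictions = [{"id": "0", "prediction_text": "Hattusa"}]
references = [{"id": "0",
               "answers": {"text": ["Hattusa"], "answer_start": [453]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}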

BibTeX entry and citation info

@misc{gokden2025pldrllmkvgcache,
      title={PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference}, 
      author={Burc Gokden},
      year={2025},
      eprint={2502.13502},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13502}, 
}

@misc{gokden2024pldrllm,
      title={PLDR-LLM: Large Language Model from Power Law Decoder Representations}, 
      author={Burc Gokden},
      year={2024},
      eprint={2410.16703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.16703}, 
}