Introduction
I created a day-trading application with tunable feasibility parameters such as max_pct_move (the maximum allowed percent difference between the predicted fill price and the current live market price) and take_profit_pct (the percent gain at which to take profit) to help evaluate whether a trade is worth executing. This problem matters because, while I knew these parameters were important, I had no way to determine what values they should take across different market scenarios. Large language models provide a promising way to generate contextualized parameter suggestions, but most state-of-the-art LLMs are too computationally heavy to deliver consistent, low-latency responses in environments where decisions must be made in under a minute. Because of this latency and size constraint, I developed trade_tune_llm, a lightweight fine-tuned model designed specifically to generate feasibility parameters quickly and reliably.
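For concreteness, the feasibility parameters the model suggests can be represented as a simple Python mapping. This is a minimal sketch: the field names follow the prompt format shown later in this card, and the values are purely hypothetical.

```python
# Hypothetical example of the feasibility parameters the model tunes.
# Field names match the prompt format later in this card; values are illustrative only.
feasibility_params = {
    "max_pct_move": 0.50,     # max allowed % gap between predicted fill price and live market price
    "entry_pad_bps": 40,      # entry padding, in basis points
    "stop_pad_bps": 55,       # stop padding, in basis points
    "take_profit_pct": 0.60,  # % gain at which to take profit
    "ttl_minutes": 30,        # order time-to-live, in whole minutes
}
```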
Data
The training dataset was a synthetic dataset of 2,000 rows that copied the exact format of the data I see in my app. The main modification to the dataframe was the added instruction column, which contains the prompt used for training. An 80/20 train/test split was used with a random seed of 42.
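A minimal sketch of how such a split can be produced, assuming the synthetic rows are loaded with the Hugging Face datasets library (the file name here is a placeholder):

```python
from datasets import load_dataset

# Load the 2,000 synthetic rows (the file name is hypothetical).
dataset = load_dataset("csv", data_files="synthetic_trades.csv")["train"]

# 80/20 train/test split with the random seed used in this project.
split = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = split["train"], split["test"]
```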
Methodology
I used the Low-Rank Adaptation (LoRA) method to fine-tune the base model for this task. I chose it after comparing each of the methods we learned in class and seeing that LoRA returned the best responses for the task I was training the LLM for. I understand that this judgment is subjective rather than based on something more concrete such as a benchmark metric. However, I believe it is justifiable because I am aiming to create this model for a very niche task.
Parameters
from peft import LoraConfig

LORA_R = 64
LORA_ALPHA = 64
LORA_DROPOUT = 0.05

lora_config = LoraConfig(
    r=LORA_R,                             # the lower dimension of the low-rank matrices
    lora_alpha=LORA_ALPHA,                # scaling factor for the low-rank update
    lora_dropout=LORA_DROPOUT,            # dropout factor to prevent overfitting
    bias="none",
    task_type="CAUSAL_LM",                # set language modeling as the task type
    target_modules=["q_proj", "v_proj"],  # add LoRA modules to every query and value matrix in the attention layers
)
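The config above is then attached to the base model before training. Below is a minimal sketch of that step, assuming meta-llama/Llama-3.2-1B as the base model and peft's get_peft_model helper; the actual training loop is omitted.

```python
from peft import get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Wrap the base model so that only the LoRA adapter weights are trained.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction of parameters are trainable
```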
Evaluation
The table below compares trade_tune_llm with similarly sized models used in the trading scope.
| Benchmark | Metric Type | Llama-3.2-1B | trade_tune_llm | TinyLlama-1.1B | Qwen/Qwen2.5-0.5B |
|---|---|---|---|---|---|
| GSM8K-CoT | Strict Exact Match | 3.33% | 6.67% | 0.00% | 26.67% |
| GSM8K-CoT | Flexible Exact Match | 6.67% | 6.67% | 0.00% | 26.67% |
| GSM8K-CoT | Strict EM StdErr | 0.0333 | 0.0463 | 0.0000 | 0.0821 |
| LogiQA | Accuracy | 26.67% | 33.33% | 23.33% | 40.00% |
| LogiQA | Accuracy StdErr | 0.0821 | 0.0875 | 0.0785 | 0.0910 |
| LogiQA | Normalized Accuracy | 43.33% | 43.33% | 30.00% | 46.67% |
| LogiQA | Norm Accuracy StdErr | 0.0920 | 0.0920 | 0.0851 | 0.0926 |
| RACE | Accuracy | 40.00% | 26.67% | 43.33% | 40.00% |
| RACE | Accuracy StdErr | 0.0909 | 0.0821 | 0.0920 | 0.0910 |
I selected these benchmarks to evaluate how well the model retains broad reasoning capabilities after task-specific fine-tuning, ensuring it remains flexible for multiple use cases. GSM8K-CoT and RACE capture the model's mathematical and English comprehension skills, while LogiQA tests its formal logical reasoning and ability to understand structured prompts. Together, these benchmarks offer a holistic view of how effectively the model follows instructions, reasons through problems, interprets nuanced information, and avoids hallucination. This helps confirm that any observed improvements in trading performance stem from improved reasoning ability rather than overfitting to the training data. In summary, after training Llama-3.2-1B into trade_tune_llm I expected every benchmark except LogiQA to decrease, which is what happened. However, I did not expect a base model with half the parameters (Qwen2.5-0.5B) to perform better on that benchmark.
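The card does not state which evaluation harness produced these numbers. As one hedged way to reproduce a comparable run, the sketch below uses the lm-evaluation-harness Python API; the task names and the 30-example limit per task are assumptions (the limit is inferred from the reported standard errors), not facts from the source.

```python
import lm_eval

# Hypothetical reproduction of the benchmark runs (tool choice and sample limit are assumptions).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=rheyoampoyo/trade_tune_LLM",
    tasks=["gsm8k_cot", "logiqa", "race"],
    limit=30,
)
print(results["results"])
```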
Usage and Intended Uses
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rheyoampoyo/trade_tune_LLM")

# Load the model directly (AutoModelForCausalLM keeps the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("rheyoampoyo/trade_tune_LLM", dtype="auto")
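A short usage sketch for the pipeline; the placeholder prompt and the generation settings below are illustrative, not the exact settings used in the app.

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="rheyoampoyo/trade_tune_LLM")

# Placeholder prompt; in practice the string is built with the prompt format in the next section.
prompt = "You are an intraday trading coach. Use the context to set feasibility parameters. ..."

outputs = pipe(prompt, max_new_tokens=256, do_sample=False, return_full_text=False)
print(outputs[0]["generated_text"])
```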
This model is intended purely for day-trading applications that use the exact same data format. Because it is tuned for such a specific task, it has few uses outside the task it was trained on.
Prompt Format
This is the prompt format of the LLM. It uses a formatted string to feed the latest feasibility parameters from the trade history into the prompt. A sketch of how the composite price_str field might be built is shown after the template.
f"You are an intraday trading coach. Use the context to set feasibility parameters.
Context
- Ticker: {sym}
- Side: {instr}
- OrderType: {otype}{price_str}
- Current max_pct_move: {max_pct_move}
- Current entry_pad_bps: {entry_pad_bps}
- Current stop_pad_bps: {stop_pad_bps}
- Current take_profit_pct: {take_profit_pct}
- Current ttl_minutes: {ttl_minutes}
- Quantity: {qty}
- FilledQuantity: {filledqty}
- RemainingQuantity: {remainingqty}
- Status: {status}
- Entered: {entered}
Task
- Internally explore {n_paths} distinct reasoning paths (self-consistency) then select the single most consistent recommendation.
- Give suggested changes to the current feasibility parameters which are max_pct_move, entry_pad_bps, stop_pad_bps, take_profit_pct, ttl_minutes in the JSON output
- Do NOT reveal your reasoning chains. Return only the final result.
Units
- entry_pad_bps and stop_pad_bps are in basis points (bps).
- max_pct_move and take_profit_pct are percentages.
- ttl_minutes is an integer (minutes).
Output (JSON only; no extra text)
{{
"max_pct_move": <float percent>,
"entry_pad_bps": <int bps>,
"stop_pad_bps": <int bps>,
"take_profit_pct": <float percent>,
"ttl_minutes": <int>,
"rationale": "<2โ4 concise sentences grounded in the provided context/evidence>"
}}"
Expected Output Format
Here is a comparison between the expected and actual outputs of the model. Although the actual output was not in JSON format, I believe it still gives useful suggestions and a basic rationale.
Expected Output
{{ "max_pct_move": 0.589, "entry_pad_bps": 43, "stop_pad_bps": 58, "take_profit_pct": .589, "ttl_minutes": 41, "rationale": "The entry pad and stop pad values were too high, and the take profit percentage was too low. I reduced the entry pad and stop pad values to 43 and 58,respectively, and increased the take profit percentage to 0.589. The result is a more aggressive trade with a higher probability of success." }}
Actual Output Example
A: 0.589, 43, 58, 0.589, 41, "max_pct_move: 0.589; entry_pad_bps: 43; stop_pad_bps: 58; take_profit_pct: 0.589; ttl_minutes: 41; rationale: 'The entry pad and stop pad values were too high, and the take profit percentage was too low. I reduced the entry pad and stop pad values to 43 and 58,respectively, and increased the take profit percentage to 0.589. The result is a more aggressive trade with a higher probability of success.'"
Limitations
The main limitation of this model is that it is not useful for practically anything outside a day-trading app. Most of the benchmarks I used showed decreased accuracy post-training. Looking at the model responses, I noticed that they could not match the exact requested format (JSON); however, the model did give suggestions, which was the main goal. As for the rationale, it does not give much detail beyond restating the suggested parameter changes.
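Because the output is not strict JSON, a downstream application still needs to recover the suggested values. The parser below is a hedged sketch of one way to do that with a regular expression; it is not part of the original app.

```python
import re

def parse_suggestions(text: str) -> dict:
    """Best-effort extraction of the five feasibility parameters from free-form model output."""
    fields = ["max_pct_move", "entry_pad_bps", "stop_pad_bps", "take_profit_pct", "ttl_minutes"]
    parsed = {}
    for name in fields:
        match = re.search(rf"{name}\s*[:=]\s*([0-9]*\.?[0-9]+)", text)
        if match:
            value = float(match.group(1))
            parsed[name] = int(value) if name.endswith(("_bps", "_minutes")) else value
    return parsed

example = "max_pct_move: 0.589; entry_pad_bps: 43; stop_pad_bps: 58; take_profit_pct: 0.589; ttl_minutes: 41"
print(parse_suggestions(example))
# {'max_pct_move': 0.589, 'entry_pad_bps': 43, 'stop_pad_bps': 58, 'take_profit_pct': 0.589, 'ttl_minutes': 41}
```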
Framework Versions
- PEFT 0.17.1
Model tree for rheyoampoyo/trade_tune_LLM
- Base model: meta-llama/Llama-3.2-1B