🧠 I trained a French LLM from scratch. Alone. On a 1080 Ti. And honestly… it was a lot.
4 months building the dataset before even touching the model. Custom crawler, custom extractor, custom BPE tokenizer, everything from zero. Then the architecture — RoPE, RMSNorm, SwiGLU, Flash Attention. Then a 3-phase trainer. Then debugging a causal mask bug that made the model generate "ïsïsïs" for hours.
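The post doesn't show the causal mask fix, but for the curious, here's a minimal NumPy sketch of what a correct mask looks like (the actual trainer presumably builds this in PyTorch; the function name and shapes here are illustrative, not from the write-up):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # Lower-triangular mask: position i may attend only to positions <= i.
    # Flipping it (e.g. np.triu instead of np.tril) lets the model peek at
    # future tokens during training, which is exactly the kind of bug that
    # collapses generation into repeated garbage at inference time.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
# Disallowed positions get -inf before the softmax over attention scores.
scores = np.where(mask, 0.0, -np.inf)
```

Row 0 can only see itself; row 3 sees everything before it. A single transposed triangle here is invisible in the loss curve and only shows up as nonsense output.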
Then the power went out at epoch 10/18.
The checkpoint survived. The model learned form perfectly: grammar, markdown, structure. Substance? Still working on it. That's the honest conclusion.
Full write-up here 👇
🔗 https://huggingface.co/blog/RDTvlokip/i-trained-my-own-french-llm-from-scratch