Update README.md
README.md CHANGED

@@ -113,10 +113,9 @@ This model is part of a collection of LayerNorm-free models. The table below pro
 
 ## Citation
 
-
-
-**BibTeX:**
+If you have found our work useful please cite as:
 
+```
 @misc{gpt2layernorm2025,
       author = {Baroni, Luca and Khara, Galvin and Schaeffer, Joachim and Subkhankulov, Marat and Heimersheim, Stefan},
       title = {Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability},
@@ -126,3 +125,4 @@ Title: *Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm
       primaryClass = {cs.LG},
       url = {https://arxiv.org/abs/2507.02559v1}
 }
+```