Fix runtime buffers after load
#37
by err805 - opened
On newer versions of transformers, loading the model can leave the non-persistent runtime buffers in an invalid state, which causes detect() to return incorrect results.
This change rebuilds attn_mask and text.freqs_cis after from_pretrained() returns, so the model uses correctly initialized runtime buffers on both older and newer transformers versions.
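For readers unfamiliar with the pattern, here is a minimal sketch of rebuilding non-persistent buffers after a load. It is not the code from this PR. TinyModel, build_attn_mask, build_freqs_cis, and rebuild_runtime_buffers are hypothetical names, and the causal mask and rotary-frequency formulas are assumed for illustration. The point is that buffers registered with persistent=False are not saved in the checkpoint, so they can be recomputed from the model's configuration.

```python
import torch
import torch.nn as nn


def build_attn_mask(max_context: int) -> torch.Tensor:
    # Assumed causal mask: position i may attend to positions 0..i.
    return torch.tril(torch.ones(1, 1, max_context, max_context, dtype=torch.bool))


def build_freqs_cis(dim: int, max_context: int, theta: float = 10000.0) -> torch.Tensor:
    # Assumed rotary table: one unit-magnitude complex rotation per
    # (position, frequency) pair, shape (max_context, dim // 2).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(torch.arange(max_context, dtype=torch.float32), freqs)
    return torch.polar(torch.ones_like(angles), angles)


class TinyModel(nn.Module):
    # Hypothetical stand-in for the real model; only the buffer handling matters.
    def __init__(self, dim: int = 8, max_context: int = 16):
        super().__init__()
        self.dim = dim
        self.max_context = max_context
        # persistent=False keeps these out of state_dict(), so a checkpoint
        # load never restores them.
        self.register_buffer("attn_mask", build_attn_mask(max_context), persistent=False)
        self.register_buffer("freqs_cis", build_freqs_cis(dim, max_context), persistent=False)


def rebuild_runtime_buffers(model: TinyModel) -> None:
    # Recompute both buffers from configuration and keep them on the
    # device the model currently uses.
    device = model.attn_mask.device
    model.attn_mask = build_attn_mask(model.max_context).to(device)
    model.freqs_cis = build_freqs_cis(model.dim, model.max_context).to(device)


model = TinyModel()
# Simulate a load that left the runtime buffers with invalid contents.
model.attn_mask.zero_()
model.freqs_cis.zero_()
rebuild_runtime_buffers(model)
```

Assigning a tensor to an attribute that is already a registered buffer replaces the buffer in place, so the rebuilt tensors stay non-persistent and continue to follow the module through later .to() calls.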
vikhyatk changed pull request status to merged