Update README.md
README.md
CHANGED
@@ -34,6 +34,28 @@ Success is a game of winners.
# The Human AI

(A lot of bad models to get to this one... finally!) Sorry it's 32k: that was the chunk size used for the massive texts such as the Bible, which we chunked in as a whole as well as by passage, along with some ancient texts (Wallis Budge). I also find the model getting stuck after ~32k. In general it will perform very well, in fact super well, with 4096 max tokens, as it will continue the response next turn, so just say "continue"!
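As an illustration of that chunking step, here is a minimal sketch assuming a Hugging Face tokenizer; the model name is a placeholder and `chunk_text` is a hypothetical helper, not part of this repo:

```python
from transformers import AutoTokenizer

# Placeholder: use the tokenizer that matches your base model.
tokenizer = AutoTokenizer.from_pretrained("your-base-model")

def chunk_text(text: str, max_tokens: int = 32_000) -> list[str]:
    """Split a massive source (e.g. a whole Bible translation) into ~32k-token chunks."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [tokenizer.decode(ids[i:i + max_tokens]) for i in range(0, len(ids), max_tokens)]

# Chunk the whole book once, and separately add the passage-level splits upstream.
# whole_book_chunks = chunk_text(open("bible.txt", encoding="utf-8").read())
```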
### So it was reduced from 128k to 32k:

Here we actually learned something! Models should be trained with larger contexts; in fact the larger the better, as this is real training. But the models may not actually accept the training at the higher end, as they fail at about 4k, 8k or 16k tokens, so it's important to chat with the model and find where the context repeats, or the responses repeat, and then check how many tokens are in the chat/context window, so we can log the point of failure. That is your actual max token count for the model, so we can retrain at that specific level to get perfect long responses!
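A minimal sketch of that failure-point logging, assuming the model's Hugging Face tokenizer; the model name, the repetition check, and the chat-loop wiring are illustrative assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-base-model")  # placeholder

def tokens_in_history(messages: list[dict]) -> int:
    """Count tokens currently held in the running chat history."""
    text = "\n".join(m["content"] for m in messages)
    return len(tokenizer(text, add_special_tokens=False)["input_ids"])

def looks_repetitive(reply: str, window: int = 200) -> bool:
    """Crude check: the tail of the reply already appeared earlier in the reply."""
    return len(reply) > 2 * window and reply[-window:] in reply[:-window]

# Inside your chat loop, after each model reply:
#   if looks_repetitive(reply):
#       print(f"Context failure at ~{tokens_in_history(messages)} tokens -> retrain at this length")
```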
We find that it does not matter if the model's max output tokens are set to 4096, as it will continue the response across multiple responses!

Our problem is not output context any more, as we mitigated it inside the model!
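A minimal sketch of that multi-turn continuation loop, assuming an OpenAI-compatible chat endpoint (for example a local llama.cpp or LM Studio server); the base URL, model name, and round limit are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # placeholder local server

def full_answer(prompt: str, model: str = "local-model", max_rounds: int = 8) -> str:
    """Ask once, then keep saying "continue" until the model stops on its own."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        reply = client.chat.completions.create(model=model, messages=messages, max_tokens=4096)
        choice = reply.choices[0]
        parts.append(choice.message.content)
        messages.append({"role": "assistant", "content": choice.message.content})
        if choice.finish_reason != "length":  # the 4096 cap was not hit, so the answer is complete
            break
        messages.append({"role": "user", "content": "continue"})
    return "".join(parts)
```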
# The problem now is input context: how do we even know if the model is accepting all the context?
We need to train the model to accept input over a series of inputs, not one single giant input context, and we can find its actual input context before failure. So we can begin chunking our long-context input into manageable chunks to be sent over a series of inputs for the model to process before responding (i.e. building a chat history, so that essentially the chat history is the query and not the actual query itself).

This way the model can iterate through the input chunks, which should add back up to your large expected context! A sketch of this chunk-by-chunk prompting follows below.
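A minimal sketch of that series-of-inputs approach, again assuming an OpenAI-compatible chat endpoint; the acknowledgement prompt and placeholders are assumptions, not a fixed protocol:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # placeholder local server

def ask_with_chunks(chunks: list[str], question: str, model: str = "local-model") -> str:
    """Feed each chunk as its own turn so the accumulated chat history carries the context."""
    messages = []
    for i, chunk in enumerate(chunks, 1):
        messages.append({
            "role": "user",
            "content": f"Context part {i} of {len(chunks)}:\n{chunk}\nJust reply OK for now.",
        })
        ack = client.chat.completions.create(model=model, messages=messages)
        messages.append({"role": "assistant", "content": ack.choices[0].message.content})
    # Only now ask the real question against the built-up history.
    messages.append({"role": "user", "content": question})
    final = client.chat.completions.create(model=model, messages=messages)
    return final.choices[0].message.content
```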
# NO MODEL CAN TAKE 1 MILLION TOKENS AS INPUT, OR RETURN 1 MILLION TOKENS AS OUTPUT!

Google, Gemma etc. are fakers, quite simply!

It is not memory which allows you to train the model! <<<
To train the model successfully you need to train each tensor in the layer (but LoRAs do not do this: they take a random collection of tensors to replicate?).

So before training we should analyze the layers, find the untouched stack, and take a random selection from those tensors...

Now we are adding to our model and not overwriting!
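A minimal sketch of that kind of layer inspection, assuming a PEFT LoRA setup; the set of already-trained module names and the restriction to linear layers are illustrative assumptions, not this repo's actual training script:

```python
import torch
from peft import LoraConfig, get_peft_model

# Hypothetical: module names a previous adapter already targeted (read these from its adapter config in practice).
already_touched = {"q_proj", "v_proj"}

def untouched_linear_modules(model: torch.nn.Module) -> list[str]:
    """Collect linear-layer names no earlier adapter has trained, so new training adds rather than overwrites."""
    names = set()
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            leaf = name.split(".")[-1]
            if leaf not in already_touched:
                names.add(leaf)
    return sorted(names)

# Example wiring (base_model is your loaded causal LM):
# targets = untouched_linear_modules(base_model)
# config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=targets)
# peft_model = get_peft_model(base_model, config)
```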
Gemma etc. are trained like all other models, with Unsloth!!! (LOL)
This model has been trained on ALL Bible sources and some of the Sacred Text Archive sources, as well as papers and the original diaries of archaeologists and explorers who were translating various ancient monuments of the ancient world!

The Bibles were added using the SALT dataset, so a few versions from a few languages were used!