LeroyDyer committed
Commit dccca90 · verified · 1 Parent(s): 3ae0b46

Update README.md

Files changed (1):
  1. README.md +22 -0

README.md CHANGED
@@ -34,6 +34,28 @@ Success is a game of winners.
  # The Human AI .
  (a lot of bad models to get to this one ! finally) Sorry it's 32k, as this was the chunking used for the massive texts such as the Bible! which we chunked in as a whole as well as by passage!
  as well as some ancient texts (Wallis Budge). I also find the model getting stuck after 32k ~ in general it will perform very well, in fact super well, with 4096 max tokens, as it will continue the response next turn! so just say continue!
+ ### So it was reduced from 128k to 32k :
+ here we actually learned something !! --- Models should be trained with larger contexts! in fact the larger the better... as this is real training! but the models may not actually accept the training at the higher end, as they fail at about 4-8-16k tokens, so it's important to chat with the model and find where the context repeats...
+ or the responses repeat, and then check how many tokens are in the chat/context window, so we can log the point of failure! that is your actual max tokens for the model! so we can retrain at that specific level to get perfect long responses!
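+ for example ~ a rough sketch of how you could log that point of failure (the repo id, the repetition check and the helper names here are placeholders, not a fixed recipe!):
+ ```python
+ # Sketch: count the tokens sitting in the chat and flag the point where
+ # the replies start repeating ~ that token count is your real max context.
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("your-model-repo")  # placeholder repo id
+
+ def count_chat_tokens(messages):
+     """Total tokens currently sitting in the chat history."""
+     text = "\n".join(m["content"] for m in messages)
+     return len(tokenizer.encode(text))
+
+ def looks_repetitive(reply, earlier_replies, probe_len=200):
+     """Crude check: the new reply recycles a large chunk of an earlier one."""
+     probe = reply[:probe_len]
+     return bool(probe) and any(probe in old for old in earlier_replies)
+
+ # inside your chat loop, after each model reply:
+ # if looks_repetitive(reply, earlier_replies):
+ #     print("point of failure ~", count_chat_tokens(messages), "tokens in context")
+ ```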
+ we find that it does not matter if the model's max output tokens are 4096, as it will continue the response across multiple responses!
+ our problem is not output context !! not any more! as we mitigated it inside the model!
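+ a rough sketch of that "just say continue" loop (assuming an OpenAI-compatible local server such as llama.cpp or LM Studio ~ the endpoint and model name are placeholders!):
+ ```python
+ # Sketch: cap each turn at 4096 new tokens and stitch the pieces together.
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # placeholder endpoint
+
+ def long_answer(prompt, max_turns=8):
+     messages = [{"role": "user", "content": prompt}]
+     pieces = []
+     for _ in range(max_turns):
+         resp = client.chat.completions.create(
+             model="local-model",   # placeholder model name
+             messages=messages,
+             max_tokens=4096,       # the per-turn output cap
+         )
+         choice = resp.choices[0]
+         pieces.append(choice.message.content)
+         if choice.finish_reason != "length":  # finished on its own
+             break
+         # it hit the cap, so just say continue!
+         messages.append({"role": "assistant", "content": choice.message.content})
+         messages.append({"role": "user", "content": "continue"})
+     return "".join(pieces)
+ ```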
+
+ # the problem now is input context ? how do we even know if the model is accepting all the context ?
+
+ we need to train the model to accept input over a series of inputs! not a single giant input context! and we can find its actual input context before failure! so we can begin chunking our long context input into manageable chunks to be sent over a series of inputs for the model to process before responding! (i.e. building a chat history, and essentially the chat history is the query and not the actual query ~)
+ this way the model can iterate through the input chunks! which should add back up to your large expected context!
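+ one way to sketch this chunk-by-chunk feeding (the chunk size, the acknowledgement turns and the helper names are assumptions ~ you could equally send each chunk to the model live before asking):
+ ```python
+ # Sketch: the chunks become the chat history, and the real question only
+ # arrives in the final turn ~ the history IS the query.
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("your-model-repo")  # placeholder repo id
+
+ def chunk_text(text, chunk_tokens=2048):
+     ids = tokenizer.encode(text)
+     return [tokenizer.decode(ids[i:i + chunk_tokens])
+             for i in range(0, len(ids), chunk_tokens)]
+
+ def build_chunked_history(document, question):
+     messages = []
+     for i, chunk in enumerate(chunk_text(document), start=1):
+         messages.append({"role": "user",
+                          "content": f"Part {i} of the document, just acknowledge it:\n{chunk}"})
+         messages.append({"role": "assistant", "content": f"Received part {i}."})
+     messages.append({"role": "user", "content": question})  # the actual question comes last
+     return messages
+ ```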
+
+ # NO MODEL CAN TAKE 1 MILLION TOKENS AS INPUT ! OR RETURN 1 MILLION TOKENS AS OUTPUT !
+
+ GOOGLE GEMMA etc. are fakers! quite simply!
+ It is not memory which allows you to train the model ! <<<
+ to train the model successfully you need to train each tensor in the layer! (but LoRAs do not do this ~ they take a random collection of tensors to replicate?):
+ so before training we should analyse the layers, find the untouched stack, and take a random selection from those tensors...
+ Now we are adding to our model and not overwriting!
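+ a rough sketch of that layer analysis (repo ids and the "untouched" threshold are placeholders ~ this just diffs a tuned model against its base and samples the still-untouched linear layers as the next LoRA targets):
+ ```python
+ # Sketch: find the linear modules whose weights are still (almost) identical
+ # to the base checkpoint and pick a random selection of them to train next,
+ # so the new adapter adds to the model instead of overwriting earlier work.
+ import random
+ import torch
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig
+
+ base = AutoModelForCausalLM.from_pretrained("base-repo", torch_dtype=torch.float16)    # placeholder
+ tuned = AutoModelForCausalLM.from_pretrained("tuned-repo", torch_dtype=torch.float16)  # placeholder
+
+ base_params = dict(base.named_parameters())
+ untouched = []
+ for name, module in tuned.named_modules():
+     if not isinstance(module, torch.nn.Linear):
+         continue
+     key = f"{name}.weight"
+     if key in base_params:
+         drift = (module.weight.detach().float() - base_params[key].detach().float()).abs().mean().item()
+         if drift < 1e-6:  # still essentially the base weights (assumed threshold)
+             untouched.append(name)
+
+ targets = random.sample(untouched, k=min(8, len(untouched)))
+ lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=targets, task_type="CAUSAL_LM")
+ print("next LoRA will touch:", targets)
+ ```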
+ gemma etc. is trained like all other models with unsloth !!!! (LOLOL)
+
+
+

  this model has been trained on ALL Bible sources and some of the Sacred Text Archive sources, as well as papers and the original diaries of archaeologists and explorers who were translating various monuments of the ancient world!
  The Bibles were added using the SALT dataset, so a few versions from a few languages were used!