Evaluation reproduction

#1
by NohTow - opened
LightOn AI org

Hey,

PyLate is getting merged into MTEB soon, which will make it possible to evaluate PyLate models directly with MTEB.
In the meantime, people might want to reproduce the evaluation results, so, as for GTE-ModernColBERT, I am sharing a boilerplate to reproduce the results reported in the model card.
The boilerplate can be found here.

Please note that the reported results use a query length of 256, except for the Pony split, where we used a query length of 32 because larger query lengths yield poor results (I am not sure why; this split is a bit odd).
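For readers unfamiliar with why query length matters here: ColBERT-style models encode every query to a fixed number of token embeddings, padding short queries with [MASK] tokens (query augmentation) and truncating long ones. The snippet below is a minimal, library-free illustration of that padding/truncation behavior; the function name and token string are illustrative, not the PyLate API.

```python
# Illustrative sketch of fixed-length query handling in ColBERT-style models.
# Queries shorter than query_length are padded with [MASK] tokens (query
# augmentation); longer queries are truncated. This is NOT the PyLate API,
# just a toy model of the behavior the query_length setting controls.
MASK = "[MASK]"

def pad_or_truncate(tokens: list[str], query_length: int) -> list[str]:
    """Return exactly `query_length` tokens: truncate or pad with [MASK]."""
    if len(tokens) >= query_length:
        return tokens[:query_length]
    return tokens + [MASK] * (query_length - len(tokens))

# A short query padded up to length 8 gains trailing [MASK] tokens.
short = pad_or_truncate(["what", "is", "a", "pony"], 8)
# A long query is hard-truncated to length 32.
long_query = pad_or_truncate(["tok"] * 300, 32)
```

With a query length of 256, a typical short query is mostly [MASK] tokens, which may interact badly with unusual splits like Pony; dropping to 32 reduces that padding.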

Hey

I tried to reproduce the training of Reason-ModernColBERT using the following script:

https://gist.github.com/NohTow/d563244596548bf387f19fcd790664d3

It does not produce the same results as the published model; for example, on the biology split it gives ndcg@10: 0.28. I tried 10 times with different settings and got the same result.

Am I missing something?
