Evaluation reproduction

#1
by NohTow - opened
LightOn AI org

Hey,

PyLate is getting merged into MTEB soon, which will make it possible to evaluate PyLate models directly with MTEB.
In the meantime, people might want to reproduce the evaluation results, so, as for GTE-ModernColBERT, I am sharing a boilerplate to reproduce the results reported in the model card.
The boilerplate can be found here.

Please note that the reported results use a query length of 256, except for the Pony split, where we used a query length of 32 because larger query lengths yield poor results (I am not sure why; this split is a bit odd).
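For readers unfamiliar with why query length matters here: ColBERT-style models encode every query to a fixed number of token embeddings, padding short queries with [MASK] tokens (query augmentation) and truncating long ones. The snippet below is a minimal, library-free illustration of that padding/truncation behavior; the function name and token string are illustrative, not the PyLate API.

```python
# Illustrative sketch of fixed-length query handling in ColBERT-style models.
# Queries shorter than query_length are padded with [MASK] tokens (query
# augmentation); longer queries are truncated. This is NOT the PyLate API,
# just a toy model of the behavior the query_length setting controls.
MASK = "[MASK]"

def pad_or_truncate(tokens: list[str], query_length: int) -> list[str]:
    """Return exactly `query_length` tokens: truncate or pad with [MASK]."""
    if len(tokens) >= query_length:
        return tokens[:query_length]
    return tokens + [MASK] * (query_length - len(tokens))

# A short query padded up to length 8 gains trailing [MASK] tokens.
short = pad_or_truncate(["what", "is", "a", "pony"], 8)
# A long query is hard-truncated to length 32.
long_query = pad_or_truncate(["tok"] * 300, 32)
```

With a query length of 256, a typical short query is mostly [MASK] tokens, which may interact badly with unusual splits like Pony; dropping to 32 reduces that padding.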

Hey

I tried to reproduce the training of Reason-ModernColBERT using the following script:

https://gist.github.com/NohTow/d563244596548bf387f19fcd790664d3

It does not produce the same results as the published model; for example, on the biology split it gives ndcg@10: 0.28. I tried 10 times with different settings and got the same result.

Am I missing something?
