
Added confidence interval calculation

#6 by giannor - opened
Files changed (1)
  1. README.md +3 -1
README.md CHANGED
```diff
@@ -43,7 +43,9 @@ and two models from the Pleias family ([Pleias-350M-Preview](https://huggingface
 All comparison models were trained exclusively on open data, either in the public domain or under a permissive license.
 
 The following tables show the performance on each dataset.
-For each, we report the respective main metric from EuroEval and the confidence interval.
+For each, we report the respective main metric from EuroEval and the confidence interval.
+The latter is calculated as the mean of the metric scores across all evaluation runs ± 1.96 times the standard error of the mean:
+$$\hat{\mu} \pm 1.96 \times SEM \quad \textrm{where} \quad SEM = \frac{s}{\sqrt{n}} \quad \textrm{and} \quad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n-1}} \quad \textrm{and} \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
 
 | Model | scala-da (MCC) | dala (MCC) | angry-tweets (MCC) | dansk (Micro F1, No Misc) | danske-talemaader (MCC) | danish-citizen-tests (MCC) | multi-wiki-qa-da (F1) | hellaswag-da (MCC) | nordjylland-news (BERTScore) | average |
 | ----------------------------------- | ------------- | ------------- | ------------------ | ------------------------- | ----------------------- | -------------------------- | --------------------- | ------------------ | ---------------------------- | ------- |
```
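The confidence-interval formula added in this change (mean ± 1.96 × SEM, with the sample standard deviation using Bessel's correction) can be sketched in plain Python. The function name and the example scores below are illustrative, not part of EuroEval's actual code:

```python
import math

def confidence_interval(scores):
    """Illustrative sketch: 95% CI half-width as 1.96 * SEM.

    SEM = s / sqrt(n), where s is the sample standard
    deviation with Bessel's correction (divide by n - 1).
    """
    n = len(scores)
    mean = sum(scores) / n
    # sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    sem = s / math.sqrt(n)
    return mean, 1.96 * sem

# hypothetical metric scores from n = 5 evaluation runs
mean, half_width = confidence_interval([61.2, 63.5, 60.8, 62.1, 62.9])
print(f"{mean:.2f} ± {half_width:.2f}")  # → 62.10 ± 0.99
```

A score would then be reported in the tables as `62.10 ± 0.99`, i.e. the mean across runs followed by the half-width of the 95% interval.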