Update README.md
README.md
CHANGED
|
@@ -106,5 +106,27 @@ The benchmarks and metrics used are identical to those in the [Phi-3 technical r
|
|
| 106 |
|MT Bench|2R. Avg.|8.38|8.7|-|6.77|-19.21%|-22.18%|-|
|
| 107 |
||||||**Average**|**-10.09%**|**-13.45%**|**+10.18%**|
|
| 108 |
|
| 109 |
-
|
| 110 |
\*: We were unable to find an evaluation framework for this benchmark.
|
| 110 |
+
|
| 111 |
+
### Comparison to Gemma
|
| 112 |
+
|
| 113 |
+
#### Gemma 1 & 2
|
| 114 |
+
The benchmarks and metrics used are identical to those in the [Gemma 2 technical report](https://arxiv.org/abs/2408.00118).
|
| 115 |
+
|
| 116 |
+
#### Gemma 3
|
| 117 |
+
The benchmarks and metrics used are identical to those in the [Gemma 3 technical report](https://arxiv.org/abs/2503.19786).
|
| 118 |
+
|
| 119 |
+
|Benchmark|Metric|Gemma 3 1B|Gemma 3 4B|Motif 2.6B|Improvement(over 1B)|Improvement(over 4B)|
|
| 120 |
+
|---|---|---|---|---|---|---|
|
| 121 |
+
|MMLU-Pro|5-shot|14.7|43.6|-|-|-|
|
| 122 |
+
|LiveCodeBench\*|-|1.9|12.6|-|-|-|
|
| 123 |
+
|Bird-SQL(dev)\*|-|6.4|36.3|-|-|-|
|
| 124 |
+
|GPQA Diamond|5-shot|19.2|30.8|31.81|+65.68%|+3.28%|
|
| 125 |
+
|SimpleQA\*|-|2.2|4|-|-|-|
|
| 126 |
+
|FACTS Grounding\*|-|36.4|70.1|-|-|-|
|
| 127 |
+
|MATH|4-shot|48|75.6|40.2|-16.25%|-46.83%|
|
| 128 |
+
|HiddenMath\*|-|15.8|43|-|-|-|
|
| 129 |
+
|MMLU(val)|5-shot|-|48.8|57.93|-|+18.71%|
|
| 130 |
+
|||||**Average**|**+24.71%**|**-8.28%**|
|
| 131 |
+
|
| 132 |
+
\*: We were unable to find an evaluation framework for this benchmark.
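
The Improvement columns in the Gemma 3 table are consistent with a relative change of the Motif 2.6B score against each Gemma 3 baseline, with the Average row taken as the mean over the benchmarks where both scores are available. The following is a minimal sketch under that assumption. The names `relative_improvement`, `average_improvement`, and `ROWS` are illustrative and are not part of any released evaluation code.

```python
from statistics import mean

# (benchmark, Gemma 3 1B, Gemma 3 4B, Motif 2.6B); None marks a missing score.
# Only rows with a Motif 2.6B score in the table are listed.
ROWS = [
    ("GPQA Diamond", 19.2, 30.8, 31.81),
    ("MATH", 48.0, 75.6, 40.2),
    ("MMLU(val)", None, 48.8, 57.93),
]


def relative_improvement(motif_score, baseline_score):
    """Percentage change of motif_score relative to baseline_score."""
    return (motif_score - baseline_score) / baseline_score * 100.0


def average_improvement(rows, baseline_column):
    """Mean relative improvement over rows where both scores exist.

    baseline_column is 1 for Gemma 3 1B and 2 for Gemma 3 4B.
    """
    values = [
        relative_improvement(row[3], row[baseline_column])
        for row in rows
        if row[3] is not None and row[baseline_column] is not None
    ]
    return mean(values)


if __name__ == "__main__":
    for name, column in (("over 1B", 1), ("over 4B", 2)):
        print(f"Average improvement ({name}): {average_improvement(ROWS, column):+.2f}%")
```

Under this assumption, each average is taken over a different number of benchmarks (two for the 1B baseline, three for the 4B baseline), so the two averages are not directly comparable.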
|