Ken Tsui
kenhktsui
AI & ML interests
ML engineer, researcher
VLM, LLM benchmark
Opinions are my own
Recent Activity
- Liked a dataset 16 days ago: Hothan/OlympiadBench
- Liked a dataset 20 days ago: mixture-vitae-backup/MixtureVitae-2TT
- Upvoted a paper about 2 months ago: Diffusion Transformers with Representation Autoencoders
FastText Model for Pretraining Data Curation
- kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2
  Text Classification • Updated • 222 • 28
- kenhktsui/fineweb-edu-fasttext-classifier
  Text Classification • Updated • 37 • 4
- kenhktsui/code-natural-language-fasttext-classifier
  Text Classification • Updated • 209 • 3
- kenhktsui/math-fasttext-classifier
  Text Classification • Updated • 28 • 2
LongTalk
A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training
- kenhktsui/longtalk-cot-v0.1
  Viewer • Updated • 61.2k • 126 • 13
- kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf
  8B • Updated • 41 • 1
- kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged
  Text Generation • 8B • Updated • 12
- kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf
  8B • Updated • 32 • 1
Models (34)
- kenhktsui/math-fasttext-classifier
  Text Classification • Updated • 28 • 2
- kenhktsui/code-natural-language-fasttext-classifier
  Text Classification • Updated • 209 • 3
- kenhktsui/fineweb-edu-fasttext-classifier
  Text Classification • Updated • 37 • 4
- kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2
  Text Classification • Updated • 222 • 28
- kenhktsui/finefineweb-domain-fasttext-classifier
  Text Classification • Updated • 2.17k • 1
- kenhktsui/Qwen2.5-3B-Instruct-GRPO-basic-sampling_temp_05
  Text Generation • Updated • 4
- kenhktsui/Qwen2.5-3B-Instruct-GRPO-minp-sampling_temp_05
  Text Generation • Updated • 10
- kenhktsui/Qwen-0.5B-GRPO
  Text Generation • 0.5B • Updated • 5 • 1
- kenhktsui/Qwen-0.5B-GRPO-gsm8k-count-wait-cap-cross-correct
  Text Generation • 0.5B • Updated • 11
- kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf
  8B • Updated • 32 • 1
Datasets (48)
- kenhktsui/FineFineWeb-First100K
  Viewer • Updated • 6.7M • 195
- kenhktsui/serp-bench
  Updated • 7
- kenhktsui/math-classifiers-data
  Viewer • Updated • 2M • 503
- kenhktsui/longtalk-cot-v0.1
  Viewer • Updated • 61.2k • 126 • 13
- kenhktsui/code-natural-language-classification-dataset
  Viewer • Updated • 4.05M • 3.88k
- kenhktsui/github-code-permissive-sample
  Viewer • Updated • 3.21M • 3.26k
- kenhktsui/llm-data-textbook-quality-v2
  Viewer • Updated • 1.01M • 122
- kenhktsui/test_imdb
  Viewer • Updated • 40 • 17
- kenhktsui/test_twitter_financial_news
  Viewer • Updated • 60 • 28
- kenhktsui/test_ag_news
  Viewer • Updated • 104 • 48