Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets Paper • 2604.22294 • Published 4 days ago • 10
DavidAU/Qwen3.6-27B-NEO-CODE-Di-IMatrix-MAX-GGUF Image-Text-to-Text • 27B • Updated 2 days ago • 17.8k • 26
Gemma 4 WebGPU 🚀 Run Gemma 4 locally in-browser on WebGPU w/ Transformers.js Running • Featured • 202
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published 19 days ago • 47
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 16
Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Article • Apr 16, 2025 • 73