Output not respecting line breaks?
The model doesn't seem to output line breaks when it should (it's especially obvious in code and point-form lists).
Here's an example; there should be a lot more line breaks:
Here's a list of things I can help you with:
Information & Research
- Answer questions on countless topics (science, history, culture, etc.)
- Explain complex concepts in simple terms- Summarize articles, books, or documents- Provide definitions and background information- Compare options, products, or ideasWriting & Content Creation
- Write essays, articles, stories, poems, or scripts- Draft emails, letters, or professional documents- Brainstorm ideas and outlines- Edit and proofread text- Generate creative content (jokes, slogans, social media posts)
Analysis & Problem-Solving
- Break down problems step-by-step- Analyze data, arguments, or situations- Help with math problems and logic puzzles- Offer decision-making frameworks- Identify patterns or key points in informationProgramming & Technical Help
- Write code in many programming languages- Debug and explain code errors- Explain technical concepts- Help with database queries, algorithms, and system design- Create scripts for automationLearning & Education
- Tutor subjects like math, physics, languages, etc.
- Create study guides and practice questions- Explain homework problems- Translate languages and explain grammar- Adapt explanations to your learning levelProductivity & Organization
- Create to-do lists and action plans- Help prioritize tasks- Format and structure documents- Generate templates (resumes, meeting agendas, etc.)
- Summarize meeting notesCreative & Fun
- Brainstorm creative ideas (names, plots, business ideas)
- Play text-based games or roleplay scenarios- Generate art prompts and creative briefs- Tell jokes, stories, or trivia- Help with worldbuilding for writing or gamesBusiness & Professional
- Draft business proposals and reports- Analyze business scenarios- Create marketing copy and product descriptions- Help with interview preparation- Write professional communicationsPersonal & Practical
- Offer advice on common situations (travel, etiquette, etc.)
- Help plan trips or daily schedules- Suggest recipes, workouts, or hobby ideas- Provide basic health and wellness information (not medical advice)
- Simulate conversations for practiceWhat I Cannot Do:
- Access real-time information or browse the internet- Provide medical, legal, or financial advice- Have personal experiences or consciousness- Execute code or access external systems- Guarantee accuracy for rapidly changing informationLet me know what you'd like help with!<|im_end|>
This is the only model that has this issue; others format perfectly. It also has random missing spaces in the output.
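For anyone trying to reproduce, here is a quick heuristic sketch (my own, not part of ik_llama.cpp or any client) that scans raw output for both symptoms: bullets glued to the previous line, and words fused without a space:

```python
import re

# Heuristic patterns tuned to the pastes in this thread; both are
# illustrative only and will false-positive on things like camelCase.
FUSED_BULLET = re.compile(r"[a-z)]- [A-Z]")   # e.g. "simple terms- Summarize"
FUSED_WORDS = re.compile(r"[a-z][A-Z][a-z]")  # e.g. "informationProgramming"

def missing_break_count(text: str) -> int:
    """Count list items that appear glued to the end of the previous line."""
    return len(FUSED_BULLET.findall(text))

def fused_word_count(text: str) -> int:
    """Count lowercase->uppercase joins that suggest a dropped space or newline."""
    return len(FUSED_WORDS.findall(text))
```

Running either counter over a correctly formatted reply should return 0; over the pastes above it returns dozens.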
Interesting. I just recently noticed missing spaces in the output of some GLM-4.5-Air AWQ quants running in vllm. Likely unrelated 😀, but I can't remember seeing this issue before.
Someone on reddit was asking about that too. I suggested they check the updated model card, use --special --jinja, and pass in the updated jinja template from the official repo, which is linked in the model card.
https://www.reddit.com/r/LocalLLaMA/comments/1oqo57j/comment/nnze8cz/
If you are still having trouble, then provide your full command as well as some info on what client you are using and what endpoint you're hitting e.g. /v1/chat/completions etc.
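One way to separate server output from client rendering is to hit the endpoint directly and print the repr() of the message content, so any literal \n characters are visible. A minimal sketch, assuming llama-server is listening on localhost:8080 (adjust to whatever --port you pass):

```python
import json
import urllib.request

def raw_content(resp_body: bytes) -> str:
    """Pull the assistant text out of a /v1/chat/completions response body."""
    data = json.loads(resp_body)
    return data["choices"][0]["message"]["content"]

def probe(url: str = "http://localhost:8080/v1/chat/completions") -> None:
    # Hypothetical prompt; any reply that should contain line breaks will do.
    payload = {"messages": [{"role": "user", "content": "List three fruits, one per line."}]}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        # repr() makes \n explicit: if they're missing here, the server or
        # template is at fault; if present, the client is collapsing them.
        print(repr(raw_content(r.read())))
```

If repr() shows `'Apple\nBanana\nCherry'` the server is fine and the problem is in the client's rendering.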
Did not help
```shell
~/ik_llama.cpp/build/bin/llama-server --model /mnt/shared/gguf/ubergarm/Kimi-K2-Thinking-GGUF/Q8_0-Q4_0/Kimi-K2-Thinking-Q8_0-Q4_0-00001-of-00013.gguf --numa distribute --alias Kimi-K2 --threads -1 --cache-type-k q8_0 --cache-type-v q8_0 --temp 0.6 --ctx-size 160000 --prompt-cache --parallel=1 --metrics -mla 3 -fa -fmoe -amb 512 --seed 3407 --host 0.0.0.0 --port 10020 -cb -ot ".ffn_.*_exps.=CPU" --n-gpu-layers 99 --jinja --special --chat-template-file /mnt/shared/gguf/ubergarm/Kimi-K2-Thinking-GGUF/Q8_0-Q4_0/chat_template.jinja
```
==========
Here's a comprehensive list of things I can help you with:
Information & Learning
- Answer questions on countless topics (science, history, technology, arts, culture, etc.)
- Explain complex concepts in simple terms- Summarize articles, books, or long texts- Provide definitions, examples, and analogies- Discuss ideas and explore different perspectivesWriting & Content Creation
- Write essays, articles, reports, emails, and letters- Create creative content (stories, poems, scripts, song lyrics)
- Generate marketing copy, social media posts, or product descriptions- Brainstorm ideas for projects, stories, or businesses- Edit, proofread, and improve your writing- Adjust tone and style (formal, casual, persuasive, technical)
Analysis & Problem-Solving
- Solve math problems from basic arithmetic to advanced calculus- Write, debug, and explain code in dozens of programming languages- Analyze data, create charts, and interpret statistics- Break down complex problems into step-by-step solutions- Help with logic puzzles, riddles, and strategic thinkingTask & Planning Assistance
- Create detailed plans, outlines, and schedules- Help with project management and organization- Offer advice on career, relationships, and personal development- Generate to-do lists and goal-setting frameworks- Simulate scenarios (job interviews, negotiations, conversations)
Language & Communication
- Translate between dozens of languages- Practice conversation skills in different languages- Rephrase sentences for clarity or different audiences- Help with grammar, vocabulary, and language learning- Draft professional or personal messagesCreative & Entertainment
- Tell stories, jokes, and creative tales- Play text-based games or roleplay scenarios- Generate ideas for parties, gifts, events, or creative projects- Create character backstories, worldbuilding, and plot ideas- Discuss books, movies, music, and artTechnical & Professional Support
- Explain technical concepts and documentation- Create templates for resumes, cover letters, and business documents- Help with coding projects and algorithm design- Provide study guides and tutoring on academic subjects- Simulate professional situations for practicePractical Utilities
- Format information into tables, lists, or structured outlines- Convert between file formats (e.g., CSV to JSON, text to HTML)
- Generate recipes, travel itineraries, or workout plans- Create flashcards, quizzes, and study materials- Help with decision-making (pros/cons analysis)
Important Notes:
- My knowledge is current up to April2024 and I don't have real-time internet access- I can't execute code, access files, or browse websites- I don't have personal experiences, emotions, or consciousness- I can make mistakes—please verify critical information independently- I won't generate harmful, illegal, or explicit contentWhat would you like to explore first?<|im_end|>
Well, first make sure to pull and re-build the latest tip of main for ik_llama.cpp, which no longer needs you to pass -fa -fmoe as they now default to enabled.
Second, what client are you using? Is it the built-in web client of llama-server, or are you using a 3rd-party tool like openwebui? (Just want to make sure it isn't some kind of markdown rendering issue.)
I've tried the latest version of ik_llama.cpp and multiple clients (OWU, cline, roo), and K2 is the only model that has this issue with line breaks. It's basically unusable for coding because all the code is printed on the same line. This has to be a parsing/template issue?
Which template are you using? Did you try this one that got updated a couple weeks ago for Kimi-K2-Thinking in ik_llama.cpp?
https://github.com/ikawrakow/ik_llama.cpp/blob/main/models/templates/Kimi-K2-Thinking.jinja
This template makes it worse. Just dumping the parameters now:
--temp0.6 --top-p0.95 --min-p0.0 --top-k40 --repeat-penalty1.1 --repeat-last-n64 --seed -1 --batch-size2048 --ctx-size8192 --n-gpu-layers81 --threads16 --parallel4 --cont-batching --flash-attn --mlock --metrics --host0.0.0.0 --port8080 --log-format text --log-disable```
## Model Info- Model name: Kimi-K2-Thinking- Model type: 32B parameter model- Quantization format: Q8_0 and Q
prompt cache save took 19.12 ms
- looking for better prompt, base f_keep = 0.018, sim = 0.036, n_keep = 0, nndiscarded_prompt = 0
- cache state: 1 prompts, 67.553 MiB (limits: 8192.000 MiB, 0 tokens, 230044 est)
- prompt 0x5a01b6b0bba0: 1897 tokens, 0 discarded, checkpoints: 067.553 MiB
prompt cache load took 0.01 ms
INFO [ launch_slot_with_task] slot is processing task | tid="130431525793792" timestamp=1765100925 id_slot=0 id_task=1732
======== Cache: cache_size = 1897, n_past0 = 35, n_past1 = 35, n_past_prompt1 = 35, n_past2 = 35, n_past_prompt2 = 35
Common part does not match fully
cache : -Thinking.jinja
--temp 0.6
--top-p 0.95
--min-p 0.0
prompt: -Thinking.jinja
INFO [ update_slots] we have to evaluate at least 1 token to generate logits | tid="130431525793792" timestamp=1765100925 id_slot=0 id_task=1732
INFO [ update_slots] kv cache rm [p0, end) | tid="130431525793792" timestamp=1765100925 id_slot=0 id_task=1732 p0=34
^C^CReceived second interrupt, terminating immediately.
So it sounds like you are still experiencing your original issue? Here is what you said at the top:
The model doesn't seem to output line breaks when it should (especially obvious in code and point form)
What are you expecting exactly? Are you expecting \r\n or just \n, and where is it happening? This new Kimi-K2-Thinking interleaves thinking during the chat and expects that in the context, which is different from previous models.
I've noticed that if I use my old client (a little vibe-coded project), the thinking blocks are not rendered the same as they used to be on other models, so your client has to be updated enough to handle this new interleaved format.
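For what it's worth, handling the interleaved format client-side mostly comes down to splitting thinking blocks out of the text. A minimal sketch; the `<think>`/`</think>` markers here are an assumption for illustration, so check the actual delimiters in the jinja template you pass with --chat-template-file:

```python
import re

# Hypothetical markers for illustration; the real Kimi-K2-Thinking chat
# template may use different delimiters.
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_interleaved(text: str) -> tuple[list[str], str]:
    """Separate interleaved thinking blocks from the visible reply."""
    thoughts = THINK.findall(text)   # collect every thinking span, in order
    visible = THINK.sub("", text)    # remove them from the rendered output
    return thoughts, visible.strip()
```

A client that assumes a single leading thinking block (the old format) will mangle output like `<think>plan</think>Hello\n<think>more</think>World`, while this splitter returns both thoughts and the clean visible text.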
My impression is that this model is working well for other folks, but I'm not sure exactly which chat templates they are using or exactly how they are running it.
I'll ask for you over on this thread where they had good success with the official vendor verifier: https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/8#693004e93e3c257844715f13
I don't think there's any \n at all; it just combines the entire output onto the same line, multiple times throughout the output, both inside and outside of thinking. This is the only model that does it (K2 variants).
@justj0sh do you use any proxy servers in between ik_llama and your client? What operating system are you running?
In your first copy/paste there is something incorrect with either the chat template or the client's parsing of the content:
explore first?<|im_end|>