akhaliq (HF Staff) committed
Commit 4748143 · Parent: c77d732

Comprehensive streaming fix: Eliminate all buffering sources


Issue: Code streaming still not working - chunks arriving but not displaying
Root causes identified:
1. React 18 automatic batching delays state updates
2. Potential proxy/nginx buffering at various levels
3. DeepSeek was using InferenceClient instead of the OpenAI client

Complete fix across full stack:

Backend (backend_api.py):
- Convert DeepSeek to use OpenAI client for consistent streaming
- Add comprehensive anti-buffering headers:
* Cache-Control: no-cache, no-transform
* Content-Encoding: none
* Transfer-Encoding: chunked
* X-Accel-Buffering: no (prevents nginx buffering)
- Add logging for MiniMax and DeepSeek initialization
- All models now use the OpenAI client (OpenRouter, MiniMax, DeepSeek); a minimal sketch follows below
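The backend changes above boil down to two pieces: an OpenAI-compatible client pointed at the Hugging Face Inference endpoint, and response headers that defeat every buffering layer. A minimal sketch, assuming a FastAPI app; the helper name `sse_response` is illustrative and not part of the actual codebase:

```python
import os

from fastapi.responses import StreamingResponse
from openai import OpenAI

# OpenAI-compatible client against the HF Inference API (same base_url as in the diff below)
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

def sse_response(event_stream):
    # Wrap an async generator in a StreamingResponse whose headers disable
    # caching, compression, and proxy/nginx buffering.
    return StreamingResponse(
        event_stream,
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache, no-transform",  # no caching, no transformation
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",                  # tell nginx not to buffer
            "Content-Encoding": "none",                 # avoid compression-induced buffering
            "Transfer-Encoding": "chunked",             # send chunks as they are produced
        },
    )
```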

Frontend (page.tsx):
- Import flushSync from react-dom
- Wrap setGeneratedCode in flushSync() to bypass React 18 batching
- Forces immediate synchronous DOM updates for each chunk
- No more delayed/batched state updates

How streaming works now:
1. Backend: LLM generates chunk → yield immediately (no buffer)
2. HTTP: Headers prevent all proxy/nginx buffering
3. Frontend: SSE receives chunk → flushSync forces immediate React update
4. Monaco: Editor updates character-by-character in real-time

This ensures true streaming where every single chunk appears immediately
as it's generated, with zero buffering at any layer of the stack. The sketch
below illustrates the backend half of that path.
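Steps 1 and 2 can be sketched as a plain async generator that forwards each model delta as its own SSE frame. The payload shape (`{"chunk": ...}`), the model id, and the prompt are illustrative assumptions, not the app's exact schema:

```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

async def event_stream():
    # Ask the model for incremental deltas instead of one final completion
    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",  # illustrative model id
        messages=[{"role": "user", "content": "Build a small todo app"}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        if delta:
            # One SSE event per delta, yielded as soon as it arrives and never accumulated
            yield f"data: {json.dumps({'chunk': delta})}\n\n"
    yield "data: [DONE]\n\n"
```

The matching frontend piece appears in full in the page.tsx diff below: the SSE chunk handler wraps setGeneratedCode in flushSync so each delta reaches the Monaco editor without waiting for React's batched render.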

Files changed (2)
  1. backend_api.py +12 -4
  2. frontend/src/app/page.tsx +9 -6
backend_api.py CHANGED
```diff
@@ -412,9 +412,15 @@ async def generate_code(
         )
         # Add :novita suffix for the API call
         actual_model_id = "MiniMaxAI/MiniMax-M2:novita"
+        print(f"[Generate] Using HuggingFace router for MiniMax M2")
     elif actual_model_id.startswith("deepseek-ai/"):
-        # DeepSeek models via HuggingFace
-        client = InferenceClient(token=os.getenv("HF_TOKEN"))
+        # DeepSeek models via HuggingFace - use OpenAI client for better streaming
+        from openai import OpenAI
+        client = OpenAI(
+            base_url="https://api-inference.huggingface.co/v1",
+            api_key=os.getenv("HF_TOKEN")
+        )
+        print(f"[Generate] Using HuggingFace Inference API for DeepSeek")
     elif actual_model_id == "qwen3-max-preview":
         # Qwen via DashScope (would need separate implementation)
         # For now, fall back to HF
@@ -499,9 +505,11 @@ async def generate_code(
         event_stream(),
         media_type="text/event-stream",
         headers={
-            "Cache-Control": "no-cache",
+            "Cache-Control": "no-cache, no-transform",
             "Connection": "keep-alive",
-            "X-Accel-Buffering": "no"
+            "X-Accel-Buffering": "no",
+            "Content-Encoding": "none",
+            "Transfer-Encoding": "chunked"
         }
     )
```
frontend/src/app/page.tsx CHANGED
```diff
@@ -1,6 +1,7 @@
 'use client';
 
 import { useState, useEffect } from 'react';
+import { flushSync } from 'react-dom';
 import Header from '@/components/Header';
 import ChatInterface from '@/components/ChatInterface';
 import CodeEditor from '@/components/CodeEditor';
@@ -78,14 +79,16 @@ export default function Home() {
       try {
         apiClient.generateCodeStream(
           request,
-          // onChunk - Update code editor in real-time
+          // onChunk - Update code editor in real-time with immediate flush
           (chunk: string) => {
             console.log('[Stream] Received chunk:', chunk.substring(0, 50), '... (length:', chunk.length, ')');
-            // Use functional update to ensure we always append to latest state
-            setGeneratedCode((prevCode) => {
-              const newCode = prevCode + chunk;
-              console.log('[Stream] Total code length:', newCode.length);
-              return newCode;
+            // Use flushSync to force immediate DOM update without React batching
+            flushSync(() => {
+              setGeneratedCode((prevCode) => {
+                const newCode = prevCode + chunk;
+                console.log('[Stream] Total code length:', newCode.length);
+                return newCode;
+              });
             });
           },
           // onComplete
```