QWEN2Golem Deployment Guide
Architecture
- Frontend: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under src/app/api/*. Genkit flows live in src/ai/flows/* and initialize via src/ai/genkit.ts (Google AI plugin).
- Backend: Flask app home/chezy/golem_flask_server.py on port 5000. Provides chat (/generate), state (/consciousness-state), ASR (/asr/transcribe), TTS (/tts/synthesize), health (/health), and others.
- Model/AI:
  - Frontend Genkit flows target googleai/gemini-2.0-flash.
  - Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
- API Bridge: Next API routes proxy to the Flask backend (a smoke-check sketch follows this list):
  - src/app/api/generate/route.ts → POST {GOLEM}/generate
  - src/app/api/consciousness-state/route.ts → GET {GOLEM}/consciousness-state
  - src/app/api/asr/transcribe/route.ts → POST {BACKEND}/asr/transcribe
  - src/app/api/tts/synthesize/route.ts → POST {BACKEND}/tts/synthesize
- Training service (separate backend): exposed via TRAINING_API_BASE (FastAPI/Flask), replacing any localhost usage.
- Containers: Dockerfile.frontend (Node 20 runtime, port 9002) and Dockerfile.golem (Python 3.11, port 5000).
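One way to smoke-check the bridge after deployment is to call a proxied route on the frontend and the backend route it targets, then compare. A minimal Python sketch, assuming the requests package is installed; FRONTEND_URL is a hypothetical variable for your deployed site, not one the repo defines:

```python
# Compare a Next API bridge route with the backend route it proxies.
import os
import requests

FRONTEND = os.environ.get("FRONTEND_URL", "https://<project>.vercel.app")  # hypothetical
GOLEM = os.environ.get("NEXT_PUBLIC_GOLEM_SERVER_URL", "https://<public-backend-base>")

via_bridge = requests.get(f"{FRONTEND}/api/consciousness-state", timeout=30).json()
direct = requests.get(f"{GOLEM}/consciousness-state", timeout=30).json()
print("bridge keys:", sorted(via_bridge))
print("direct keys:", sorted(direct))
```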
Prerequisites
- Frontend host: Vercel (recommended; vercel.json present) or Netlify (works; configure build/env).
- GPU host for backend: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., runpod/pytorch:* with CUDA 12.x).
- Keys:
  - Google AI Studio API key for Genkit (frontend server-side flows).
  - Google Custom Search: GOOGLE_API_KEY and GOOGLE_CSE_ID for backend search.
  - Optional Gemini on backend: GEMINI_API_KEY (backend also checks NEXT_PUBLIC_GEMINI_API_KEY).
- Domain or tunnel: public HTTPS base URL for the backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See setup-tunnel.md.
Backend (GPU) Setup on RunPod
- Create a GPU pod (CUDA 12.x). Expose port 5000 publicly.
- SSH/shell into the pod, clone repo, create venv, install deps:
sudo apt-get update -y && sudo apt-get install -y python3-venv git
git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
- Set environment variables:
export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
export CUDA_VISIBLE_DEVICES=0
- Start the server (Gunicorn recommended):
gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
- Verify:
curl -sS http://localhost:5000/health
- Key endpoints exposed by the backend:
  - POST /generate, POST /generate/stream
  - GET /consciousness-state, POST /set-consciousness-dimension
  - POST /asr/transcribe, POST /tts/synthesize
  - GET /health, GET /status
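A quick way to exercise the generation endpoint once the pod is up is sketched below. The JSON payload (a single "prompt" field) is an assumption about the request schema rather than the confirmed contract of golem_flask_server.py, so adjust it as needed:

```python
# Hypothetical POST against /generate; the "prompt" field is an assumed schema.
import requests

BACKEND = "https://<public-backend-base>"
resp = requests.post(f"{BACKEND}/generate", json={"prompt": "Hello, Golem"}, timeout=60)
resp.raise_for_status()
print(resp.json())
```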
If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; point the frontend at the tunnel URL instead of the RunPod proxy.
Training backend (required to avoid mock responses)
You must host a real training service at a public base URL and configure the frontend to use it. The following endpoints are expected on the training service (FastAPI/Flask suggested):
- POST /api/training/start-auto-training
- GET /api/training/status/{training_job_id}
- POST /api/ai-architect/generate-architecture
- GET/POST /api/ai-architect/* (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)
Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.
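As a starting point, the sketch below stubs the expected routes with FastAPI. It only models the contract; the response fields, the in-memory job store, and the free-form dict payloads are assumptions, not the real training logic:

```python
# Minimal FastAPI skeleton for the training-service contract listed above.
# Response shapes and the in-memory job store are illustrative assumptions.
import uuid
from fastapi import FastAPI, HTTPException

app = FastAPI()
jobs: dict[str, dict] = {}  # training_job_id -> status record

@app.post("/api/training/start-auto-training")
def start_auto_training(payload: dict):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "request": payload}
    return {"training_job_id": job_id, "status": "running"}

@app.get("/api/training/status/{training_job_id}")
def training_status(training_job_id: str):
    job = jobs.get(training_job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="unknown training_job_id")
    return {"training_job_id": training_job_id, "status": job["status"]}

@app.post("/api/ai-architect/generate-architecture")
def generate_architecture(payload: dict):
    # Replace this stub with the real architecture-generation logic.
    return {"architecture": {}, "notes": "stub response"}
```

Run it with uvicorn (see the local development section) and point TRAINING_API_BASE / NEXT_PUBLIC_TRAINING_API_BASE at its public base URL.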
Frontend Setup (Vercel recommended)
- Connect repo to Vercel.
- Environment variables (Project Settings → Environment Variables):
NEXT_PUBLIC_GOLEM_SERVER_URL = https://<public-backend-base> # Flask backend (port 5000)
NEXT_PUBLIC_BACKEND_URL = https://<public-backend-base> # Same as above
GOOGLE_API_KEY = <google-ai-studio-key> # Genkit plugin
NEXT_PUBLIC_GEMINI_API_KEY = <optional-display-or-shared-key>
TRAINING_API_BASE = https://<public-training-base> # Server-side Next API routes
NEXT_PUBLIC_TRAINING_API_BASE= https://<public-training-base> # Client-side components
- Build and deploy (Vercel picks up npm run build per vercel.json). Long-running function settings already exist in vercel.json.
- Test the app once deployed.
Netlify alternative
- Build command: npm run build
- Publish directory: .next
- Same environment variables as above (Site settings → Environment variables).
- Ensure Next serverless/edge functions are enabled per your Netlify plan. Alternatively, run the frontend as a static export with purely client-side calls to the backend endpoints.
Genkit flows
Genkit is initialized in src/ai/genkit.ts with @genkit-ai/googleai and default model googleai/gemini-2.0-flash.
- Ensure the environment has a valid Google AI Studio key (e.g., GOOGLE_API_KEY) for Genkit server-side usage.
- Flows include:
  - src/ai/flows/get-initial-zpe-analysis-flow.ts
  - src/ai/flows/golem-chat.ts (frontend calls the backend /generate by default)
Environment variables (summary)
Frontend:
- NEXT_PUBLIC_GOLEM_SERVER_URL: HTTPS base of the Flask backend
- NEXT_PUBLIC_BACKEND_URL: same as above
- GOOGLE_API_KEY: AI Studio key for the Genkit plugin
- NEXT_PUBLIC_GEMINI_API_KEY: optional (frontend-visible)
- NEXT_PUBLIC_TRAINING_API_BASE: HTTPS base of the training backend
Backend (Flask):
- GOOGLE_API_KEY: Google API key (used for Custom Search)
- GOOGLE_CSE_ID: Custom Search Engine ID
- GEMINI_API_KEY: Gemini key used by the backend where applicable
- CUDA_VISIBLE_DEVICES, PYTORCH_*: recommended GPU knobs
Next API server (server-side):
- TRAINING_API_BASE: HTTPS base of the training backend (used by Next API routes)
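On the backend these values are presumably read from the process environment; a minimal sketch of defensive reads (the fallback chain and warning message are illustrative, not lifted from golem_flask_server.py):

```python
# Sketch: read backend configuration from the environment; warning text is illustrative.
import os

GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")   # Custom Search key
GOOGLE_CSE_ID = os.environ.get("GOOGLE_CSE_ID")     # Custom Search Engine ID
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY") or os.environ.get("NEXT_PUBLIC_GEMINI_API_KEY")

if not (GOOGLE_API_KEY and GOOGLE_CSE_ID):
    print("Warning: search keys missing; search-related features will error.")
```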
Local development
- Frontend dev:
npm i
npm run dev
# Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
# NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
# NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
- Backend dev:
source .venv/bin/activate # if created
python home/chezy/golem_flask_server.py # or use gunicorn
- Training dev:
# Run your FastAPI/Flask training service on port 9006
export TRAINING_API_BASE=http://localhost:9006
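If you use the FastAPI skeleton from the training-backend section, it can be launched locally on port 9006 with uvicorn; the module path training_service:app is a placeholder for wherever you put the app:

```python
# Launch the training service locally on port 9006 (module path is a placeholder).
import uvicorn

if __name__ == "__main__":
    uvicorn.run("training_service:app", host="0.0.0.0", port=9006, reload=True)
```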
Point NEXT_PUBLIC_GOLEM_SERVER_URL and NEXT_PUBLIC_BACKEND_URL to http://localhost:5000 during local dev.
Common issues & fixes
502/CORS or fetch failures from frontend:
- Ensure backend is reachable at the configured public URL and is HTTPS.
- Backend uses permissive CORS via CORS(app). If you restrict origins, include your frontend domain.
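If you do tighten CORS, a minimal flask-cors sketch is shown below; the frontend domain is a placeholder:

```python
# Restrict CORS to the deployed frontend instead of the permissive CORS(app) default.
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app, origins=["https://<project>.vercel.app"])  # placeholder frontend domain
```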
Missing search keys:
- Backend search requires GOOGLE_API_KEY and GOOGLE_CSE_ID; without them, search-related features will error.
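To confirm the keys work before deploying, you can call the Custom Search JSON API directly; a minimal sketch (the query string is arbitrary):

```python
# Quick check that GOOGLE_API_KEY / GOOGLE_CSE_ID are accepted by Custom Search.
import os
import requests

resp = requests.get(
    "https://www.googleapis.com/customsearch/v1",
    params={
        "key": os.environ["GOOGLE_API_KEY"],
        "cx": os.environ["GOOGLE_CSE_ID"],
        "q": "test query",
    },
    timeout=30,
)
resp.raise_for_status()
print(len(resp.json().get("items", [])), "results")
```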
api/gemini proxy:
- src/app/api/gemini/route.ts proxies to ${GOLEM}/api/gemini. If your backend does not expose /api/gemini, either add it server-side or route Gemini calls via Genkit flows only.
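If you add it server-side, a sketch using the google-generativeai package follows. The request/response shape ("prompt" in, "text" out) is an assumption about what the bridge expects, and in practice the route would be registered on the existing app in golem_flask_server.py rather than a fresh Flask instance:

```python
# Hypothetical /api/gemini route for the Flask backend using google-generativeai.
# The payload/response shape ("prompt" in, "text" out) is an assumed contract.
import os

import google.generativeai as genai
from flask import Flask, jsonify, request

app = Flask(__name__)
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

@app.post("/api/gemini")
def gemini_proxy():
    prompt = request.get_json(force=True).get("prompt", "")
    model = genai.GenerativeModel("gemini-2.0-flash")
    result = model.generate_content(prompt)
    return jsonify({"text": result.text})
```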
Training routes:
- All localhost:9006 usages have been removed. Update TRAINING_API_BASE / NEXT_PUBLIC_TRAINING_API_BASE to your deployed training API base to avoid mock responses.
Python deps on GPU image:
- If xformers fails, proceed without it or install a CUDA-matching build.
Gunicorn tuning:
- Adjust --workers based on GPU/CPU and model size; keep timeouts ≤ 60s for proxies.
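The same settings can live in a gunicorn.conf.py (gunicorn config files are plain Python); a minimal sketch mirroring the command used earlier:

```python
# gunicorn.conf.py - mirrors the flags used above.
# Start with: gunicorn -c gunicorn.conf.py home.chezy.golem_flask_server:app
bind = "0.0.0.0:5000"
workers = 1   # keep low: each worker loads the model onto the GPU
timeout = 60  # keep at or below 60s so upstream proxies do not time out first
```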
Redeploy steps
- Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
- Backend (RunPod): pull latest, restart Gunicorn process.
URLs
- Frontend: your Vercel/Netlify site (e.g., https://<project>.vercel.app)
- Backend: RunPod public proxy or tunnel (e.g., https://<pod-id>-5000.proxy.runpod.net)
- Training backend: your public service (e.g., https://<training-host>)
Repository layout highlights
- Backend server: home/chezy/golem_flask_server.py
- Frontend API bridges: src/app/api/*
- Genkit flows: src/ai/flows/*, init in src/ai/genkit.ts
- Frontend pages: src/app/*
- Containers: Dockerfile.frontend, Dockerfile.golem
- Docs: setup-tunnel.md, docs/deployment.md
This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.