
QWEN2Golem – Deployment Guide

Architecture

  • Frontend: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under src/app/api/*. Genkit flows live in src/ai/flows/* and initialize via src/ai/genkit.ts (Google AI plugin).
  • Backend: Flask app home/chezy/golem_flask_server.py on port 5000. Provides chat /generate, state /consciousness-state, ASR /asr/transcribe, TTS /tts/synthesize, health /health, and others.
  • Model/AI:
    • Frontend Genkit flows target googleai/gemini-2.0-flash.
    • Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
  • API Bridge: Next API routes proxy to the Flask backend:
    • src/app/api/generate/route.ts → POST {GOLEM}/generate
    • src/app/api/consciousness-state/route.ts → GET {GOLEM}/consciousness-state
    • src/app/api/asr/transcribe/route.ts → POST {BACKEND}/asr/transcribe
    • src/app/api/tts/synthesize/route.ts → POST {BACKEND}/tts/synthesize
  • Training service (separate backend): Exposed via TRAINING_API_BASE (FastAPI/Flask), replacing any localhost usage.
  • Containers: Dockerfile.frontend (Node 20 runtime, port 9002) and Dockerfile.golem (Python 3.11, port 5000).

Prerequisites

  • Frontend host: Vercel (recommended; vercel.json present) or Netlify (works; configure build/env).
  • GPU host for backend: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., runpod/pytorch:* with CUDA 12.x).
  • Keys:
    • Google AI Studio API key for Genkit (frontend server-side flows).
    • Google Custom Search: GOOGLE_API_KEY and GOOGLE_CSE_ID for backend search.
    • Optional Gemini on backend: GEMINI_API_KEY (backend also checks NEXT_PUBLIC_GEMINI_API_KEY).
  • Domain or tunnel: Public HTTPS base URL for backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See setup-tunnel.md.

Backend (GPU) Setup on RunPod

  1. Create a GPU pod (CUDA 12.x). Expose port 5000 publicly.
  2. SSH/shell into the pod, clone repo, create venv, install deps:
sudo apt-get update -y && sudo apt-get install -y python3-venv git
git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
  3. Set environment variables:
export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
export CUDA_VISIBLE_DEVICES=0
  4. Start the server (Gunicorn recommended):
gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
  5. Verify:
curl -sS http://localhost:5000/health
  6. Key endpoints exposed by the backend:
    • POST /generate, POST /generate/stream
    • GET /consciousness-state, POST /set-consciousness-dimension
    • POST /asr/transcribe, POST /tts/synthesize
    • GET /health, GET /status
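
Once the server reports healthy, the chat endpoint can be smoke-tested with curl. The request body below is a hypothetical minimal payload (the exact field names are an assumption; check the /generate handler in home/chezy/golem_flask_server.py):

```shell
# Backend base URL; defaults to the local dev address.
BASE="${GOLEM_BASE:-http://localhost:5000}"

# Hypothetical payload; adjust field names to match the /generate handler.
curl -sS -X POST "$BASE/generate" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, Golem"}' || echo "backend not reachable at $BASE"
```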

If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; use the tunnel URL for the frontend.

Training backend (required to avoid mock responses)

You must host a real training service at a public base URL and configure the frontend to use it. The following endpoints are expected on the training service (FastAPI/Flask suggested):

  • POST /api/training/start-auto-training
  • GET /api/training/status/{training_job_id}
  • POST /api/ai-architect/generate-architecture
  • GET/POST /api/ai-architect/* (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)
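
As an illustration, kicking off a job and polling its status might look like the following (the base URL, payload, and response shape are assumptions; substitute your deployed training host):

```shell
# Hypothetical training host; replace with your deployed base URL.
TRAINING="${TRAINING_API_BASE:-http://localhost:9006}"

# Start a job (the empty payload is an assumption for your service).
curl -sS -X POST "$TRAINING/api/training/start-auto-training" \
  -H 'Content-Type: application/json' \
  -d '{}' || echo "training service not reachable at $TRAINING"

# Then poll a job by id (replace <training_job_id> with the id returned above):
# curl -sS "$TRAINING/api/training/status/<training_job_id>"
```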

Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.

Frontend Setup (Vercel recommended)

  1. Connect repo to Vercel.
  2. Environment variables (Project Settings → Environment Variables):
NEXT_PUBLIC_GOLEM_SERVER_URL = https://<public-backend-base>   # Flask backend (port 5000)
NEXT_PUBLIC_BACKEND_URL      = https://<public-backend-base>   # Same as above
GOOGLE_API_KEY               = <google-ai-studio-key>          # Genkit plugin
NEXT_PUBLIC_GEMINI_API_KEY   = <optional-display-or-shared-key>
TRAINING_API_BASE            = https://<public-training-base>  # Server-side Next API routes
NEXT_PUBLIC_TRAINING_API_BASE= https://<public-training-base>  # Client-side components
  3. Build and deploy (Vercel runs npm run build per vercel.json). Long-running function settings already exist in vercel.json.

  4. Test the app once deployed.

Netlify alternative

  • Build command: npm run build
  • Publish directory: .next
  • Same environment variables as above (Site settings → Environment variables).
  • Ensure Next serverless/edge functions are enabled per Netlify plan. If using Netlify, you may choose to run the frontend as a static export plus purely client-side calls to the backend endpoints.

Genkit flows

Genkit is initialized in src/ai/genkit.ts with @genkit-ai/googleai and default model googleai/gemini-2.0-flash.

  • Ensure the environment has a valid Google AI Studio key (e.g., GOOGLE_API_KEY) for Genkit server-side usage.
  • Flows include:
    • src/ai/flows/get-initial-zpe-analysis-flow.ts
    • src/ai/flows/golem-chat.ts (frontend calls backend /generate by default)

Environment variables (summary)

Frontend:

  • NEXT_PUBLIC_GOLEM_SERVER_URL: HTTPS base of Flask backend
  • NEXT_PUBLIC_BACKEND_URL: same as above
  • GOOGLE_API_KEY: AI Studio key for Genkit plugin
  • NEXT_PUBLIC_GEMINI_API_KEY: optional (frontend-visible)
  • NEXT_PUBLIC_TRAINING_API_BASE: HTTPS base of training backend

Backend (Flask):

  • GOOGLE_API_KEY: Google API key (used for Custom Search)
  • GOOGLE_CSE_ID: Custom Search Engine ID
  • GEMINI_API_KEY: Gemini key used by backend where applicable
  • CUDA_VISIBLE_DEVICES, PYTORCH_*: recommended GPU knobs

Next API server (server-side):

  • TRAINING_API_BASE: HTTPS base of training backend (used by Next API routes)

Local development

  • Frontend dev:
npm i
npm run dev
# Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
#           NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
#           NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
  • Backend dev:
source .venv/bin/activate  # if created
python home/chezy/golem_flask_server.py  # or use gunicorn
  • Training dev:
# Run your FastAPI/Flask training service on port 9006
export TRAINING_API_BASE=http://localhost:9006

Point NEXT_PUBLIC_GOLEM_SERVER_URL and NEXT_PUBLIC_BACKEND_URL to http://localhost:5000 during local dev.
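
For convenience, the local values above can live in a .env.local file at the repo root, which Next.js loads automatically (key names are taken from this guide; do not commit the file):

```shell
# .env.local — local development defaults (do not commit)
NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
TRAINING_API_BASE=http://localhost:9006
GOOGLE_API_KEY=your-ai-studio-key
```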

Common issues & fixes

  • 502/CORS or fetch failures from frontend:

    • Ensure backend is reachable at the configured public URL and is HTTPS.
    • Backend uses permissive CORS via CORS(app). If you restrict origins, include your frontend domain.
  • Missing search keys:

    • Backend search requires GOOGLE_API_KEY and GOOGLE_CSE_ID; without them, search-related features will error.
  • api/gemini proxy:

    • src/app/api/gemini/route.ts proxies to ${GOLEM}/api/gemini. If your backend does not expose /api/gemini, either add it server-side or route Gemini calls via Genkit flows only.
  • Training routes:

    • All localhost:9006 usages have been removed. Update TRAINING_API_BASE/NEXT_PUBLIC_TRAINING_API_BASE to your deployed training API base to avoid mock responses.
  • Python deps on GPU image:

    • If xformers fails, proceed without it or install a CUDA-matching build.
  • Gunicorn tuning:

    • Adjust --workers based on GPU/CPU and model size; keep timeouts ≤ 60s for proxies.
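
To confirm CORS behavior from the command line, a preflight request against the backend can be inspected for Access-Control-* response headers (the Origin value below is a placeholder for your frontend domain):

```shell
# Backend base URL; defaults to the local dev address.
BASE="${GOLEM_BASE:-http://localhost:5000}"

# Send a CORS preflight and print response headers only.
curl -sS -o /dev/null -D - -X OPTIONS "$BASE/generate" \
  -H 'Origin: https://your-frontend.example' \
  -H 'Access-Control-Request-Method: POST' || echo "backend not reachable at $BASE"
```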

Redeploy steps

  • Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
  • Backend (RunPod): pull latest, restart Gunicorn process.

URLs

  • Frontend: your Vercel/Netlify site (e.g., https://<project>.vercel.app)
  • Backend: RunPod public proxy or tunnel (e.g., https://<pod-id>-5000.proxy.runpod.net)
  • Training backend: your public service (e.g., https://<training-host>)
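
For RunPod's proxy specifically, the public base URL is derived from the pod id and the exposed port, so it can be composed in a shell like this (the pod id below is a placeholder):

```shell
# Compose the RunPod proxy URL for port 5000 (pod id is a placeholder).
POD_ID="abc123"
BACKEND_URL="https://${POD_ID}-5000.proxy.runpod.net"
echo "$BACKEND_URL"  # prints https://abc123-5000.proxy.runpod.net
```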

Repository layout highlights

  • Backend server: home/chezy/golem_flask_server.py
  • Frontend API bridges: src/app/api/*
  • Genkit flows: src/ai/flows/*, init in src/ai/genkit.ts
  • Frontend pages: src/app/*
  • Containers: Dockerfile.frontend, Dockerfile.golem
  • Docs: setup-tunnel.md, docs/deployment.md

This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.