
QWEN2Golem – Deployment Guide

Architecture

  • Frontend: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under src/app/api/*. Genkit flows live in src/ai/flows/* and initialize via src/ai/genkit.ts (Google AI plugin).
  • Backend: Flask app home/chezy/golem_flask_server.py on port 5000. Provides chat /generate, state /consciousness-state, ASR /asr/transcribe, TTS /tts/synthesize, health /health, and others.
  • Model/AI:
    • Frontend Genkit flows target googleai/gemini-2.0-flash.
    • Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
  • API Bridge: Next API routes proxy to the Flask backend:
    • src/app/api/generate/route.ts → POST {GOLEM}/generate
    • src/app/api/consciousness-state/route.ts → GET {GOLEM}/consciousness-state
    • src/app/api/asr/transcribe/route.ts → POST {BACKEND}/asr/transcribe
    • src/app/api/tts/synthesize/route.ts → POST {BACKEND}/tts/synthesize
  • Training service (separate backend): Exposed via TRAINING_API_BASE (FastAPI/Flask), replacing any localhost usage.
  • Containers: Dockerfile.frontend (Node 20 runtime, port 9002) and Dockerfile.golem (Python 3.11, port 5000).

Prerequisites

  • Frontend host: Vercel (recommended; vercel.json present) or Netlify (works; configure build/env).
  • GPU host for backend: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., runpod/pytorch:* with CUDA 12.x).
  • Keys:
    • Google AI Studio API key for Genkit (frontend server-side flows).
    • Google Custom Search: GOOGLE_API_KEY and GOOGLE_CSE_ID for backend search.
    • Optional Gemini on backend: GEMINI_API_KEY (backend also checks NEXT_PUBLIC_GEMINI_API_KEY).
  • Domain or tunnel: Public HTTPS base URL for backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See setup-tunnel.md.

Backend (GPU) Setup on RunPod

  1. Create a GPU pod (CUDA 12.x). Expose port 5000 publicly.
  2. SSH/shell into the pod, clone repo, create venv, install deps:
sudo apt-get update -y && sudo apt-get install -y python3-venv git
git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
  3. Set environment variables:
export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
export CUDA_VISIBLE_DEVICES=0
  4. Start the server (Gunicorn recommended):
gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
  5. Verify:
curl -sS http://localhost:5000/health
  6. Key endpoints exposed by the backend:
    • POST /generate, POST /generate/stream
    • GET /consciousness-state, POST /set-consciousness-dimension
    • POST /asr/transcribe, POST /tts/synthesize
    • GET /health, GET /status
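
Once the server reports healthy, the chat endpoint can be smoke-tested with curl. The request body below is a hypothetical minimal payload (the exact field names are an assumption; check the /generate handler in home/chezy/golem_flask_server.py):

```shell
# Backend base URL; defaults to the local dev address.
BASE="${GOLEM_BASE:-http://localhost:5000}"

# Hypothetical payload; adjust field names to match the /generate handler.
curl -sS -X POST "$BASE/generate" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, Golem"}' || echo "backend not reachable at $BASE"
```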

If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; use the tunnel URL for the frontend.

Training backend (required to avoid mock responses)

You must host a real training service at a public base URL and configure the frontend to use it. The following endpoints are expected on the training service (FastAPI/Flask suggested):

  • POST /api/training/start-auto-training
  • GET /api/training/status/{training_job_id}
  • POST /api/ai-architect/generate-architecture
  • GET/POST /api/ai-architect/* (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)
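
As an illustration, kicking off a job and polling its status might look like the following (the base URL, payload, and response shape are assumptions; substitute your deployed training host):

```shell
# Hypothetical training host; replace with your deployed base URL.
TRAINING="${TRAINING_API_BASE:-http://localhost:9006}"

# Start a job (the empty payload is an assumption for your service).
curl -sS -X POST "$TRAINING/api/training/start-auto-training" \
  -H 'Content-Type: application/json' \
  -d '{}' || echo "training service not reachable at $TRAINING"

# Then poll a job by id (replace <training_job_id> with the id returned above):
# curl -sS "$TRAINING/api/training/status/<training_job_id>"
```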

Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.

Frontend Setup (Vercel recommended)

  1. Connect repo to Vercel.
  2. Environment variables (Project Settings → Environment Variables):
NEXT_PUBLIC_GOLEM_SERVER_URL = https://<public-backend-base>   # Flask backend (port 5000)
NEXT_PUBLIC_BACKEND_URL      = https://<public-backend-base>   # Same as above
GOOGLE_API_KEY               = <google-ai-studio-key>          # Genkit plugin
NEXT_PUBLIC_GEMINI_API_KEY   = <optional-display-or-shared-key>
TRAINING_API_BASE            = https://<public-training-base>  # Server-side Next API routes
NEXT_PUBLIC_TRAINING_API_BASE= https://<public-training-base>  # Client-side components
  3. Build and deploy (Vercel runs npm run build per vercel.json). Long-running function settings already exist in vercel.json.

  4. Test the app once deployed.

Netlify alternative

  • Build command: npm run build
  • Publish directory: .next
  • Same environment variables as above (Site settings → Environment variables).
  • Ensure Next serverless/edge functions are enabled per Netlify plan. If using Netlify, you may choose to run the frontend as a static export plus purely client-side calls to the backend endpoints.

Genkit flows

Genkit is initialized in src/ai/genkit.ts with @genkit-ai/googleai and default model googleai/gemini-2.0-flash.

  • Ensure the environment has a valid Google AI Studio key (e.g., GOOGLE_API_KEY) for Genkit server-side usage.
  • Flows include:
    • src/ai/flows/get-initial-zpe-analysis-flow.ts
    • src/ai/flows/golem-chat.ts (frontend calls backend /generate by default)

Environment variables (summary)

Frontend:

  • NEXT_PUBLIC_GOLEM_SERVER_URL: HTTPS base of Flask backend
  • NEXT_PUBLIC_BACKEND_URL: same as above
  • GOOGLE_API_KEY: AI Studio key for Genkit plugin
  • NEXT_PUBLIC_GEMINI_API_KEY: optional (frontend-visible)
  • NEXT_PUBLIC_TRAINING_API_BASE: HTTPS base of training backend

Backend (Flask):

  • GOOGLE_API_KEY: Google API key (used for Custom Search)
  • GOOGLE_CSE_ID: Custom Search Engine ID
  • GEMINI_API_KEY: Gemini key used by backend where applicable
  • CUDA_VISIBLE_DEVICES, PYTORCH_*: recommended GPU knobs

Next API server (server-side):

  • TRAINING_API_BASE: HTTPS base of training backend (used by Next API routes)

Local development

  • Frontend dev:
npm i
npm run dev
# Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
#           NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
#           NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
  • Backend dev:
source .venv/bin/activate  # if created
python home/chezy/golem_flask_server.py  # or use gunicorn
  • Training dev:
# Run your FastAPI/Flask training service on port 9006
export TRAINING_API_BASE=http://localhost:9006

Point NEXT_PUBLIC_GOLEM_SERVER_URL and NEXT_PUBLIC_BACKEND_URL to http://localhost:5000 during local dev.
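
For convenience, the local values above can live in a .env.local file at the repo root, which Next.js loads automatically (key names are taken from this guide; do not commit the file):

```shell
# .env.local — local development defaults (do not commit)
NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
TRAINING_API_BASE=http://localhost:9006
GOOGLE_API_KEY=your-ai-studio-key
```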

Common issues & fixes

  • 502/CORS or fetch failures from frontend:

    • Ensure backend is reachable at the configured public URL and is HTTPS.
    • Backend uses permissive CORS via CORS(app). If you restrict origins, include your frontend domain.
  • Missing search keys:

    • Backend search requires GOOGLE_API_KEY and GOOGLE_CSE_ID; without them, search-related features will error.
  • api/gemini proxy:

    • src/app/api/gemini/route.ts proxies to ${GOLEM}/api/gemini. If your backend does not expose /api/gemini, either add it server-side or route Gemini calls via Genkit flows only.
  • Training routes:

    • All localhost:9006 usages have been removed. Update TRAINING_API_BASE/NEXT_PUBLIC_TRAINING_API_BASE to your deployed training API base to avoid mock responses.
  • Python deps on GPU image:

    • If xformers fails, proceed without it or install a CUDA-matching build.
  • Gunicorn tuning:

    • Adjust --workers based on GPU/CPU and model size; keep timeouts ≤ 60s for proxies.
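
To confirm CORS behavior from the command line, a preflight request against the backend can be inspected for Access-Control-* response headers (the Origin value below is a placeholder for your frontend domain):

```shell
# Backend base URL; defaults to the local dev address.
BASE="${GOLEM_BASE:-http://localhost:5000}"

# Send a CORS preflight and print response headers only.
curl -sS -o /dev/null -D - -X OPTIONS "$BASE/generate" \
  -H 'Origin: https://your-frontend.example' \
  -H 'Access-Control-Request-Method: POST' || echo "backend not reachable at $BASE"
```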

Redeploy steps

  • Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
  • Backend (RunPod): pull latest, restart Gunicorn process.

URLs

  • Frontend: your Vercel/Netlify site (e.g., https://<project>.vercel.app)
  • Backend: RunPod public proxy or tunnel (e.g., https://<pod-id>-5000.proxy.runpod.net)
  • Training backend: your public service (e.g., https://<training-host>)
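
For RunPod's proxy specifically, the public base URL is derived from the pod id and the exposed port, so it can be composed in a shell like this (the pod id below is a placeholder):

```shell
# Compose the RunPod proxy URL for port 5000 (pod id is a placeholder).
POD_ID="abc123"
BACKEND_URL="https://${POD_ID}-5000.proxy.runpod.net"
echo "$BACKEND_URL"  # prints https://abc123-5000.proxy.runpod.net
```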

Repository layout highlights

  • Backend server: home/chezy/golem_flask_server.py
  • Frontend API bridges: src/app/api/*
  • Genkit flows: src/ai/flows/*, init in src/ai/genkit.ts
  • Frontend pages: src/app/*
  • Containers: Dockerfile.frontend, Dockerfile.golem
  • Docs: setup-tunnel.md, docs/deployment.md

This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.