## QWEN2Golem Deployment Guide
### Architecture
- **Frontend**: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under `src/app/api/*`. Genkit flows live in `src/ai/flows/*` and initialize via `src/ai/genkit.ts` (Google AI plugin).
- **Backend**: Flask app `home/chezy/golem_flask_server.py` on port 5000. Provides chat `/generate`, state `/consciousness-state`, ASR `/asr/transcribe`, TTS `/tts/synthesize`, health `/health`, and others.
- **Model/AI**:
- Frontend Genkit flows target `googleai/gemini-2.0-flash`.
- Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
- **API Bridge**: Next API routes proxy to the Flask backend:
- `src/app/api/generate/route.ts` → POST `{GOLEM}/generate`
- `src/app/api/consciousness-state/route.ts` → GET `{GOLEM}/consciousness-state`
- `src/app/api/asr/transcribe/route.ts` → POST `{BACKEND}/asr/transcribe`
- `src/app/api/tts/synthesize/route.ts` → POST `{BACKEND}/tts/synthesize`
- **Training service (separate backend)**: Exposed via `TRAINING_API_BASE` (FastAPI/Flask), replacing any localhost usage.
- **Containers**: `Dockerfile.frontend` (Node 20 runtime, port 9002) and `Dockerfile.golem` (Python 3.11, port 5000).
### Prerequisites
- **Frontend host**: Vercel (recommended; `vercel.json` present) or Netlify (works; configure build/env).
- **GPU host for backend**: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., `runpod/pytorch:*` with CUDA 12.x).
- **Keys**:
- Google AI Studio API key for Genkit (frontend server-side flows).
- Google Custom Search: `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` for backend search.
- Optional Gemini on backend: `GEMINI_API_KEY` (backend also checks `NEXT_PUBLIC_GEMINI_API_KEY`).
- **Domain or tunnel**: Public HTTPS base URL for backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See `setup-tunnel.md`.
### Backend (GPU) Setup on RunPod
1. Create a GPU pod (CUDA 12.x). Expose port 5000 publicly.
2. SSH/shell into the pod, clone repo, create venv, install deps:
```bash
sudo apt-get update -y && sudo apt-get install -y python3-venv git
git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
3. Set environment variables:
```bash
export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
export CUDA_VISIBLE_DEVICES=0
```
4. Start the server (Gunicorn recommended):
```bash
gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
```
5. Verify:
```bash
curl -sS http://localhost:5000/health
```
6. Key endpoints exposed by the backend:
- POST `/generate`, POST `/generate/stream`
- GET `/consciousness-state`, POST `/set-consciousness-dimension`
- POST `/asr/transcribe`, POST `/tts/synthesize`
- GET `/health`, GET `/status`
If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; use the tunnel URL for the frontend.
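The endpoints above can be smoke-tested with a short script. This is a sketch: with `DRY_RUN=1` (the default here) it only prints the requests so the wiring can be reviewed offline, and the `/generate` payload fields are assumptions to be checked against `golem_flask_server.py`:

```bash
#!/usr/bin/env bash
# Smoke-test the key backend endpoints. With DRY_RUN=1 (the default here)
# the script only prints the requests; set DRY_RUN=0 to send them for real.
GOLEM="${GOLEM:-http://localhost:5000}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run curl -sS "${GOLEM}/health"
run curl -sS "${GOLEM}/consciousness-state"
# The payload fields below are assumptions -- check golem_flask_server.py
# for the actual /generate schema.
run curl -sS -X POST "${GOLEM}/generate" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, Golem"}'
```

Run it with `DRY_RUN=0 GOLEM=https://<public-backend-base>` once the pod is up.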
### Training backend (required to avoid mock responses)
You must host a real training service at a public base URL and configure the frontend to use it. The following endpoints are expected on the training service (FastAPI/Flask suggested):
- POST `/api/training/start-auto-training`
- GET `/api/training/status/{training_job_id}`
- POST `/api/ai-architect/generate-architecture`
- GET/POST `/api/ai-architect/*` (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)
Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.
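A minimal client sketch for the training service, matching the routes above. The response field name `training_job_id` is an assumption taken from the status path; match it to your service's actual schema:

```bash
#!/usr/bin/env bash
# Sketch of the expected training-service interaction.
TRAINING_API_BASE="${TRAINING_API_BASE:-http://localhost:9006}"

echo "POST ${TRAINING_API_BASE}/api/training/start-auto-training"
echo "GET  ${TRAINING_API_BASE}/api/training/status/<training_job_id>"

# Once the service is live, the same flow with curl + jq:
# JOB=$(curl -sS -X POST "$TRAINING_API_BASE/api/training/start-auto-training" \
#   | jq -r '.training_job_id')
# curl -sS "$TRAINING_API_BASE/api/training/status/$JOB"
```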
### Frontend Setup (Vercel recommended)
1. Connect repo to Vercel.
2. Environment variables (Project Settings → Environment Variables):
```
NEXT_PUBLIC_GOLEM_SERVER_URL = https://<public-backend-base> # Flask backend (port 5000)
NEXT_PUBLIC_BACKEND_URL = https://<public-backend-base> # Same as above
GOOGLE_API_KEY = <google-ai-studio-key> # Genkit plugin
NEXT_PUBLIC_GEMINI_API_KEY = <optional-display-or-shared-key>
TRAINING_API_BASE = https://<public-training-base> # Server-side Next API routes
NEXT_PUBLIC_TRAINING_API_BASE = https://<public-training-base> # Client-side components
```
3. Build and deploy (Vercel picks `npm run build` per `vercel.json`). Long-running function settings already exist in `vercel.json`.
4. Test the app once deployed.
### Netlify alternative
- Build command: `npm run build`
- Publish directory: `.next`
- Same environment variables as above (Site settings → Environment variables).
- Ensure Next serverless/edge functions are enabled per your Netlify plan. Alternatively, run the frontend as a static export with purely client-side calls to the backend endpoints; note that a static export disables the Next API routes under `src/app/api/*`, so the client must call the Flask and training backends directly.
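The settings above can be pinned in a `netlify.toml` at the repo root. A minimal sketch, assuming the official `@netlify/plugin-nextjs` runtime plugin is available on your plan:

```toml
[build]
  command = "npm run build"
  publish = ".next"

[[plugins]]
  package = "@netlify/plugin-nextjs"
```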
### Genkit flows
Genkit is initialized in `src/ai/genkit.ts` with `@genkit-ai/googleai` and default model `googleai/gemini-2.0-flash`.
- Ensure the environment has a valid Google AI Studio key (e.g., `GOOGLE_API_KEY`) for Genkit server-side usage.
- Flows include:
- `src/ai/flows/get-initial-zpe-analysis-flow.ts`
- `src/ai/flows/golem-chat.ts` (frontend calls backend `/generate` by default)
### Environment variables (summary)
Frontend:
- `NEXT_PUBLIC_GOLEM_SERVER_URL`: HTTPS base of Flask backend
- `NEXT_PUBLIC_BACKEND_URL`: same as above
- `GOOGLE_API_KEY`: AI Studio key for Genkit plugin
- `NEXT_PUBLIC_GEMINI_API_KEY`: optional (frontend-visible)
- `NEXT_PUBLIC_TRAINING_API_BASE`: HTTPS base of training backend
Backend (Flask):
- `GOOGLE_API_KEY`: Google API key (used for Custom Search)
- `GOOGLE_CSE_ID`: Custom Search Engine ID
- `GEMINI_API_KEY`: Gemini key used by backend where applicable
- `CUDA_VISIBLE_DEVICES`, `PYTORCH_*`: recommended GPU knobs
Next API server (server-side):
- `TRAINING_API_BASE`: HTTPS base of training backend (used by Next API routes)
### Local development
- Frontend dev:
```bash
npm i
npm run dev
# Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
# NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
# NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
```
- Backend dev:
```bash
source .venv/bin/activate # if created
python home/chezy/golem_flask_server.py # or use gunicorn
```
- Training dev:
```bash
# Run your FastAPI/Flask training service on port 9006
export TRAINING_API_BASE=http://localhost:9006
```
Point `NEXT_PUBLIC_GOLEM_SERVER_URL` and `NEXT_PUBLIC_BACKEND_URL` to `http://localhost:5000` during local dev.
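For local dev, the variables above can live in a `.env.local` at the repo root, which Next.js loads automatically (values mirror this guide's local defaults; the key is a placeholder):

```
NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
TRAINING_API_BASE=http://localhost:9006
GOOGLE_API_KEY=<google-ai-studio-key>
```

Restart `npm run dev` after editing `.env.local`.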
### Common issues & fixes
- **502/CORS or fetch failures from frontend**:
- Ensure backend is reachable at the configured public URL and is HTTPS.
- Backend uses permissive CORS via `CORS(app)`. If you restrict origins, include your frontend domain.
- **Missing search keys**:
- Backend search requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`; without them, search-related features will error.
- **api/gemini proxy**:
- `src/app/api/gemini/route.ts` proxies to `${GOLEM}/api/gemini`. If your backend does not expose `/api/gemini`, either add it server-side or route Gemini calls via Genkit flows only.
- **Training routes**:
- All `localhost:9006` usages have been removed. Update `TRAINING_API_BASE`/`NEXT_PUBLIC_TRAINING_API_BASE` to your deployed training API base to avoid mock responses.
- **Python deps on GPU image**:
- If `xformers` fails, proceed without it or install a CUDA-matching build.
- **Gunicorn tuning**:
- Adjust `--workers` based on GPU/CPU and model size; keep timeouts ≤ 60s for proxies.
### Redeploy steps
- Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
- Backend (RunPod): pull latest, restart Gunicorn process.
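If the pod image runs systemd (plain containers often don't; fall back to `nohup` or `tmux` there), Gunicorn can be kept alive across restarts with a unit file. A sketch only; the repo and venv paths are assumptions, adjust to your pod:

```
# /etc/systemd/system/golem.service -- paths are assumptions
[Unit]
Description=QWEN2Golem Flask backend
After=network.target

[Service]
WorkingDirectory=/workspace/QWEN2Golem
Environment=CUDA_VISIBLE_DEVICES=0
ExecStart=/workspace/QWEN2Golem/.venv/bin/gunicorn -b 0.0.0.0:5000 \
  --workers 1 --timeout 60 home.chezy.golem_flask_server:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After pulling the latest code: `systemctl daemon-reload && systemctl restart golem`.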
### URLs
- Frontend: your Vercel/Netlify site (e.g., `https://<project>.vercel.app`)
- Backend: RunPod public proxy or tunnel (e.g., `https://<pod-id>-5000.proxy.runpod.net`)
- Training backend: your public service (e.g., `https://<training-host>`)
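The RunPod proxy URL follows the predictable `<pod-id>-<port>.proxy.runpod.net` shape shown above, so the backend base URL for the env vars can be derived from the pod id with a small helper:

```bash
# Derive the public RunPod proxy URL for a given pod id and exposed port.
runpod_url() {
  local pod_id="$1" port="$2"
  echo "https://${pod_id}-${port}.proxy.runpod.net"
}

runpod_url mypod123 5000   # -> https://mypod123-5000.proxy.runpod.net
```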
### Repository layout highlights
- Backend server: `home/chezy/golem_flask_server.py`
- Frontend API bridges: `src/app/api/*`
- Genkit flows: `src/ai/flows/*`, init in `src/ai/genkit.ts`
- Frontend pages: `src/app/*`
- Containers: `Dockerfile.frontend`, `Dockerfile.golem`
- Docs: `setup-tunnel.md`, `docs/deployment.md`
This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.