## QWEN2Golem Deployment Guide
### Architecture
- **Frontend**: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under `src/app/api/*`. Genkit flows live in `src/ai/flows/*` and initialize via `src/ai/genkit.ts` (Google AI plugin).
- **Backend**: Flask app `home/chezy/golem_flask_server.py` on port 5000. Provides chat `/generate`, state `/consciousness-state`, ASR `/asr/transcribe`, TTS `/tts/synthesize`, health `/health`, and others.
- **Model/AI**:
- Frontend Genkit flows target `googleai/gemini-2.0-flash`.
- Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
- **API Bridge**: Next API routes proxy to the Flask backend:
- `src/app/api/generate/route.ts` → POST `{GOLEM}/generate`
- `src/app/api/consciousness-state/route.ts` → GET `{GOLEM}/consciousness-state`
- `src/app/api/asr/transcribe/route.ts` → POST `{BACKEND}/asr/transcribe`
- `src/app/api/tts/synthesize/route.ts` → POST `{BACKEND}/tts/synthesize`
- **Training service (separate backend)**: Exposed via `TRAINING_API_BASE` (FastAPI/Flask), replacing any localhost usage.
- **Containers**: `Dockerfile.frontend` (Node 20 runtime, port 9002) and `Dockerfile.golem` (Python 3.11, port 5000).
### Prerequisites
- **Frontend host**: Vercel (recommended; `vercel.json` present) or Netlify (works; configure build/env).
- **GPU host for backend**: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., `runpod/pytorch:*` with CUDA 12.x).
- **Keys**:
- Google AI Studio API key for Genkit (frontend server-side flows).
- Google Custom Search: `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` for backend search.
- Optional Gemini on backend: `GEMINI_API_KEY` (backend also checks `NEXT_PUBLIC_GEMINI_API_KEY`).
- **Domain or tunnel**: Public HTTPS base URL for backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See `setup-tunnel.md`.
### Backend (GPU) Setup on RunPod
1. Create a GPU pod (CUDA 12.x). Expose port 5000 publicly.
2. SSH/shell into the pod, clone repo, create venv, install deps:
```bash
sudo apt-get update -y && sudo apt-get install -y python3-venv git
git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
3. Set environment variables:
```bash
export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
export CUDA_VISIBLE_DEVICES=0
```
4. Start the server (Gunicorn recommended):
```bash
gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
```
5. Verify:
```bash
curl -sS http://localhost:5000/health
```
6. Key endpoints exposed by the backend:
- POST `/generate`, POST `/generate/stream`
- GET `/consciousness-state`, POST `/set-consciousness-dimension`
- POST `/asr/transcribe`, POST `/tts/synthesize`
- GET `/health`, GET `/status`
If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; use the tunnel URL for the frontend.
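The endpoints above can be smoke-tested with a short script. This is a sketch: with `DRY_RUN=1` (the default here) it only prints the requests so the wiring can be reviewed offline, and the `/generate` payload fields are assumptions to be checked against `golem_flask_server.py`:

```bash
#!/usr/bin/env bash
# Smoke-test the key backend endpoints. With DRY_RUN=1 (the default here)
# the script only prints the requests; set DRY_RUN=0 to send them for real.
GOLEM="${GOLEM:-http://localhost:5000}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run curl -sS "${GOLEM}/health"
run curl -sS "${GOLEM}/consciousness-state"
# The payload fields below are assumptions -- check golem_flask_server.py
# for the actual /generate schema.
run curl -sS -X POST "${GOLEM}/generate" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, Golem"}'
```

Run it with `DRY_RUN=0 GOLEM=https://<public-backend-base>` once the pod is up.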
### Training backend (required to avoid mock responses)
You must host a real training service at a public base URL and configure the frontend to use it. The following endpoints are expected on the training service (FastAPI/Flask suggested):
- POST `/api/training/start-auto-training`
- GET `/api/training/status/{training_job_id}`
- POST `/api/ai-architect/generate-architecture`
- GET/POST `/api/ai-architect/*` (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)
Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.
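A minimal client sketch for the training service, matching the routes above. The response field name `training_job_id` is an assumption taken from the status path; match it to your service's actual schema:

```bash
#!/usr/bin/env bash
# Sketch of the expected training-service interaction.
TRAINING_API_BASE="${TRAINING_API_BASE:-http://localhost:9006}"

echo "POST ${TRAINING_API_BASE}/api/training/start-auto-training"
echo "GET  ${TRAINING_API_BASE}/api/training/status/<training_job_id>"

# Once the service is live, the same flow with curl + jq:
# JOB=$(curl -sS -X POST "$TRAINING_API_BASE/api/training/start-auto-training" \
#   | jq -r '.training_job_id')
# curl -sS "$TRAINING_API_BASE/api/training/status/$JOB"
```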
### Frontend Setup (Vercel recommended)
1. Connect repo to Vercel.
2. Environment variables (Project Settings → Environment Variables):
```
NEXT_PUBLIC_GOLEM_SERVER_URL = https://<public-backend-base> # Flask backend (port 5000)
NEXT_PUBLIC_BACKEND_URL = https://<public-backend-base> # Same as above
GOOGLE_API_KEY = <google-ai-studio-key> # Genkit plugin
NEXT_PUBLIC_GEMINI_API_KEY = <optional-display-or-shared-key>
TRAINING_API_BASE = https://<public-training-base> # Server-side Next API routes
NEXT_PUBLIC_TRAINING_API_BASE = https://<public-training-base> # Client-side components
```
3. Build and deploy (Vercel picks `npm run build` per `vercel.json`). Long-running function settings already exist in `vercel.json`.
4. Test the app once deployed.
### Netlify alternative
- Build command: `npm run build`
- Publish directory: `.next`
- Same environment variables as above (Site settings → Environment variables).
- Ensure Next serverless/edge functions are enabled per your Netlify plan. Alternatively, run the frontend as a static export with purely client-side calls to the backend endpoints; note that a static export disables the Next API routes under `src/app/api/*`, so the client must call the Flask and training backends directly.
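The settings above can be pinned in a `netlify.toml` at the repo root. A minimal sketch, assuming the official `@netlify/plugin-nextjs` runtime plugin is available on your plan:

```toml
[build]
  command = "npm run build"
  publish = ".next"

[[plugins]]
  package = "@netlify/plugin-nextjs"
```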
### Genkit flows
Genkit is initialized in `src/ai/genkit.ts` with `@genkit-ai/googleai` and default model `googleai/gemini-2.0-flash`.
- Ensure the environment has a valid Google AI Studio key (e.g., `GOOGLE_API_KEY`) for Genkit server-side usage.
- Flows include:
- `src/ai/flows/get-initial-zpe-analysis-flow.ts`
- `src/ai/flows/golem-chat.ts` (frontend calls backend `/generate` by default)
### Environment variables (summary)
Frontend:
- `NEXT_PUBLIC_GOLEM_SERVER_URL`: HTTPS base of Flask backend
- `NEXT_PUBLIC_BACKEND_URL`: same as above
- `GOOGLE_API_KEY`: AI Studio key for Genkit plugin
- `NEXT_PUBLIC_GEMINI_API_KEY`: optional (frontend-visible)
- `NEXT_PUBLIC_TRAINING_API_BASE`: HTTPS base of training backend
Backend (Flask):
- `GOOGLE_API_KEY`: Google API key (used for Custom Search)
- `GOOGLE_CSE_ID`: Custom Search Engine ID
- `GEMINI_API_KEY`: Gemini key used by backend where applicable
- `CUDA_VISIBLE_DEVICES`, `PYTORCH_*`: recommended GPU knobs
Next API server (server-side):
- `TRAINING_API_BASE`: HTTPS base of training backend (used by Next API routes)
### Local development
- Frontend dev:
```bash
npm i
npm run dev
# Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
# NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
# NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
```
- Backend dev:
```bash
source .venv/bin/activate # if created
python home/chezy/golem_flask_server.py # or use gunicorn
```
- Training dev:
```bash
# Run your FastAPI/Flask training service on port 9006
export TRAINING_API_BASE=http://localhost:9006
```
Point `NEXT_PUBLIC_GOLEM_SERVER_URL` and `NEXT_PUBLIC_BACKEND_URL` to `http://localhost:5000` during local dev.
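For local dev, the variables above can live in a `.env.local` at the repo root, which Next.js loads automatically (values mirror this guide's local defaults; the key is a placeholder):

```
NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
TRAINING_API_BASE=http://localhost:9006
GOOGLE_API_KEY=<google-ai-studio-key>
```

Restart `npm run dev` after editing `.env.local`.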
### Common issues & fixes
- **502/CORS or fetch failures from frontend**:
- Ensure backend is reachable at the configured public URL and is HTTPS.
- Backend uses permissive CORS via `CORS(app)`. If you restrict origins, include your frontend domain.
- **Missing search keys**:
- Backend search requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`; without them, search-related features will error.
- **api/gemini proxy**:
- `src/app/api/gemini/route.ts` proxies to `${GOLEM}/api/gemini`. If your backend does not expose `/api/gemini`, either add it server-side or route Gemini calls via Genkit flows only.
- **Training routes**:
- All `localhost:9006` usages have been removed. Update `TRAINING_API_BASE`/`NEXT_PUBLIC_TRAINING_API_BASE` to your deployed training API base to avoid mock responses.
- **Python deps on GPU image**:
- If `xformers` fails, proceed without it or install a CUDA-matching build.
- **Gunicorn tuning**:
- Adjust `--workers` based on GPU/CPU and model size; keep timeouts ≤ 60s for proxies.
### Redeploy steps
- Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
- Backend (RunPod): pull latest, restart Gunicorn process.
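If the pod image runs systemd (plain containers often don't; fall back to `nohup` or `tmux` there), Gunicorn can be kept alive across restarts with a unit file. A sketch only; the repo and venv paths are assumptions, adjust to your pod:

```
# /etc/systemd/system/golem.service -- paths are assumptions
[Unit]
Description=QWEN2Golem Flask backend
After=network.target

[Service]
WorkingDirectory=/workspace/QWEN2Golem
Environment=CUDA_VISIBLE_DEVICES=0
ExecStart=/workspace/QWEN2Golem/.venv/bin/gunicorn -b 0.0.0.0:5000 \
  --workers 1 --timeout 60 home.chezy.golem_flask_server:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After pulling the latest code: `systemctl daemon-reload && systemctl restart golem`.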
### URLs
- Frontend: your Vercel/Netlify site (e.g., `https://<project>.vercel.app`)
- Backend: RunPod public proxy or tunnel (e.g., `https://<pod-id>-5000.proxy.runpod.net`)
- Training backend: your public service (e.g., `https://<training-host>`)
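The RunPod proxy URL follows the predictable `<pod-id>-<port>.proxy.runpod.net` shape shown above, so the backend base URL for the env vars can be derived from the pod id with a small helper:

```bash
# Derive the public RunPod proxy URL for a given pod id and exposed port.
runpod_url() {
  local pod_id="$1" port="$2"
  echo "https://${pod_id}-${port}.proxy.runpod.net"
}

runpod_url mypod123 5000   # -> https://mypod123-5000.proxy.runpod.net
```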
### Repository layout highlights
- Backend server: `home/chezy/golem_flask_server.py`
- Frontend API bridges: `src/app/api/*`
- Genkit flows: `src/ai/flows/*`, init in `src/ai/genkit.ts`
- Frontend pages: `src/app/*`
- Containers: `Dockerfile.frontend`, `Dockerfile.golem`
- Docs: `setup-tunnel.md`, `docs/deployment.md`
This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.