## QWEN2Golem – Deployment Guide

### Architecture

- **Frontend**: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under `src/app/api/*`. Genkit flows live in `src/ai/flows/*` and initialize via `src/ai/genkit.ts` (Google AI plugin).
- **Backend**: Flask app `home/chezy/golem_flask_server.py` on port 5000. Exposes chat (`/generate`), consciousness state (`/consciousness-state`), ASR (`/asr/transcribe`), TTS (`/tts/synthesize`), health (`/health`), and other endpoints.
- **Model/AI**:
  - Frontend Genkit flows target `googleai/gemini-2.0-flash`.
  - Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
- **API Bridge**: Next API routes proxy to the Flask backend:
  - `src/app/api/generate/route.ts` → POST `{GOLEM}/generate`
  - `src/app/api/consciousness-state/route.ts` → GET `{GOLEM}/consciousness-state`
  - `src/app/api/asr/transcribe/route.ts` → POST `{BACKEND}/asr/transcribe`
  - `src/app/api/tts/synthesize/route.ts` → POST `{BACKEND}/tts/synthesize`
- **Training service (separate backend)**: Reached via `TRAINING_API_BASE` (FastAPI or Flask), which replaces any hardcoded localhost usage.
- **Containers**: `Dockerfile.frontend` (Node 20 runtime, port 9002) and `Dockerfile.golem` (Python 3.11, port 5000).

### Prerequisites

- **Frontend host**: Vercel (recommended; `vercel.json` present) or Netlify (works; configure build/env).
- **GPU host for backend**: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., `runpod/pytorch:*` with CUDA 12.x).
- **Keys**:
  - Google AI Studio API key for Genkit (frontend server-side flows).
  - Google Custom Search: `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` for backend search.
  - Optional Gemini on backend: `GEMINI_API_KEY` (backend also checks `NEXT_PUBLIC_GEMINI_API_KEY`).
- **Domain or tunnel**: Public HTTPS base URL for backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See `setup-tunnel.md`.

### Backend (GPU) Setup on RunPod

1. Create a GPU pod (CUDA 12.x). Expose port 5000 publicly.
2. SSH into the pod, clone the repository, create a virtual environment, and install dependencies:

```bash
sudo apt-get update -y && sudo apt-get install -y python3-venv git
git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

3. Set environment variables:

```bash
export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
export CUDA_VISIBLE_DEVICES=0
```

4. Start the server (Gunicorn recommended):

```bash
gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
```

5. Verify:

```bash
curl -sS http://localhost:5000/health
```

6. Key endpoints exposed by the backend:
   - POST `/generate`, POST `/generate/stream`
   - GET `/consciousness-state`, POST `/set-consciousness-dimension`
   - POST `/asr/transcribe`, POST `/tts/synthesize`
   - GET `/health`, GET `/status`

If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; point the frontend's backend URL variables at the tunnel URL.
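Once `/health` responds, a minimal chat smoke test against `/generate` can look like the following; the request body here is an assumption, so check `golem_flask_server.py` for the exact schema:

```shell
# Assumed request shape for POST /generate; adjust fields to match golem_flask_server.py.
BACKEND="${BACKEND:-http://localhost:5000}"   # or your RunPod proxy / tunnel HTTPS URL
curl -sS -X POST "$BACKEND/generate" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, Golem"}' \
  || echo "backend not reachable at $BACKEND"
```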

### Training backend (required to avoid mock responses)

You must host a real training service at a public base URL and configure the frontend to use it. The following endpoints are expected on the training service (FastAPI/Flask suggested):

- POST `/api/training/start-auto-training`
- GET `/api/training/status/{training_job_id}`
- POST `/api/ai-architect/generate-architecture`
- GET/POST `/api/ai-architect/*` (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)

Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.
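As a sanity check once the service is deployed, the first two endpoints can be exercised with curl. The paths come from the list above; the request and response fields are assumptions about your implementation:

```shell
# Hypothetical job kickoff; supply your real job configuration in the request body.
TRAINING_API_BASE="${TRAINING_API_BASE:-http://localhost:9006}"
curl -sS -X POST "$TRAINING_API_BASE/api/training/start-auto-training" \
  -H 'Content-Type: application/json' -d '{}' \
  || echo "training service not reachable at $TRAINING_API_BASE"
# Then poll status with the job id the service returns (replace the placeholder):
# curl -sS "$TRAINING_API_BASE/api/training/status/<training_job_id>"
```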

### Frontend Setup (Vercel recommended)

1. Connect repo to Vercel.
2. Environment variables (Project Settings → Environment Variables):

```
NEXT_PUBLIC_GOLEM_SERVER_URL = https://<public-backend-base>   # Flask backend (port 5000)
NEXT_PUBLIC_BACKEND_URL      = https://<public-backend-base>   # Same as above
GOOGLE_API_KEY               = <google-ai-studio-key>          # Genkit plugin
NEXT_PUBLIC_GEMINI_API_KEY   = <optional-display-or-shared-key>
TRAINING_API_BASE            = https://<public-training-base>  # Server-side Next API routes
NEXT_PUBLIC_TRAINING_API_BASE= https://<public-training-base>  # Client-side components
```

3. Build and deploy (Vercel picks `npm run build` per `vercel.json`). Long-running function settings already exist in `vercel.json`.

4. Test the app once deployed.
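If you prefer the CLI to the dashboard, the same variables can be set with `vercel env add`; this assumes the Vercel CLI is installed and the repo is linked to the project:

```shell
# Values are piped in; repeat for the preview/development environments as needed.
if command -v vercel >/dev/null 2>&1; then
  echo "https://<public-backend-base>"  | vercel env add NEXT_PUBLIC_GOLEM_SERVER_URL production
  echo "https://<public-backend-base>"  | vercel env add NEXT_PUBLIC_BACKEND_URL production
  echo "https://<public-training-base>" | vercel env add TRAINING_API_BASE production
else
  echo "Vercel CLI not found; install with: npm i -g vercel"
fi
```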

### Netlify alternative

- Build command: `npm run build`
- Publish directory: `.next`
- Same environment variables as above (Site settings → Environment variables).
- Ensure Next serverless/edge functions are supported by your Netlify plan. Alternatively, run the frontend as a static export and make purely client-side calls to the backend endpoints.

### Genkit flows

Genkit is initialized in `src/ai/genkit.ts` with `@genkit-ai/googleai` and default model `googleai/gemini-2.0-flash`.

- Ensure the environment has a valid Google AI Studio key (e.g., `GOOGLE_API_KEY`) for Genkit server-side usage.
- Flows include:
  - `src/ai/flows/get-initial-zpe-analysis-flow.ts`
  - `src/ai/flows/golem-chat.ts` (frontend calls backend `/generate` by default)

### Environment variables (summary)

Frontend:

- `NEXT_PUBLIC_GOLEM_SERVER_URL`: HTTPS base of Flask backend
- `NEXT_PUBLIC_BACKEND_URL`: same as above
- `GOOGLE_API_KEY`: AI Studio key for Genkit plugin
- `NEXT_PUBLIC_GEMINI_API_KEY`: optional (frontend-visible)
- `NEXT_PUBLIC_TRAINING_API_BASE`: HTTPS base of training backend

Backend (Flask):

- `GOOGLE_API_KEY`: Google API key (used for Custom Search)
- `GOOGLE_CSE_ID`: Custom Search Engine ID
- `GEMINI_API_KEY`: Gemini key used by backend where applicable
- `CUDA_VISIBLE_DEVICES`, `PYTORCH_*`: recommended GPU knobs

Next API server (server-side):

- `TRAINING_API_BASE`: HTTPS base of training backend (used by Next API routes)

### Local development

- Frontend dev:

```bash
npm i
npm run dev
# Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
#           NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
#           NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
```

- Backend dev:

```bash
source .venv/bin/activate  # if created
python home/chezy/golem_flask_server.py  # or use gunicorn
```

- Training dev:

```bash
# Run your FastAPI/Flask training service on port 9006
export TRAINING_API_BASE=http://localhost:9006
```

Point `NEXT_PUBLIC_GOLEM_SERVER_URL` and `NEXT_PUBLIC_BACKEND_URL` to `http://localhost:5000` during local dev.
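Equivalently, these local values can live in a `.env.local` file at the project root, which Next.js loads automatically during `npm run dev`:

```
NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
TRAINING_API_BASE=http://localhost:9006
GOOGLE_API_KEY=<google-ai-studio-key>
```

Keep real keys out of version control; `.env.local` is excluded by the default Next.js `.gitignore`.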

### Common issues & fixes

- **502/CORS or fetch failures from frontend**:
  - Ensure backend is reachable at the configured public URL and is HTTPS.
  - Backend uses permissive CORS via `CORS(app)`. If you restrict origins, include your frontend domain.
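A preflight probe from the command line is a quick way to confirm the headers; this assumes the backend is running, and you should substitute your real frontend origin:

```shell
BACKEND="${BACKEND:-http://localhost:5000}"
# The response should include your origin (or *) in the Access-Control-Allow-* headers.
curl -sS -i -X OPTIONS "$BACKEND/generate" \
  -H 'Origin: https://<project>.vercel.app' \
  -H 'Access-Control-Request-Method: POST' \
  | grep -i 'access-control-allow' \
  || echo "no CORS headers in response (is the backend up at $BACKEND?)"
```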

- **Missing search keys**:
  - Backend search requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`; without them, search-related features will error.

- **api/gemini proxy**:
  - `src/app/api/gemini/route.ts` proxies to `${GOLEM}/api/gemini`. If your backend does not expose `/api/gemini`, either add it server-side or route Gemini calls via Genkit flows only.

- **Training routes**:
  - All `localhost:9006` usages have been removed. Update `TRAINING_API_BASE`/`NEXT_PUBLIC_TRAINING_API_BASE` to your deployed training API base to avoid mock responses.

- **Python deps on GPU image**:
  - If `xformers` fails, proceed without it or install a CUDA-matching build.

- **Gunicorn tuning**:
  - Adjust `--workers` based on GPU/CPU capacity and model size; keep timeouts ≤ 60s so proxies in front of the server do not time out first.

### Redeploy steps

- Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
- Backend (RunPod): pull latest, restart Gunicorn process.

### URLs

- Frontend: your Vercel/Netlify site (e.g., `https://<project>.vercel.app`)
- Backend: RunPod public proxy or tunnel (e.g., `https://<pod-id>-5000.proxy.runpod.net`)
- Training backend: your public service (e.g., `https://<training-host>`)

### Repository layout highlights

- Backend server: `home/chezy/golem_flask_server.py`
- Frontend API bridges: `src/app/api/*`
- Genkit flows: `src/ai/flows/*`, init in `src/ai/genkit.ts`
- Frontend pages: `src/app/*`
- Containers: `Dockerfile.frontend`, `Dockerfile.golem`
- Docs: `setup-tunnel.md`, `docs/deployment.md`

This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.