## QWEN2Golem – Deployment Guide

### Architecture

- **Frontend**: Next.js 15 (App Router, TypeScript). Uses server actions and API routes under `src/app/api/*`. Genkit flows live in `src/ai/flows/*` and initialize via `src/ai/genkit.ts` (Google AI plugin).
- **Backend**: Flask app `home/chezy/golem_flask_server.py` on port 5000. Provides chat `/generate`, state `/consciousness-state`, ASR `/asr/transcribe`, TTS `/tts/synthesize`, health `/health`, and others.
- **Model/AI**:
  - Frontend Genkit flows target `googleai/gemini-2.0-flash`.
  - Backend orchestrates generation (Gemini via key if present) and optional Google Custom Search.
- **API Bridge**: Next API routes proxy to the Flask backend:
  - `src/app/api/generate/route.ts` → POST `{GOLEM}/generate`
  - `src/app/api/consciousness-state/route.ts` → GET `{GOLEM}/consciousness-state`
  - `src/app/api/asr/transcribe/route.ts` → POST `{BACKEND}/asr/transcribe`
  - `src/app/api/tts/synthesize/route.ts` → POST `{BACKEND}/tts/synthesize`
- **Training service (separate backend)**: Exposed via `TRAINING_API_BASE` (FastAPI/Flask), replacing any localhost usage.
- **Containers**: `Dockerfile.frontend` (Node 20 runtime, port 9002) and `Dockerfile.golem` (Python 3.11, port 5000).

### Prerequisites

- **Frontend host**: Vercel (recommended; `vercel.json` present) or Netlify (works; configure build/env).
- **GPU host for backend**: RunPod (or any CUDA-enabled VM). Use an image with CUDA (e.g., `runpod/pytorch:*` with CUDA 12.x).
- **Keys**:
  - Google AI Studio API key for Genkit (frontend server-side flows).
  - Google Custom Search: `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` for backend search.
  - Optional Gemini on backend: `GEMINI_API_KEY` (backend also checks `NEXT_PUBLIC_GEMINI_API_KEY`).
- **Domain or tunnel**: Public HTTPS base URL for the backend (RunPod proxy, Cloudflare Tunnel, or ngrok). See `setup-tunnel.md`.

### Backend (GPU) Setup on RunPod

1. Create a GPU pod (CUDA 12.x) and expose port 5000 publicly.
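Before moving on, it helps to confirm the exposed port answers from outside the pod. The sketch below uses only the Python standard library; `POD_URL` is a hypothetical placeholder for your RunPod proxy URL, not a value from this repository:

```python
# Minimal reachability probe for the pod's public endpoint.
import urllib.request
import urllib.error

def is_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP(S) endpoint answers at all, False otherwise."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status, so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    POD_URL = "https://example-5000.proxy.runpod.net/health"  # hypothetical placeholder
    print("reachable" if is_reachable(POD_URL) else "unreachable")
```

Note that an HTTP error status still counts as reachable here, since it proves the server answered; only connection-level failures return `False`.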
2. SSH into the pod, clone the repo, create a venv, and install dependencies:

   ```bash
   sudo apt-get update -y && sudo apt-get install -y python3-venv git
   git clone https://your.repo/QWEN2Golem.git && cd QWEN2Golem
   python3 -m venv .venv && source .venv/bin/activate
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

3. Set environment variables:

   ```bash
   export GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
   export GOOGLE_CSE_ID=YOUR_GOOGLE_CSE_ID
   export GEMINI_API_KEY=OPTIONAL_BACKEND_GEMINI_KEY
   export CUDA_VISIBLE_DEVICES=0
   ```

4. Start the server (Gunicorn recommended):

   ```bash
   gunicorn -b 0.0.0.0:5000 --workers 1 --timeout 60 home.chezy.golem_flask_server:app
   ```

5. Verify:

   ```bash
   curl -sS http://localhost:5000/health
   ```

6. Key endpoints exposed by the backend:
   - POST `/generate`, POST `/generate/stream`
   - GET `/consciousness-state`, POST `/set-consciousness-dimension`
   - POST `/asr/transcribe`, POST `/tts/synthesize`
   - GET `/health`, GET `/status`

If hosting behind a tunnel (Cloudflare/ngrok), the same steps apply; use the tunnel URL in the frontend configuration.

### Training backend (required to avoid mock responses)

You must host a real training service at a public base URL and configure the frontend to use it. The training service (FastAPI/Flask suggested) is expected to expose the following endpoints:

- POST `/api/training/start-auto-training`
- GET `/api/training/status/{training_job_id}`
- POST `/api/ai-architect/generate-architecture`
- GET/POST `/api/ai-architect/*` (workflow, questionnaire, continue, notebook-html, artifacts, deployer, etc.)

Set the base URL via environment variables (see below). All hardcoded localhost usages have been removed in favor of env-configured URLs.

### Frontend Setup (Vercel recommended)

1. Connect the repo to Vercel.
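For local development before the real training service exists, the training-service contract described above can be stood in for with a minimal stub. This is a sketch using only the standard library; the JSON response shape is an assumption for illustration, not the real service's schema:

```python
# Local stand-in for the training service, development use only.
# Implements one endpoint from the expected contract:
# GET /api/training/status/{training_job_id}.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class TrainingStub(BaseHTTPRequestHandler):
    def do_GET(self):
        prefix = "/api/training/status/"
        if self.path.startswith(prefix):
            job_id = self.path[len(prefix):]
            body = json.dumps({"training_job_id": job_id, "status": "running"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging noise.
        pass

def run(port: int = 9006) -> None:
    """Serve the stub on the port local dev points TRAINING_API_BASE at."""
    HTTPServer(("127.0.0.1", port), TrainingStub).serve_forever()
```

Call `run()` and point `TRAINING_API_BASE` at `http://localhost:9006`; in production, replace it with your deployed service, since the stub only returns canned data.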
2. Environment variables (Project Settings → Environment Variables):

   ```
   NEXT_PUBLIC_GOLEM_SERVER_URL  = https://    # Flask backend (port 5000)
   NEXT_PUBLIC_BACKEND_URL       = https://    # Same as above
   GOOGLE_API_KEY                =             # Genkit plugin
   NEXT_PUBLIC_GEMINI_API_KEY    =
   TRAINING_API_BASE             = https://    # Server-side Next API routes
   NEXT_PUBLIC_TRAINING_API_BASE = https://    # Client-side components
   ```

3. Build and deploy (Vercel runs `npm run build` per `vercel.json`). Long-running function settings already exist in `vercel.json`.
4. Test the app once deployed.

### Netlify alternative

- Build command: `npm run build`
- Publish directory: `.next`
- Same environment variables as above (Site settings → Environment variables).
- Ensure Next serverless/edge functions are enabled per your Netlify plan.

If using Netlify, you may instead run the frontend as a static export with purely client-side calls to the backend endpoints.

### Genkit flows

Genkit is initialized in `src/ai/genkit.ts` with `@genkit-ai/googleai` and the default model `googleai/gemini-2.0-flash`.

- Ensure the environment has a valid Google AI Studio key (e.g., `GOOGLE_API_KEY`) for Genkit server-side usage.
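A quick startup check can catch missing keys before a deploy fails at runtime. A minimal sketch; the list of required names below is an assumption drawn from this guide, so adjust it to your deployment:

```python
# Sanity-check the environment variables this guide configures.
import os

REQUIRED = ["GOOGLE_API_KEY", "NEXT_PUBLIC_GOLEM_SERVER_URL", "NEXT_PUBLIC_BACKEND_URL"]

def missing_env(required=REQUIRED, env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Empty strings count as missing here, which matches the common failure mode of a variable being defined in the dashboard but left blank.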
- Flows include:
  - `src/ai/flows/get-initial-zpe-analysis-flow.ts`
  - `src/ai/flows/golem-chat.ts` (frontend calls the backend `/generate` by default)

### Environment variables (summary)

Frontend:

- `NEXT_PUBLIC_GOLEM_SERVER_URL`: HTTPS base of the Flask backend
- `NEXT_PUBLIC_BACKEND_URL`: same as above
- `GOOGLE_API_KEY`: AI Studio key for the Genkit plugin
- `NEXT_PUBLIC_GEMINI_API_KEY`: optional (frontend-visible)
- `NEXT_PUBLIC_TRAINING_API_BASE`: HTTPS base of the training backend

Backend (Flask):

- `GOOGLE_API_KEY`: Google API key (used for Custom Search)
- `GOOGLE_CSE_ID`: Custom Search Engine ID
- `GEMINI_API_KEY`: Gemini key used by the backend where applicable
- `CUDA_VISIBLE_DEVICES`, `PYTORCH_*`: recommended GPU knobs

Next API server (server-side):

- `TRAINING_API_BASE`: HTTPS base of the training backend (used by Next API routes)

### Local development

- Frontend dev:

  ```bash
  npm i
  npm run dev
  # Set envs: NEXT_PUBLIC_GOLEM_SERVER_URL=http://localhost:5000
  #           NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
  #           NEXT_PUBLIC_TRAINING_API_BASE=http://localhost:9006
  ```

- Backend dev:

  ```bash
  source .venv/bin/activate                 # if created
  python home/chezy/golem_flask_server.py   # or use gunicorn
  ```

- Training dev:

  ```bash
  # Run your FastAPI/Flask training service on port 9006
  export TRAINING_API_BASE=http://localhost:9006
  ```

Point `NEXT_PUBLIC_GOLEM_SERVER_URL` and `NEXT_PUBLIC_BACKEND_URL` to `http://localhost:5000` during local dev.

### Common issues & fixes

- **502/CORS or fetch failures from the frontend**:
  - Ensure the backend is reachable at the configured public URL over HTTPS.
  - The backend uses permissive CORS via `CORS(app)`. If you restrict origins, include your frontend domain.
- **Missing search keys**:
  - Backend search requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`; without them, search-related features will error.
- **api/gemini proxy**:
  - `src/app/api/gemini/route.ts` proxies to `${GOLEM}/api/gemini`.
    If your backend does not expose `/api/gemini`, either add it server-side or route Gemini calls via Genkit flows only.
- **Training routes**:
  - All `localhost:9006` usages have been removed. Update `TRAINING_API_BASE`/`NEXT_PUBLIC_TRAINING_API_BASE` to your deployed training API base to avoid mock responses.
- **Python deps on the GPU image**:
  - If `xformers` fails to install, proceed without it or install a CUDA-matching build.
- **Gunicorn tuning**:
  - Adjust `--workers` based on GPU/CPU and model size; keep timeouts ≤ 60s for proxies.

### Redeploy steps

- Frontend (Vercel): push changes to your main branch; Vercel auto-builds.
- Backend (RunPod): pull the latest changes and restart the Gunicorn process.

### URLs

- Frontend: your Vercel/Netlify site (e.g., `https://.vercel.app`)
- Backend: RunPod public proxy or tunnel (e.g., `https://-5000.proxy.runpod.net`)
- Training backend: your public service (e.g., `https://`)

### Repository layout highlights

- Backend server: `home/chezy/golem_flask_server.py`
- Frontend API bridges: `src/app/api/*`
- Genkit flows: `src/ai/flows/*`, initialized in `src/ai/genkit.ts`
- Frontend pages: `src/app/*`
- Containers: `Dockerfile.frontend`, `Dockerfile.golem`
- Docs: `setup-tunnel.md`, `docs/deployment.md`

This guide aligns with the repository and removes hardcoded training endpoints in favor of environment variables, ensuring production-safe, non-mock behavior when a real training backend is provided.
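As a footnote on the API bridges: each Next route joins a configured base URL with a backend path (e.g., `{GOLEM}/generate`), which is easy to get wrong when the env value carries a trailing slash. A hedged sketch of that joining logic, not the repository's actual code (`backend.example` is a placeholder host):

```python
# Join a configured base URL and an endpoint path without double slashes,
# tolerating trailing/leading slashes on either side.
def join_base(base: str, path: str) -> str:
    return base.rstrip("/") + "/" + path.lstrip("/")

if __name__ == "__main__":
    base = "https://backend.example/"  # placeholder for NEXT_PUBLIC_GOLEM_SERVER_URL
    for endpoint in ("/generate", "/consciousness-state", "/asr/transcribe"):
        print(join_base(base, endpoint))
```

If a bridge request 404s while `curl` against the backend succeeds, a doubled or missing slash in the proxied URL is a likely first thing to check.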