
RAG Pipeline API Documentation

Overview

A FastAPI-based Retrieval-Augmented Generation (RAG) pipeline with OpenRouter GLM integration for intelligent tool calling.

Base URL

http://localhost:8000

Endpoints

/chat - Main Chat Endpoint

Method: POST
Description: Intelligent chat with RAG tool calling. GLM automatically determines when to use RAG vs. general conversation.

Request Body

{
  "messages": [
    {
      "role": "user|assistant|system",
      "content": "string"
    }
  ]
}

Response Format

{
  "response": "string",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"string\", \"dataset\": \"string\"}"
    }
  ] | null
}
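The request and response shapes above can be sketched with stdlib dataclasses (the real app presumably uses Pydantic models under FastAPI; the class names here are illustrative):

```python
import json
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    role: str      # "user", "assistant", or "system"
    content: str

@dataclass
class ToolCall:
    name: str      # e.g. "rag_qa"
    arguments: str # JSON-encoded string, e.g. '{"question": "..."}'

@dataclass
class ChatResponse:
    response: str
    tool_calls: Optional[List[dict]] = None  # null when no tool was used

# Parse a sample /chat response body
body = json.loads('{"response": "Hi!", "tool_calls": null}')
resp = ChatResponse(response=body["response"], tool_calls=body["tool_calls"])
```

Note that `tool_calls` is `null` whenever GLM answers from general knowledge without invoking RAG.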

Examples

1. General Greeting (No RAG):

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'

Response:

{
  "response": "Hi! I'm Rohit's AI assistant. I can help you learn about his professional background, skills, and experience. What would you like to know about Rohit?",
  "tool_calls": null
}

2. Portfolio Question (RAG Enabled):

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is your current role?"}]}'

Response:

{
  "response": "Based on the portfolio information, Rohit is currently working as a Tech Lead at FleetEnable, where he leads UI development for a logistics SaaS product focused on drayage and freight management...",
  "tool_calls": [
    {
      "name": "rag_qa", 
      "arguments": "{\"question\": \"What is your current role?\"}"
    }
  ]
}

/health - Health Check

Method: GET
Description: Check API and dataset loading status.

Response

{
  "status": "healthy",
  "datasets_loaded": 1,
  "available_datasets": ["developer-portfolio"]
}

/datasets - List Available Datasets

Method: GET
Description: Get list of available datasets.

Response

{
  "datasets": ["developer-portfolio"]
}

Features

🧠 Intelligent Tool Calling

  • Automatic Detection: GLM determines when questions need RAG vs. general conversation
  • Context-Aware: Uses portfolio information for relevant questions
  • Natural Responses: Synthesizes RAG results into conversational answers
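Under the hood this typically means the server advertises a `rag_qa` tool to GLM in OpenAI-style function-calling format. A sketch of what that declaration could look like (the exact schema in the codebase may differ):

```python
# OpenAI-style tool declaration that GLM can choose to call.
# Parameter names follow the /chat response format documented above;
# the real app's schema may differ in details.
RAG_TOOL = {
    "type": "function",
    "function": {
        "name": "rag_qa",
        "description": "Answer questions about Rohit's portfolio "
                       "using retrieved documents.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "dataset": {"type": "string",
                            "default": "developer-portfolio"},
            },
            "required": ["question"],
        },
    },
}
```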

🎯 Third-Person AI Assistant

  • Portfolio Focus: Responds about Rohit's experience (not "my" experience)
  • Professional Tone: Maintains proper third-person references
  • Context Integration: Combines multiple data points coherently

⚡ Performance Optimizations

  • On-Demand Loading: Datasets load only when RAG is needed
  • Clean Output: No verbose ML logging for general conversations
  • Fast Responses: Sub-second for greetings, ~20s for first RAG query
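On-demand loading can be sketched as a simple memoizing cache (illustrative only; the real loader reads documents and builds embeddings, which is where the ~20s first-query cost comes from):

```python
_datasets: dict = {}

def load_dataset(name: str) -> list:
    """Expensive load (reading documents, building the vector index).
    Stubbed here to return a placeholder document list."""
    return [f"{name}-doc-{i}" for i in range(3)]

def get_dataset(name: str) -> list:
    """Load a dataset only the first time it is requested."""
    if name not in _datasets:
        _datasets[name] = load_dataset(name)
    return _datasets[name]
```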

Available Datasets

developer-portfolio

  • Content: Work experience, skills, projects, achievements
  • Topics: FleetEnable, Coditude, technologies, leadership
  • Size: 19 documents with full metadata

Error Handling

Common Responses

  • Datasets Loading: "RAG Pipeline is running but datasets are still loading..."
  • Dataset Not Found: "Dataset 'xyz' not available. Available datasets: [...]"
  • API Errors: HTTP 500 with error details
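The "Dataset Not Found" message can be produced by a small guard before retrieval (a hypothetical helper; the real app may raise an HTTPException instead):

```python
from typing import List, Optional

def check_dataset(name: str, available: List[str]) -> Optional[str]:
    """Return an error message if the dataset is unknown, else None."""
    if name not in available:
        return (f"Dataset '{name}' not available. "
                f"Available datasets: {available}")
    return None
```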

Status Codes

  • 200 - Success
  • 400 - Bad Request (invalid JSON, missing fields)
  • 500 - Internal Server Error

Environment Variables

Create a .env file:

OPENROUTER_API_KEY=sk-or-v1-your-key-here
PORT=8000
TOKENIZERS_PARALLELISM=false
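The app can read these with os.getenv, falling back to defaults (a sketch; the actual startup code may use python-dotenv to load the .env file first):

```python
import os

def load_config(env=os.environ) -> dict:
    """Read the variables documented above, with sensible defaults."""
    return {
        "api_key": env.get("OPENROUTER_API_KEY"),  # required, no default
        "port": int(env.get("PORT", "8000")),
        "tokenizers_parallelism": env.get("TOKENIZERS_PARALLELISM", "false"),
    }

cfg = load_config({"OPENROUTER_API_KEY": "sk-or-v1-test", "PORT": "8000"})
```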

Development

Running Locally

# Install dependencies
pip install -r requirements.txt

# Start server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Or use script
./start.sh

Testing

# Health check
curl http://localhost:8000/health

# Chat test
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'

Deployment

Docker

# Build
docker build -t rag-pipeline .

# Run
docker run -p 8000:8000 rag-pipeline

Hugging Face Spaces

  1. Push code to repository
  2. Connect Space to repository
  3. Set environment variables in Space settings
  4. Automatic deployment from main branch

Architecture

OpenRouter GLM-4.5-air (Parent AI)
├── Tool Calling Logic
│   ├── Automatically detects RAG-worthy questions
│   └── Falls back to general knowledge
├── RAG Tool Function
│   ├── Dataset selection (developer-portfolio)
│   ├── Document retrieval
│   └── Context formatting
└── Response Generation
    ├── Tool results integration
    └── Natural language responses
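The tool-calling flow above can be sketched as a dispatch step: if GLM's reply contains tool calls, run rag_qa and append the results so the model can generate the final answer. Helper names here are hypothetical; the real OpenRouter client code differs.

```python
import json

def rag_qa(question: str, dataset: str = "developer-portfolio") -> str:
    # Placeholder for document retrieval + context formatting.
    return f"[context from {dataset} for: {question}]"

def dispatch_tool_calls(assistant_msg: dict, messages: list) -> list:
    """Append tool results so the model can produce the final answer."""
    messages = messages + [assistant_msg]
    for call in assistant_msg.get("tool_calls") or []:
        args = json.loads(call["function"]["arguments"])
        result = rag_qa(**args)
        messages.append({"role": "tool",
                         "tool_call_id": call.get("id"),
                         "content": result})
    return messages

# Simulated GLM reply that requests the rag_qa tool
reply = {"role": "assistant", "content": None, "tool_calls": [
    {"id": "call_1", "function": {
        "name": "rag_qa",
        "arguments": '{"question": "What is your current role?"}'}}]}
history = dispatch_tool_calls(
    reply, [{"role": "user", "content": "What is your current role?"}])
```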

Changelog

v2.0 - Current

  • ✅ OpenRouter GLM integration with tool calling
  • ✅ Intelligent RAG vs. conversation detection
  • ✅ Third-person AI assistant for Rohit's portfolio
  • ✅ On-demand dataset loading
  • ✅ Removed /answer endpoint (use /chat only)
  • ✅ Environment variable configuration
  • ✅ Performance optimizations

v1.0 - Legacy

  • Google Gemini integration
  • Multiple endpoints (/answer, /chat)
  • Background dataset loading
  • First-person responses