stefanjwojcik committed
Commit
15de73a
·
1 Parent(s): 9000c90

Add setup script and comprehensive tests for Congressional Bioguide MCP Server


- Created setup.sh for environment setup, including Python version checks and dependency installation.
- Added test_embeddings_data.py to validate embeddings data and FAISS operations.
- Introduced test_faiss_minimal.py for minimal testing of FAISS functionality.
- Implemented test_queries.py to validate database structure and search functionality.
- Added test_sentence_transformers.py to test sentence-transformers integration and performance.

.gitattributes CHANGED
@@ -36,3 +36,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  congress_bio_ids.pkl filter=lfs diff=lfs merge=lfs -text
  congress_faiss.index filter=lfs diff=lfs merge=lfs -text
  congress.db filter=lfs diff=lfs merge=lfs -text
+ *.index filter=lfs diff=lfs merge=lfs -text
+ *.db filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -5,10 +5,426 @@ colorFrom: purple
  colorTo: yellow
  sdk: gradio
  sdk_version: 5.49.1
- app_file: app.py
+ app_file: gradio_app.py
  pinned: false
  license: mit
- short_description: 'An mcp allowing users to analyze congressional biographies. '
+ short_description: 'An MCP allowing users to analyze congressional biographies.'
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # Congressional Bioguide MCP Server
+
+ A Model Context Protocol (MCP) server that provides access to Congressional member profiles with both structured SQL queries and semantic search capabilities.
+
+ ## Deployment Options
+
+ ### 1. Gradio MCP (Hugging Face Spaces)
+
+ Run this MCP as a Gradio app with a web interface plus an MCP server:
+
+ ```bash
+ python gradio_app.py
+ ```
+
+ This launches a web interface at `http://localhost:7860`, exposing 9 tools through both the web UI and the MCP server.
+
+ **Deploy to Hugging Face Spaces:**
+ 1. Create a new Space on Hugging Face
+ 2. Set SDK to `gradio` (version 5.49.1+)
+ 3. Upload all files including `gradio_app.py`, `congress.db`, `congress_faiss.index`, and `congress_bio_ids.pkl`
+ 4. The app will automatically launch with `mcp_server=True`
+
+ ### 2. Traditional MCP Server
+
+ Use the original MCP server for integration with Claude Desktop or other MCP clients:
+
+ ```bash
+ python server.py
+ ```
+
+ Test the server backend with `npx @modelcontextprotocol/inspector python server.py` or integrate it into your Claude setup.
+
+ ## Features
+
+ ### Gradio MCP Tools (9 Tools)
+
+ The Gradio app (`gradio_app.py`) exposes these 9 MCP tools:
+
+ 1. **search_by_name** - Search members by name (first/last name)
+ 2. **search_by_party** - Find by political party affiliation
+ 3. **search_by_state** - Search by state/region representation
+ 4. **semantic_search_biography** - AI-powered natural language search of biographies
+ 5. **get_member_profile** - Get complete profile by Bioguide ID
+ 6. **count_members_by_party** - Count members grouped by party
+ 7. **count_members_by_state** - Count members grouped by state
+ 8. **execute_sql_query** - Execute custom SQL queries (read-only)
+ 9. **get_database_schema** - View database structure
+
+ ### Traditional MCP Server Tools (14 Tools)
+
+ The traditional server (`server.py`) provides all 14 tools:
+
+ **Search Tools** (return concise results by default):
+ 1. **search_by_name** - Search members by name (returns: name, dates, party, congress)
+ 2. **search_by_party** - Find by political party affiliation
+ 3. **search_by_state** - Search by state/region representation
+ 4. **search_by_congress** - Get all members from a specific Congress
+ 5. **search_by_date_range** - Find members who served during specific dates
+ 6. **semantic_search_biography** - Natural language AI search of biographies
+ 7. **search_biography_regex** - Regex pattern search (keywords, phrases)
+ 8. **search_by_relationship** - Find members with family relationships
+
+ **Aggregation & Analysis Tools** (efficient for large datasets):
+ 9. **count_members** - Count members by party, state, position, congress, or year
+ 10. **temporal_analysis** - Analyze trends over time (party shifts, demographics, etc.)
+ 11. **count_by_biography_content** - Count members mentioning specific keywords (e.g., "Harvard", "lawyer")
+
+ **Profile & Query Tools**:
+ 12. **get_member_profile** - Get complete profile by Bioguide ID
+ 13. **execute_sql_query** - Execute custom SQL queries (read-only)
+ 14. **get_database_schema** - View database structure
+
+ ### Database Schema
+
+ - **members** - Core biographical data (13,047+ profiles)
+ - **job_positions** - Congressional positions and affiliations
+ - **images** - Profile images
+ - **relationships** - Family relationships between members
+ - **creative_works** - Publications by members
+ - **assets** - Additional media assets
+
+ ## Requirements
+
+ - **Python 3.10+**, including Python 3.14
+ - ✅ **Python 3.14 is now supported!** (with single-threaded mode for FAISS)
+
+ ## Setup
+
+ ### Quick Start
+
+ ```bash
+ ./setup.sh
+ ```
+
+ This automated script will:
+ 1. Create a Python virtual environment
+ 2. Install all dependencies
+ 3. Ingest all Congressional profiles into SQLite
+ 4. Build the FAISS semantic search index
+
+ ### Manual Setup
+
+ If you prefer manual setup:
+
+ #### 1. Install Dependencies
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ pip install -r requirements.txt
+ ```
+
+ #### 2. Ingest Data
+
+ Run the ingestion script to create the SQLite database and FAISS index:
+
+ ```bash
+ python3 ingest_data.py
+ ```
+
+ This will:
+ - Create the `congress.db` SQLite database (13,047+ members)
+ - Build `congress_faiss.index` for semantic search
+ - Generate `congress_bio_ids.pkl` for ID mapping
+
+ Expected output:
+ ```
+ Starting Congressional Bioguide ingestion...
+ ============================================================
+ ✓ Database schema created
+ Ingesting 13047 profiles...
+   Processed 1000/13047 profiles...
+   ...
+ ✓ Ingested 13047 profiles into database
+ Building FAISS index for semantic search...
+ Encoding 13047 biographies...
+   Encoded 3200/13047 biographies...
+   ...
+ ✓ FAISS index created with 13047 vectors
+   Index dimension: 384
+ ============================================================
+ ✓ Ingestion complete!
+ ```
+
+ **Note**: Ingestion takes approximately 5-10 minutes depending on your system.
+
+ #### 3. Test the System (Optional)
+
+ ```bash
+ python3 test_queries.py
+ ```
+
+ #### 4. Run the Server
+
+ ```bash
+ python3 server.py
+ ```
+
+ ## Usage Examples
+
+ ### Name Search
+ ```json
+ {
+   "name": "search_by_name",
+   "arguments": {
+     "family_name": "Lincoln"
+   }
+ }
+ ```
+
+ ### Party Search
+ ```json
+ {
+   "name": "search_by_party",
+   "arguments": {
+     "party": "Republican",
+     "congress_number": 117
+   }
+ }
+ ```
+
+ ### State Search
+ ```json
+ {
+   "name": "search_by_state",
+   "arguments": {
+     "state_code": "CA",
+     "congress_number": 117
+   }
+ }
+ ```
+
+ ### Semantic Search
+ ```json
+ {
+   "name": "semantic_search_biography",
+   "arguments": {
+     "query": "Civil War veterans who became lawyers",
+     "top_k": 5
+   }
+ }
+ ```
+
+ ### Regex Search - Find Keywords
+ ```json
+ {
+   "name": "search_biography_regex",
+   "arguments": {
+     "pattern": "Harvard",
+     "limit": 5
+   }
+ }
+ ```
+
+ ### Regex Search - Filter by Party
+ ```json
+ {
+   "name": "search_biography_regex",
+   "arguments": {
+     "pattern": "lawyer",
+     "filter_party": "Republican",
+     "limit": 10
+   }
+ }
+ ```
+
+ ### Regex Search - Filter by State and Congress
+ ```json
+ {
+   "name": "search_biography_regex",
+   "arguments": {
+     "pattern": "served.*Confederate Army",
+     "filter_state": "VA",
+     "limit": 5
+   }
+ }
+ ```
+
+ **Note**: Regex search returns concise results (name, dates, party, state) by default. Set `return_full_profile: true` to get biography text.
+
+ ### Count Members by Party
+ ```json
+ {
+   "name": "count_members",
+   "arguments": {
+     "group_by": "party"
+   }
+ }
+ ```
+
+ ### Count Republicans by State in 117th Congress
+ ```json
+ {
+   "name": "count_members",
+   "arguments": {
+     "group_by": "state",
+     "filter_party": "Republican",
+     "filter_congress": 117
+   }
+ }
+ ```
+
+ ### Temporal Analysis - Party Changes Over Time
+ ```json
+ {
+   "name": "temporal_analysis",
+   "arguments": {
+     "analysis_type": "party_over_time",
+     "time_unit": "congress",
+     "start_date": "1900-01-01",
+     "end_date": "2000-12-31"
+   }
+ }
+ ```
+
+ ### Demographics Analysis - Average Age by Congress
+ ```json
+ {
+   "name": "temporal_analysis",
+   "arguments": {
+     "analysis_type": "demographics",
+     "time_unit": "congress"
+   }
+ }
+ ```
+
+ ### Count Members Who Attended Harvard
+ ```json
+ {
+   "name": "count_by_biography_content",
+   "arguments": {
+     "keywords": ["Harvard"]
+   }
+ }
+ ```
+
+ ### Count Lawyers by Party
+ ```json
+ {
+   "name": "count_by_biography_content",
+   "arguments": {
+     "keywords": ["lawyer", "attorney"],
+     "breakdown_by": "party"
+   }
+ }
+ ```
+
+ ### Count Members Who Were Lawyers or Veterans, by State
+ ```json
+ {
+   "name": "count_by_biography_content",
+   "arguments": {
+     "keywords": ["lawyer", "military", "army"],
+     "match_all": false,
+     "breakdown_by": "state"
+   }
+ }
+ ```
+
+ ### SQL Query - Find Longest Serving Members
+ ```json
+ {
+   "name": "execute_sql_query",
+   "arguments": {
+     "query": "SELECT family_name, given_name, COUNT(DISTINCT congress_number) as congresses FROM members m JOIN job_positions j ON m.bio_id = j.bio_id GROUP BY m.bio_id HAVING congresses > 5 ORDER BY congresses DESC LIMIT 10"
+   }
+ }
+ ```
+
+ ### Get Full Member Profile
+ ```json
+ {
+   "name": "get_member_profile",
+   "arguments": {
+     "bio_id": "L000313"
+   }
+ }
+ ```
+
+ ### Search by Congress Number
+ ```json
+ {
+   "name": "search_by_congress",
+   "arguments": {
+     "congress_number": 117,
+     "chamber": "Senator"
+   }
+ }
+ ```
+
+ ### Search by Date Range
+ ```json
+ {
+   "name": "search_by_date_range",
+   "arguments": {
+     "start_date": "1861-03-04",
+     "end_date": "1865-03-04"
+   }
+ }
+ ```
+
+ ### Find Family Relationships
+ ```json
+ {
+   "name": "search_by_relationship",
+   "arguments": {
+     "relationship_type": "father"
+   }
+ }
+ ```
+
+ ### Complex SQL - Party Transitions
+ ```json
+ {
+   "name": "execute_sql_query",
+   "arguments": {
+     "query": "SELECT m.bio_id, m.family_name, m.given_name, GROUP_CONCAT(DISTINCT j.party) as parties FROM members m JOIN job_positions j ON m.bio_id = j.bio_id WHERE j.party IS NOT NULL GROUP BY m.bio_id HAVING COUNT(DISTINCT j.party) > 1 LIMIT 20"
+   }
+ }
+ ```
+
+ ## Data Source
+
+ Data comes from the US Congressional Bioguide, which contains biographical information for all members of Congress throughout history.
+
+ ## Technical Details
+
+ - **Database**: SQLite for structured queries
+ - **Semantic Search**: FAISS with sentence-transformers (all-MiniLM-L6-v2)
+ - **Embedding Dimension**: 384
+ - **Index Type**: Flat IP (Inner Product) with L2 normalization for cosine similarity
+
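+ Because every vector is L2-normalized before it is indexed, the inner product that `IndexFlatIP` computes is exactly cosine similarity. A quick NumPy sanity check of that equivalence (illustrative only, not part of the server code):
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(0)
+ a = rng.random(384).astype("float32")
+ b = rng.random(384).astype("float32")
+ a /= np.linalg.norm(a)  # L2-normalize, as faiss.normalize_L2 does
+ b /= np.linalg.norm(b)
+
+ cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
+ inner = np.dot(a, b)  # what IndexFlatIP scores
+ assert np.isclose(cosine, inner)  # identical once vectors are unit-length
+ ```
+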
+ ## MCP Configuration
+
+ Add to your MCP settings file (usually `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
+
+ ```json
+ {
+   "mcpServers": {
+     "congressional-bioguide": {
+       "command": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/venv/bin/python",
+       "args": [
+         "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/server.py"
+       ],
+       "cwd": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP"
+     }
+   }
+ }
+ ```
+
+ **Note**: This uses the virtual environment's Python, which has all required dependencies installed.
+
+ ## License
+
+ Data is public domain from the US Congressional Bioguide.
build_faiss_index.py ADDED
@@ -0,0 +1,194 @@
+ #!/usr/bin/env python3
+ """
+ Build FAISS index from the Congressional biography database.
+
+ This script:
+ 1. Loads all biographies from the SQLite database
+ 2. Generates embeddings using sentence-transformers
+ 3. Builds a FAISS index for fast similarity search
+ 4. Saves the index and bio ID mapping to disk
+
+ Run this script whenever:
+ - The database is first created
+ - You want to rebuild the semantic search index
+ - After updating to a compatible Python version
+
+ Requires Python 3.9-3.12 (Python 3.14+ may have compatibility issues)
+ """
+
+ import sqlite3
+ import faiss
+ import numpy as np
+ import pickle
+ import sys
+ import time
+ import os
+ from pathlib import Path
+ from sentence_transformers import SentenceTransformer
+
+ # Paths
+ SCRIPT_DIR = Path(__file__).parent.absolute()
+ DB_PATH = str(SCRIPT_DIR / "congress.db")
+ INDEX_PATH = str(SCRIPT_DIR / "congress_faiss.index")
+ MAPPING_PATH = str(SCRIPT_DIR / "congress_bio_ids.pkl")
+
+
+ def build_faiss_index():
+     """Build FAISS index from database biographies."""
+     print("=" * 60)
+     print("BUILDING FAISS INDEX FOR CONGRESSIONAL BIOGUIDE")
+     print("=" * 60)
+
+     # Check database exists
+     if not Path(DB_PATH).exists():
+         print(f"\n❌ ERROR: Database not found at {DB_PATH}")
+         print("   Run ingest_data.py first to create the database.")
+         return False
+
+     # Load sentence transformer model
+     print("\n1. Loading sentence transformer model...")
+     start = time.time()
+
+     # Disable all parallelism to avoid Python 3.14 issues
+     os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+     os.environ['OMP_NUM_THREADS'] = '1'
+     os.environ['MKL_NUM_THREADS'] = '1'
+     os.environ['OPENBLAS_NUM_THREADS'] = '1'
+
+     import torch
+     torch.set_num_threads(1)
+
+     model = SentenceTransformer('all-MiniLM-L6-v2')
+     print(f"   ✓ Model loaded in {time.time() - start:.3f}s")
+
+     # Load biographies from database
+     print("\n2. Loading biographies from database...")
+     start = time.time()
+     conn = sqlite3.connect(DB_PATH)
+     cursor = conn.cursor()
+
+     cursor.execute("""
+         SELECT bio_id, profile_text
+         FROM members
+         WHERE profile_text IS NOT NULL AND profile_text != ''
+     """)
+     rows = cursor.fetchall()
+     conn.close()
+
+     elapsed = time.time() - start
+     print(f"   ✓ Loaded {len(rows):,} biographies in {elapsed:.3f}s")
+
+     if len(rows) == 0:
+         print("\n❌ ERROR: No biographies found in database!")
+         return False
+
+     # Prepare data
+     print("\n3. Preparing data for encoding...")
+     start = time.time()
+     bio_ids = [row[0] for row in rows]
+     texts = [row[1] for row in rows]
+     print(f"   ✓ Prepared {len(bio_ids):,} texts")
+     print(f"   ✓ Time: {time.time() - start:.3f}s")
+
+     # Generate embeddings in batches
+     print("\n4. Generating embeddings...")
+     print("   (This may take several minutes...)")
+     start = time.time()
+     batch_size = 32
+     embeddings = []
+
+     for i in range(0, len(texts), batch_size):
+         batch = texts[i:i + batch_size]
+         batch_embeddings = model.encode(
+             batch,
+             show_progress_bar=False,
+             convert_to_numpy=True,
+             normalize_embeddings=False,
+             device='cpu'  # Explicit CPU to avoid issues
+         )
+         embeddings.extend(batch_embeddings)
+
+         # Progress update every 100 batches (~3200 texts)
+         if (i // batch_size + 1) % 100 == 0:
+             elapsed = time.time() - start
+             rate = (i + len(batch)) / elapsed
+             remaining = (len(texts) - i - len(batch)) / rate if rate > 0 else 0
+             print(f"   Encoded {i + len(batch):,}/{len(texts):,} " +
+                   f"({rate:.0f} texts/sec, ~{remaining:.0f}s remaining)")
+
+     embeddings = np.array(embeddings, dtype=np.float32)
+     elapsed = time.time() - start
+     print(f"   ✓ Generated {len(embeddings):,} embeddings in {elapsed:.1f}s")
+     print(f"   ✓ Shape: {embeddings.shape}")
+
+     # Build FAISS index
+     print("\n5. Building FAISS index...")
+     start = time.time()
+     dimension = embeddings.shape[1]
+     print(f"   Dimension: {dimension}")
+
+     # Use IndexFlatIP for exact cosine similarity search
+     # (Inner Product is equivalent to cosine similarity for normalized vectors)
+     index = faiss.IndexFlatIP(dimension)
+
+     # Normalize embeddings for cosine similarity
+     faiss.normalize_L2(embeddings)
+
+     # Add embeddings to index
+     index.add(embeddings)
+
+     elapsed = time.time() - start
+     print(f"   ✓ Index built in {elapsed:.3f}s")
+     print(f"   ✓ Total vectors in index: {index.ntotal:,}")
+
+     # Save FAISS index
+     print("\n6. Saving FAISS index to disk...")
+     start = time.time()
+     faiss.write_index(index, INDEX_PATH)
+     elapsed = time.time() - start
+     print(f"   ✓ Index saved to: {INDEX_PATH}")
+     print(f"   ✓ Time: {elapsed:.3f}s")
+
+     # Save bio ID mapping
+     print("\n7. Saving bio ID mapping...")
+     start = time.time()
+     with open(MAPPING_PATH, "wb") as f:
+         pickle.dump(bio_ids, f)
+     elapsed = time.time() - start
+     print(f"   ✓ Mapping saved to: {MAPPING_PATH}")
+     print(f"   ✓ Time: {elapsed:.3f}s")
+
+     # Get file sizes
+     index_size_mb = Path(INDEX_PATH).stat().st_size / (1024**2)
+     mapping_size_mb = Path(MAPPING_PATH).stat().st_size / (1024**2)
+
+     print("\n" + "=" * 60)
+     print("FAISS INDEX BUILD COMPLETE")
+     print("=" * 60)
+     print(f"Total biographies indexed: {len(bio_ids):,}")
+     print(f"Index file size: {index_size_mb:.2f} MB")
+     print(f"Mapping file size: {mapping_size_mb:.2f} MB")
+     print(f"Total size: {index_size_mb + mapping_size_mb:.2f} MB")
+     print("\nThe MCP server will now load this index on startup for semantic search.")
+     print("You can now use the 'semantic_search_biography' tool!")
+
+     return True
+
+
+ def main():
+     """Main entry point."""
+     try:
+         success = build_faiss_index()
+         if not success:
+             sys.exit(1)
+     except Exception as e:
+         print(f"\n❌ ERROR: {e}")
+         print("\nThis may be due to Python version incompatibility.")
+         print("FAISS and sentence-transformers work best with Python 3.9-3.12")
+         print(f"Current Python version: {sys.version}")
+         print("\nThe database is still usable without semantic search.")
+         import traceback
+         traceback.print_exc()
+         sys.exit(1)
+
+
+ if __name__ == "__main__":
+     main()
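
Once the build has finished, the saved artifacts can be queried directly. A minimal sketch (assumes `congress_faiss.index` and `congress_bio_ids.pkl` from the script above exist in the working directory; it mirrors the search logic used in `gradio_app.py` below):

```python
import pickle

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
index = faiss.read_index("congress_faiss.index")
with open("congress_bio_ids.pkl", "rb") as f:
    bio_ids = pickle.load(f)

# Encode the query, L2-normalize, and take the top 5 by inner product (= cosine).
query = model.encode(["Civil War veterans who became lawyers"]).astype("float32")
faiss.normalize_L2(query)
scores, indices = index.search(query, 5)
for idx, score in zip(indices[0], scores[0]):
    print(f"{bio_ids[idx]}  similarity={score:.3f}")
```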
faiss_build.log ADDED
@@ -0,0 +1,46 @@
+ ============================================================
+ BUILDING FAISS INDEX FOR CONGRESSIONAL BIOGUIDE
+ ============================================================
+
+ 1. Loading sentence transformer model...
+    ✓ Model loaded in 2.021s
+
+ 2. Loading biographies from database...
+    ✓ Loaded 13,047 biographies in 0.211s
+
+ 3. Preparing data for encoding...
+    ✓ Prepared 13,047 texts
+    ✓ Time: 0.000s
+
+ 4. Generating embeddings...
+    (This may take several minutes...)
+    Encoded 3,200/13,047 (48 texts/sec, ~207s remaining)
+    Encoded 6,400/13,047 (47 texts/sec, ~141s remaining)
+    Encoded 9,600/13,047 (47 texts/sec, ~74s remaining)
+    Encoded 12,800/13,047 (46 texts/sec, ~5s remaining)
+    ✓ Generated 13,047 embeddings in 280.6s
+    ✓ Shape: (13047, 384)
+
+ 5. Building FAISS index...
+    Dimension: 384
+    ✓ Index built in 0.009s
+    ✓ Total vectors in index: 13,047
+
+ 6. Saving FAISS index to disk...
+    ✓ Index saved to: /Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/congress_faiss.index
+    ✓ Time: 0.004s
+
+ 7. Saving bio ID mapping...
+    ✓ Mapping saved to: /Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/congress_bio_ids.pkl
+    ✓ Time: 0.001s
+
+ ============================================================
+ FAISS INDEX BUILD COMPLETE
+ ============================================================
+ Total biographies indexed: 13,047
+ Index file size: 19.11 MB
+ Mapping file size: 0.12 MB
+ Total size: 19.24 MB
+
+ The MCP server will now load this index on startup for semantic search.
+ You can now use the 'semantic_search_biography' tool!
gradio_app.py ADDED
@@ -0,0 +1,574 @@
+ #!/usr/bin/env python3
+ """
+ Gradio MCP Server for Congressional Bioguide profiles.
+ Provides search and analysis capabilities via a Gradio interface.
+ """
+
+ import gradio as gr
+ import sqlite3
+ import json
+ import os
+ import warnings
+ from typing import List, Dict, Any, Optional
+ import numpy as np
+ from sentence_transformers import SentenceTransformer
+ import faiss
+ import pickle
+ from pathlib import Path
+
+ # Suppress warnings
+ warnings.filterwarnings('ignore')
+ os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+
+ # Initialize global resources
+ SCRIPT_DIR = Path(__file__).parent.absolute()
+ DB_PATH = str(SCRIPT_DIR / "congress.db")
+ FAISS_INDEX_PATH = str(SCRIPT_DIR / "congress_faiss.index")
+ BIO_IDS_PATH = str(SCRIPT_DIR / "congress_bio_ids.pkl")
+
+ # Global state
+ model = None
+ faiss_index = None
+ bio_id_mapping = None
+
+
+ def initialize_search_index():
+     """Initialize the semantic search components."""
+     global model, faiss_index, bio_id_mapping
+
+     try:
+         if Path(FAISS_INDEX_PATH).exists() and Path(BIO_IDS_PATH).exists():
+             print(f"Loading FAISS index from: {FAISS_INDEX_PATH}")
+             model = SentenceTransformer('all-MiniLM-L6-v2')
+             faiss_index = faiss.read_index(FAISS_INDEX_PATH)
+             with open(BIO_IDS_PATH, "rb") as f:
+                 bio_id_mapping = pickle.load(f)
+             print(f"✓ Loaded {faiss_index.ntotal} embeddings")
+             return True
+         else:
+             print("FAISS index not found. Semantic search will be unavailable.")
+             return False
+     except Exception as e:
+         print(f"Error loading search index: {e}")
+         return False
+
+
+ def get_db_connection():
+     """Get a database connection."""
+     return sqlite3.connect(DB_PATH)
+
+
+ def execute_query(query: str, params: tuple = ()) -> List[Dict[str, Any]]:
+     """Execute a SQL query and return results as a list of dicts."""
+     conn = get_db_connection()
+     conn.row_factory = sqlite3.Row
+     cursor = conn.cursor()
+     cursor.execute(query, params)
+     results = [dict(row) for row in cursor.fetchall()]
+     conn.close()
+     return results
+
+
+ # Initialize search index on startup
+ print("Initializing Congressional Bioguide MCP Server...")
+ initialize_search_index()
+
+
+ # MCP Tool Functions with decorators
+ @gr.mcp.tool()
+ def search_by_name(family_name: str = "", given_name: str = "", limit: int = 10) -> str:
+     """
+     Search for Congressional members by name.
+
+     Args:
+         family_name: Last name to search for (partial match)
+         given_name: First name to search for (partial match)
+         limit: Maximum number of results to return (default: 10)
+
+     Returns:
+         JSON string with search results including bio_id, name, birth/death dates, party, state
+     """
+     try:
+         conditions = []
+         params = []
+
+         if family_name:
+             conditions.append("LOWER(m.unaccented_family_name) LIKE LOWER(?)")
+             params.append(f"%{family_name}%")
+         if given_name:
+             conditions.append("LOWER(m.unaccented_given_name) LIKE LOWER(?)")
+             params.append(f"%{given_name}%")
+
+         if not conditions:
+             return json.dumps({"error": "Please provide at least family_name or given_name"})
+
+         query = f"""
+             SELECT DISTINCT m.bio_id, m.given_name, m.middle_name, m.family_name,
+                    m.birth_date, m.death_date,
+                    j.party, j.region_code, j.job_name, j.congress_number
+             FROM members m
+             LEFT JOIN job_positions j ON m.bio_id = j.bio_id
+             WHERE {' AND '.join(conditions)}
+             ORDER BY m.family_name, m.given_name
+             LIMIT ?
+         """
+         params.append(limit)
+         results = execute_query(query, tuple(params))
+
+         return json.dumps({"count": len(results), "results": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def search_by_party(party: str, congress_number: Optional[int] = None) -> str:
+     """
+     Search for Congressional members by political party.
+
+     Args:
+         party: Party name (e.g., 'Republican', 'Democrat', 'Whig')
+         congress_number: Optional Congress number to filter by (e.g., 117)
+
+     Returns:
+         JSON string with members from the specified party
+     """
+     try:
+         if congress_number:
+             query = """
+                 SELECT DISTINCT m.bio_id, m.given_name, m.family_name, m.birth_date, m.death_date,
+                        j.party, j.region_code, j.job_name, j.congress_number
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE j.party = ? AND j.congress_number = ?
+                 ORDER BY m.family_name, m.given_name
+                 LIMIT 100
+             """
+             results = execute_query(query, (party, congress_number))
+         else:
+             query = """
+                 SELECT DISTINCT m.bio_id, m.given_name, m.family_name, m.birth_date, m.death_date,
+                        j.party, j.region_code, j.job_name, j.congress_number
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE j.party = ?
+                 ORDER BY m.family_name, m.given_name
+                 LIMIT 100
+             """
+             results = execute_query(query, (party,))
+
+         return json.dumps({"count": len(results), "party": party, "results": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def search_by_state(state_code: str, congress_number: Optional[int] = None) -> str:
+     """
+     Search for Congressional members by state.
+
+     Args:
+         state_code: Two-letter state code (e.g., 'CA', 'NY', 'TX')
+         congress_number: Optional Congress number to filter by
+
+     Returns:
+         JSON string with members from the specified state
+     """
+     try:
+         state_code = state_code.upper()
+
+         if congress_number:
+             query = """
+                 SELECT DISTINCT m.bio_id, m.given_name, m.family_name, m.birth_date, m.death_date,
+                        j.party, j.region_code, j.job_name, j.congress_number
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE j.region_code = ? AND j.congress_number = ?
+                 ORDER BY m.family_name, m.given_name
+                 LIMIT 100
+             """
+             results = execute_query(query, (state_code, congress_number))
+         else:
+             query = """
+                 SELECT DISTINCT m.bio_id, m.given_name, m.family_name, m.birth_date, m.death_date,
+                        j.party, j.region_code, j.job_name, j.congress_number
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE j.region_code = ?
+                 ORDER BY m.family_name, m.given_name
+                 LIMIT 100
+             """
+             results = execute_query(query, (state_code,))
+
+         return json.dumps({"count": len(results), "state": state_code, "results": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def semantic_search_biography(query: str, top_k: int = 5) -> str:
+     """
+     Perform AI-powered semantic search on member biographies using natural language.
+
+     Args:
+         query: Natural language query (e.g., 'lawyers who became judges', 'Civil War veterans')
+         top_k: Number of results to return (default: 5, max: 20)
+
+     Returns:
+         JSON string with matching members and their similarity scores
+     """
+     try:
+         if not all([model, faiss_index, bio_id_mapping]):
+             return json.dumps({"error": "Semantic search is not available. FAISS index not loaded."})
+
+         # Limit top_k
+         top_k = min(max(1, top_k), 20)
+
+         # Encode query
+         query_embedding = model.encode([query])[0].astype('float32')
+         query_embedding = query_embedding.reshape(1, -1)
+         faiss.normalize_L2(query_embedding)
+
+         # Search
+         scores, indices = faiss_index.search(query_embedding, top_k)
+
+         # Get profiles
+         results = []
+         for idx, score in zip(indices[0], scores[0]):
+             if idx < len(bio_id_mapping):
+                 bio_id = bio_id_mapping[idx]
+                 member_query = """
+                     SELECT m.bio_id, m.given_name, m.middle_name, m.family_name,
+                            m.birth_date, m.death_date, m.profile_text,
+                            j.party, j.region_code, j.job_name, j.congress_number
+                     FROM members m
+                     LEFT JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE m.bio_id = ?
+                     LIMIT 1
+                 """
+                 member_data = execute_query(member_query, (bio_id,))
+                 if member_data:
+                     member = member_data[0]
+                     # Truncate profile_text for response
+                     if member.get('profile_text'):
+                         member['profile_text'] = member['profile_text'][:500] + "..."
+                     member['similarity_score'] = float(score)
+                     results.append(member)
+
+         return json.dumps({"query": query, "count": len(results), "results": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def get_member_profile(bio_id: str) -> str:
+     """
+     Get complete profile for a specific member by their Bioguide ID.
+
+     Args:
+         bio_id: Bioguide ID (e.g., 'L000313' for John Lewis, 'W000374')
+
+     Returns:
+         JSON string with complete member profile including positions and relationships
+     """
+     try:
+         bio_id = bio_id.upper()
+
+         conn = get_db_connection()
+         conn.row_factory = sqlite3.Row
+         cursor = conn.cursor()
+
+         cursor.execute("SELECT * FROM members WHERE bio_id = ?", (bio_id,))
+         member = cursor.fetchone()
+
+         if not member:
+             conn.close()
+             return json.dumps({"error": f"No member found with bio_id: {bio_id}"})
+
+         profile = dict(member)
+
+         # Get job positions
+         cursor.execute("SELECT * FROM job_positions WHERE bio_id = ? ORDER BY start_date", (bio_id,))
+         profile['job_positions'] = [dict(row) for row in cursor.fetchall()]
+
+         # Get relationships
+         cursor.execute("SELECT * FROM relationships WHERE bio_id = ?", (bio_id,))
+         profile['relationships'] = [dict(row) for row in cursor.fetchall()]
+
+         # Get creative works
+         cursor.execute("SELECT * FROM creative_works WHERE bio_id = ?", (bio_id,))
+         profile['creative_works'] = [dict(row) for row in cursor.fetchall()]
+
+         conn.close()
+
+         return json.dumps(profile, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def count_members_by_party(filter_congress: Optional[int] = None) -> str:
+     """
+     Count members by political party.
+
+     Args:
+         filter_congress: Optional Congress number to filter by (e.g., 117)
+
+     Returns:
+         JSON string with member counts grouped by party
+     """
+     try:
+         if filter_congress:
+             query = """
+                 SELECT j.party as party, COUNT(DISTINCT m.bio_id) as count
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE j.congress_number = ?
+                 GROUP BY j.party
+                 ORDER BY count DESC
+             """
+             results = execute_query(query, (filter_congress,))
+         else:
+             query = """
+                 SELECT j.party as party, COUNT(DISTINCT m.bio_id) as count
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 GROUP BY j.party
+                 ORDER BY count DESC
+             """
+             results = execute_query(query)
+
+         total = sum(r['count'] for r in results)
+         return json.dumps({"total_members": total, "by_party": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def count_members_by_state(filter_congress: Optional[int] = None) -> str:
+     """
+     Count members by state.
+
+     Args:
+         filter_congress: Optional Congress number to filter by
+
+     Returns:
+         JSON string with member counts grouped by state
+     """
+     try:
+         if filter_congress:
+             query = """
+                 SELECT j.region_code as state, COUNT(DISTINCT m.bio_id) as count
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE j.congress_number = ?
+                 GROUP BY j.region_code
+                 ORDER BY count DESC
+             """
+             results = execute_query(query, (filter_congress,))
+         else:
+             query = """
+                 SELECT j.region_code as state, COUNT(DISTINCT m.bio_id) as count
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 GROUP BY j.region_code
+                 ORDER BY count DESC
+             """
+             results = execute_query(query)
+
+         total = sum(r['count'] for r in results)
+         return json.dumps({"total_members": total, "by_state": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def execute_sql_query(query: str) -> str:
+     """
+     Execute a custom SQL SELECT query against the Congressional database (READ-ONLY).
+
+     Args:
+         query: SQL SELECT query to execute
+
+     Returns:
+         JSON string with query results
+     """
+     try:
+         # Security: only allow SELECT queries
+         if not query.strip().upper().startswith("SELECT"):
+             return json.dumps({"error": "Only SELECT queries are allowed"})
+
+         results = execute_query(query)
+         return json.dumps({"count": len(results), "results": results}, indent=2)
+
+     except Exception as e:
+         return json.dumps({"error": str(e)})
+
+
+ @gr.mcp.tool()
+ def get_database_schema() -> str:
+     """
+     Get the database schema showing all tables and columns available for querying.
+
+     Returns:
+         JSON string with database schema information
+     """
+     schema_info = {
+         "tables": {
+             "members": {
+                 "description": "Main table with member biographical information",
+                 "columns": [
+                     "bio_id (PRIMARY KEY) - Bioguide ID",
+                     "family_name - Last name",
+                     "given_name - First name",
+                     "middle_name - Middle name",
+                     "birth_date - Birth date (YYYY-MM-DD)",
+                     "death_date - Death date (YYYY-MM-DD)",
+                     "profile_text - Full biography text"
+                 ]
+             },
+             "job_positions": {
+                 "description": "Congressional positions held by members",
+                 "columns": [
+                     "bio_id (FOREIGN KEY) - References members",
+                     "job_name - Position title (Representative, Senator)",
+                     "start_date - Start date of position",
+                     "end_date - End date of position",
+                     "congress_number - Congress number (e.g., 117)",
+                     "party - Party affiliation",
+                     "region_code - State/region code (e.g., 'CA', 'NY')"
+                 ]
+             },
+             "relationships": {
+                 "description": "Family relationships between members",
+                 "columns": ["bio_id", "related_bio_id", "relationship_type"]
+             },
+             "creative_works": {
+                 "description": "Publications and creative works by members",
+                 "columns": ["bio_id", "citation_text"]
+             }
+         }
+     }
+     return json.dumps(schema_info, indent=2)
+
+
+ # Create Gradio Interfaces for each tool
+ demo = gr.TabbedInterface(
+     [
+         # Search by Name
+         gr.Interface(
+             fn=search_by_name,
+             inputs=[
+                 gr.Textbox(label="Family Name (Last Name)", placeholder="e.g., Lincoln"),
+                 gr.Textbox(label="Given Name (First Name)", placeholder="e.g., Abraham"),
+                 gr.Slider(minimum=1, maximum=50, value=10, step=1, label="Max Results")
+             ],
+             outputs=gr.JSON(label="Search Results"),
+             title="Search by Name",
+             description="Search for Congressional members by their first or last name."
+         ),
+
+         # Search by Party
+         gr.Interface(
+             fn=search_by_party,
+             inputs=[
+                 gr.Textbox(label="Party Name", placeholder="e.g., Republican, Democrat, Whig"),
+                 gr.Number(label="Congress Number (optional)", value=None, precision=0)
+             ],
+             outputs=gr.JSON(label="Search Results"),
+             title="Search by Party",
+             description="Find members by political party affiliation."
+         ),
+
+         # Search by State
+         gr.Interface(
+             fn=search_by_state,
+             inputs=[
+                 gr.Textbox(label="State Code", placeholder="e.g., CA, NY, TX"),
+                 gr.Number(label="Congress Number (optional)", value=None, precision=0)
+             ],
+             outputs=gr.JSON(label="Search Results"),
+             title="Search by State",
+             description="Find members by the state they represented."
+         ),
+
+         # Semantic Search
+         gr.Interface(
+             fn=semantic_search_biography,
+             inputs=[
+                 gr.Textbox(label="Search Query", placeholder="e.g., 'lawyers who became judges' or 'Civil War veterans'", lines=3),
+                 gr.Slider(minimum=1, maximum=20, value=5, step=1, label="Number of Results")
+             ],
+             outputs=gr.JSON(label="Search Results"),
+             title="AI Semantic Search",
+             description="Use natural language to search biographies. Find members by career, background, or accomplishments."
+         ),
+
+         # Get Member Profile
+         gr.Interface(
+             fn=get_member_profile,
+             inputs=gr.Textbox(label="Bioguide ID", placeholder="e.g., L000313 (John Lewis)"),
+             outputs=gr.JSON(label="Member Profile"),
+             title="Get Member Profile",
+             description="Get complete profile for a specific member using their Bioguide ID."
+         ),
+
+         # Count by Party
+         gr.Interface(
+             fn=count_members_by_party,
+             inputs=gr.Number(label="Filter by Congress Number (optional)", value=None, precision=0),
+             outputs=gr.JSON(label="Party Counts"),
+             title="Count by Party",
+             description="Get member counts grouped by political party."
+         ),
+
+         # Count by State
+         gr.Interface(
+             fn=count_members_by_state,
+             inputs=gr.Number(label="Filter by Congress Number (optional)", value=None, precision=0),
+             outputs=gr.JSON(label="State Counts"),
+             title="Count by State",
+             description="Get member counts grouped by state."
+         ),
+
+         # SQL Query
+         gr.Interface(
+             fn=execute_sql_query,
+             inputs=gr.Textbox(label="SQL Query", placeholder="SELECT * FROM members LIMIT 10", lines=3),
+             outputs=gr.JSON(label="Query Results"),
+             title="Execute SQL",
+             description="Execute custom SQL SELECT queries (read-only)."
+         ),
+
+         # Database Schema
+         gr.Interface(
+             fn=get_database_schema,
+             inputs=None,
+             outputs=gr.JSON(label="Database Schema"),
+             title="Database Schema",
+             description="View the database structure and available tables/columns."
+         ),
+     ],
+     tab_names=[
+         "Search by Name",
+         "Search by Party",
+         "Search by State",
+         "AI Semantic Search",
+         "Member Profile",
+         "Count by Party",
+         "Count by State",
+         "Execute SQL",
+         "Database Schema"
+     ],
+     title="🏛️ Congressional Bioguide MCP Server",
+     theme=gr.themes.Soft()
+ )
+
+ if __name__ == "__main__":
+     demo.launch(mcp_server=True)
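
Because the tool functions are plain Python functions, they can be smoke-tested without any MCP client or browser. A hypothetical quick check (assumes `congress.db` is present next to the script; the semantic call additionally needs the FAISS index and mapping files):

```python
# Import the tools and call them directly, bypassing both the web UI
# and the MCP transport. Importing gradio_app loads the search index
# but does not launch the server (that is guarded by __main__).
from gradio_app import search_by_name, semantic_search_biography

print(search_by_name(family_name="Lincoln", limit=3))
print(semantic_search_biography("Civil War veterans who became lawyers", top_k=3))
```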
ingest_data.py ADDED
@@ -0,0 +1,447 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Ingestion script for Congressional Bioguide profiles.
4
+ Creates SQLite database and FAISS semantic search index.
5
+ """
6
+
7
+ import json
8
+ import sqlite3
9
+ import os
10
+ import time
11
+ from pathlib import Path
12
+ from typing import Dict, List, Any
13
+ import faiss
14
+ import numpy as np
15
+ import pickle
16
+ from sentence_transformers import SentenceTransformer
17
+
18
+
19
+ class BioguideIngester:
20
+ def __init__(self, data_dir: str = "BioguideProfiles", db_path: str = "congress.db"):
21
+ self.data_dir = Path(data_dir)
22
+ self.db_path = db_path
23
+ self.model = None # Load model only when needed for FAISS indexing
24
+
25
+ def create_database_schema(self):
26
+ """Create SQLite database schema for Congressional profiles."""
27
+ conn = sqlite3.connect(self.db_path)
28
+ cursor = conn.cursor()
29
+
30
+ # Main members table
31
+ cursor.execute("""
32
+ CREATE TABLE IF NOT EXISTS members (
33
+ bio_id TEXT PRIMARY KEY,
34
+ family_name TEXT,
35
+ given_name TEXT,
36
+ middle_name TEXT,
37
+ honorific_prefix TEXT,
38
+ unaccented_family_name TEXT,
39
+ unaccented_given_name TEXT,
40
+ unaccented_middle_name TEXT,
41
+ birth_date TEXT,
42
+ birth_circa INTEGER,
43
+ death_date TEXT,
44
+ death_circa INTEGER,
45
+ profile_text TEXT,
46
+ full_name TEXT GENERATED ALWAYS AS (
47
+ COALESCE(honorific_prefix || ' ', '') ||
48
+ COALESCE(given_name, '') || ' ' ||
49
+ COALESCE(middle_name || ' ', '') ||
50
+ COALESCE(family_name, '')
51
+ ) STORED
52
+ )
53
+ """)
54
+
55
+ # Images table
56
+ cursor.execute("""
57
+ CREATE TABLE IF NOT EXISTS images (
58
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
59
+ bio_id TEXT,
60
+ content_url TEXT,
61
+ caption TEXT,
62
+ FOREIGN KEY (bio_id) REFERENCES members(bio_id)
63
+ )
64
+ """)
65
+
66
+ # Job positions table
67
+ cursor.execute("""
68
+ CREATE TABLE IF NOT EXISTS job_positions (
69
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
70
+ bio_id TEXT,
71
+ job_name TEXT,
72
+ job_type TEXT,
73
+ start_date TEXT,
74
+ start_circa INTEGER,
75
+ end_date TEXT,
76
+ end_circa INTEGER,
77
+ congress_number INTEGER,
78
+ congress_name TEXT,
79
+ party TEXT,
80
+ caucus TEXT,
81
+ region_type TEXT,
82
+ region_code TEXT,
83
+ note TEXT,
84
+ FOREIGN KEY (bio_id) REFERENCES members(bio_id)
85
+ )
86
+ """)
87
+
88
+ # Relationships table
89
+ cursor.execute("""
90
+ CREATE TABLE IF NOT EXISTS relationships (
91
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
92
+ bio_id TEXT,
93
+ related_bio_id TEXT,
94
+ relationship_type TEXT,
95
+ FOREIGN KEY (bio_id) REFERENCES members(bio_id),
96
+ FOREIGN KEY (related_bio_id) REFERENCES members(bio_id)
97
+ )
98
+ """)
99
+
100
+ # Creative works table
101
+ cursor.execute("""
102
+ CREATE TABLE IF NOT EXISTS creative_works (
103
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
104
+ bio_id TEXT,
105
+ citation_text TEXT,
106
+ FOREIGN KEY (bio_id) REFERENCES members(bio_id)
107
+ )
108
+ """)
109
+
110
+ # Assets table
111
+ cursor.execute("""
112
+ CREATE TABLE IF NOT EXISTS assets (
113
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
114
+ bio_id TEXT,
115
+ name TEXT,
116
+ asset_type TEXT,
117
+ content_url TEXT,
118
+ credit_line TEXT,
119
+ accession_number TEXT,
120
+ upload_date TEXT,
121
+ FOREIGN KEY (bio_id) REFERENCES members(bio_id)
122
+ )
123
+ """)
124
+
125
+ # Create indexes for common queries
126
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_family_name ON members(unaccented_family_name)")
127
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_given_name ON members(unaccented_given_name)")
128
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_birth_date ON members(birth_date)")
129
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_death_date ON members(death_date)")
130
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_job_congress ON job_positions(congress_number)")
131
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_job_party ON job_positions(party)")
132
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_job_region ON job_positions(region_code)")
133
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_job_type ON job_positions(job_name)")
134
+
135
+ conn.commit()
136
+ conn.close()
137
+ print("βœ“ Database schema created")
138
+
139
+ def extract_data_field(self, data: Dict[str, Any], key: str, default=None):
140
+ """Safely extract data from nested 'data' field if it exists."""
141
+ if 'data' in data:
142
+ return data['data'].get(key, default)
143
+ return data.get(key, default)
144
+
145
+ def ingest_profiles(self):
146
+ """Ingest all JSON profiles into SQLite database."""
147
+ conn = sqlite3.connect(self.db_path)
148
+ cursor = conn.cursor()
149
+
150
+ profile_files = list(self.data_dir.glob("*.json"))
151
+ total = len(profile_files)
152
+
153
+ print(f"Ingesting {total} profiles...")
154
+
155
+ for idx, profile_file in enumerate(profile_files, 1):
156
+ if idx % 1000 == 0:
157
+ print(f" Processed {idx}/{total} profiles...")
158
+
159
+ try:
160
+ with open(profile_file, 'r', encoding='utf-8') as f:
161
+ data = json.load(f)
162
+
163
+ # Handle nested 'data' structure
164
+ bio_id = self.extract_data_field(data, 'usCongressBioId')
165
+ if not bio_id:
166
+ print(f" Skipping {profile_file}: no bio_id found")
167
+ continue
168
+
169
+ # Insert member data
170
+ cursor.execute("""
171
+ INSERT OR REPLACE INTO members (
172
+ bio_id, family_name, given_name, middle_name, honorific_prefix,
173
+ unaccented_family_name, unaccented_given_name, unaccented_middle_name,
174
+ birth_date, birth_circa, death_date, death_circa, profile_text
175
+ ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
176
+ """, (
177
+ bio_id,
178
+ self.extract_data_field(data, 'familyName'),
179
+ self.extract_data_field(data, 'givenName'),
180
+ self.extract_data_field(data, 'middleName'),
181
+ self.extract_data_field(data, 'honorificPrefix'),
182
+ self.extract_data_field(data, 'unaccentedFamilyName'),
183
+ self.extract_data_field(data, 'unaccentedGivenName'),
184
+ self.extract_data_field(data, 'unaccentedMiddleName'),
185
+ self.extract_data_field(data, 'birthDate'),
186
+ 1 if self.extract_data_field(data, 'birthCirca') else 0,
187
+ self.extract_data_field(data, 'deathDate'),
188
+ 1 if self.extract_data_field(data, 'deathCirca') else 0,
189
+ self.extract_data_field(data, 'profileText')
190
+ ))
191
+
192
+ # Insert images
193
+ images = self.extract_data_field(data, 'image', [])
194
+ for img in images:
195
+ cursor.execute("""
196
+ INSERT INTO images (bio_id, content_url, caption)
197
+ VALUES (?, ?, ?)
198
+ """, (bio_id, img.get('contentUrl'), img.get('caption')))
199
+
200
+ # Insert job positions
201
+ job_positions = self.extract_data_field(data, 'jobPositions', [])
202
+ for job_pos in job_positions:
203
+ job = job_pos.get('job', {})
204
+ congress_aff = job_pos.get('congressAffiliation', {})
205
+ congress = congress_aff.get('congress', {})
206
+ party_list = congress_aff.get('partyAffiliation', [])
207
+ caucus_list = congress_aff.get('caucusAffiliation', [])
208
+ represents = congress_aff.get('represents', {})
209
+ notes = congress_aff.get('note', [])
210
+ note_text = notes[0].get('content') if notes else None
211
+
212
+ cursor.execute("""
213
+ INSERT INTO job_positions (
214
+ bio_id, job_name, job_type, start_date, start_circa,
215
+ end_date, end_circa, congress_number, congress_name,
216
+ party, caucus, region_type, region_code, note
217
+ ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
218
+ """, (
219
+ bio_id,
220
+ job.get('name'),
221
+ job.get('jobType'),
222
+ job_pos.get('startDate'),
223
+ 1 if job_pos.get('startCirca') else 0,
224
+ job_pos.get('endDate'),
225
+ 1 if job_pos.get('endCirca') else 0,
226
+ congress.get('congressNumber'),
227
+ congress.get('name'),
228
+ party_list[0].get('party', {}).get('name') if party_list else None,
229
+ caucus_list[0].get('party', {}).get('name') if caucus_list else None,
230
+ represents.get('regionType'),
231
+ represents.get('regionCode'),
232
+ note_text
233
+ ))
234
+
235
+ # Insert relationships
236
+ relationships = self.extract_data_field(data, 'relationship', [])
237
+ for rel in relationships:
238
+ related = rel.get('relatedTo', {})
239
+ cursor.execute("""
240
+ INSERT INTO relationships (bio_id, related_bio_id, relationship_type)
241
+ VALUES (?, ?, ?)
242
+ """, (bio_id, related.get('usCongressBioId'), rel.get('relationshipType')))
243
+
244
+ # Insert creative works
245
+ creative_works = self.extract_data_field(data, 'creativeWork', [])
246
+ for work in creative_works:
247
+ cursor.execute("""
248
+ INSERT INTO creative_works (bio_id, citation_text)
249
+ VALUES (?, ?)
250
+ """, (bio_id, work.get('freeFormCitationText')))
251
+
252
+ # Insert assets
253
+ assets = self.extract_data_field(data, 'asset', [])
254
+ for asset in assets:
255
+ cursor.execute("""
256
+ INSERT INTO assets (
257
+ bio_id, name, asset_type, content_url, credit_line,
258
+                             accession_number, upload_date
+                         ) VALUES (?, ?, ?, ?, ?, ?, ?)
+                     """, (
+                         bio_id,
+                         asset.get('name'),
+                         asset.get('assetType'),
+                         asset.get('contentUrl'),
+                         asset.get('creditLine'),
+                         asset.get('accessionNumber'),
+                         asset.get('uploadDate')
+                     ))
+
+             except Exception as e:
+                 print(f" Error processing {profile_file}: {e}")
+                 continue
+
+         conn.commit()
+         conn.close()
+         print(f"βœ“ Ingested {total} profiles into database")
+
+     def build_faiss_index(self):
+         """Build FAISS index for semantic search on profile biographies."""
+         print("\n" + "=" * 60)
+         print("BUILDING FAISS INDEX FOR SEMANTIC SEARCH")
+         print("=" * 60)
+
+         try:
+             # Load model
+             print("\n1. Loading sentence transformer model...")
+             start_time = time.time()
+
+             # Disable all parallelism to avoid Python 3.14 issues
+             os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+             os.environ['OMP_NUM_THREADS'] = '1'
+             os.environ['MKL_NUM_THREADS'] = '1'
+             os.environ['OPENBLAS_NUM_THREADS'] = '1'
+
+             import torch
+             torch.set_num_threads(1)
+
+             self.model = SentenceTransformer('all-MiniLM-L6-v2')
+             print(f" βœ“ Model loaded in {time.time() - start_time:.3f}s")
+
+             # Load biographies from database
+             print("\n2. Loading biographies from database...")
+             start_time = time.time()
+             conn = sqlite3.connect(self.db_path)
+             cursor = conn.cursor()
+             cursor.execute("SELECT bio_id, profile_text FROM members WHERE profile_text IS NOT NULL")
+             profiles = cursor.fetchall()
+             conn.close()
+             print(f" βœ“ Loaded {len(profiles):,} biographies in {time.time() - start_time:.3f}s")
+
+             if len(profiles) == 0:
+                 print("\n❌ ERROR: No profiles with text found in database!")
+                 return False
+
+             # Prepare data
+             print("\n3. Preparing data for encoding...")
+             start_time = time.time()
+             bio_ids = [p[0] for p in profiles]
+             texts = [p[1] if p[1] else "" for p in profiles]
+             print(f" βœ“ Prepared {len(bio_ids):,} texts")
+             print(f" βœ“ Time: {time.time() - start_time:.3f}s")
+
+             # Generate embeddings in batches
+             print("\n4. Generating embeddings...")
+             start_time = time.time()
+             batch_size = 32
+             embeddings = []
+
+             for i in range(0, len(texts), batch_size):
+                 batch = texts[i:i + batch_size]
+                 batch_embeddings = self.model.encode(
+                     batch,
+                     show_progress_bar=False,
+                     convert_to_numpy=True,
+                     normalize_embeddings=False,
+                     device='cpu'  # Explicit CPU to avoid GPU issues
+                 )
+                 embeddings.extend(batch_embeddings)
+
+                 # Progress update every 100 batches
+                 if (i // batch_size + 1) % 100 == 0:
+                     elapsed = time.time() - start_time
+                     rate = (i + len(batch)) / elapsed
+                     print(f" Encoded {i + len(batch):,}/{len(texts):,} ({rate:.0f} texts/sec)")
+
+             embeddings = np.array(embeddings, dtype=np.float32)
+             elapsed = time.time() - start_time
+             print(f" βœ“ Generated {len(embeddings):,} embeddings in {elapsed:.3f}s")
+             print(f" βœ“ Shape: {embeddings.shape}")
+
+             # Build FAISS index
+             print("\n5. Building FAISS index...")
+             start_time = time.time()
+             dimension = embeddings.shape[1]
+             print(f" Dimension: {dimension}")
+
+             # Use IndexFlatIP for exact cosine similarity search
+             index = faiss.IndexFlatIP(dimension)
+
+             # Normalize embeddings for cosine similarity
+             faiss.normalize_L2(embeddings)
+
+             # Add to index
+             index.add(embeddings)
+             print(f" βœ“ Index built in {time.time() - start_time:.3f}s")
+             print(f" βœ“ Total vectors in index: {index.ntotal:,}")
+
+             # Save FAISS index
+             print("\n6. Saving FAISS index to disk...")
+             start_time = time.time()
+             faiss.write_index(index, "congress_faiss.index")
+             print(" βœ“ Index saved to: congress_faiss.index")
+             print(f" βœ“ Time: {time.time() - start_time:.3f}s")
+
+             # Save bio ID mapping
+             print("\n7. Saving bio ID mapping...")
+             start_time = time.time()
+             with open("congress_bio_ids.pkl", "wb") as f:
+                 pickle.dump(bio_ids, f)
+             print(" βœ“ Mapping saved to: congress_bio_ids.pkl")
+             print(f" βœ“ Time: {time.time() - start_time:.3f}s")
+
+             # Get file sizes
+             from pathlib import Path
+             index_size_mb = Path("congress_faiss.index").stat().st_size / (1024**2)
+             mapping_size_mb = Path("congress_bio_ids.pkl").stat().st_size / (1024**2)
+
+             print("\n" + "=" * 60)
+             print("FAISS INDEX BUILD COMPLETE")
+             print("=" * 60)
+             print(f"Total embeddings indexed: {len(bio_ids):,}")
+             print(f"Index file size: {index_size_mb:.2f} MB")
+             print(f"Mapping file size: {mapping_size_mb:.2f} MB")
+             print(f"Total size: {index_size_mb + mapping_size_mb:.2f} MB")
+             print("\nThe MCP server will load this index on startup for fast searches.")
+
+             return True
+
+         except Exception as e:
+             print(f"\n❌ ERROR building FAISS index: {e}")
+             print(" This may be due to Python 3.14 compatibility issues.")
+             print(" The database is still usable, but semantic search will not work.")
+             print(" Consider using Python 3.12 or 3.13 for full functionality.")
+             import traceback
+             traceback.print_exc()
+             return False
+
+     def run(self):
+         """Run the complete ingestion pipeline."""
+         print("Starting Congressional Bioguide ingestion...")
+         print("=" * 60)
+
+         try:
+             self.create_database_schema()
+             self.ingest_profiles()
+             faiss_success = self.build_faiss_index()
+
+             print("\n" + "=" * 60)
+             print("INGESTION COMPLETE")
+             print("=" * 60)
+             print(f"Database: {self.db_path}")
+
+             if faiss_success:
+                 print("FAISS index: congress_faiss.index βœ“")
+                 print("ID mapping: congress_bio_ids.pkl βœ“")
+                 print("\nAll features available, including semantic search!")
+             else:
+                 print("FAISS index: ❌ (failed to build)")
+                 print("\nDatabase is ready, but semantic search is unavailable.")
+                 print("All other MCP tools will work normally.")
+
+             return faiss_success
+
+         except Exception as e:
+             print(f"\n❌ FATAL ERROR: {e}")
+             import traceback
+             traceback.print_exc()
+             return False
+
+
+ def main():
+     ingester = BioguideIngester()
+     ingester.run()
+
+
+ if __name__ == "__main__":
+     main()
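
Once `build_faiss_index()` has written `congress_faiss.index` and `congress_bio_ids.pkl`, any process can run a cosine-similarity lookup against them without re-running ingestion. A minimal sketch (the query string is illustrative; it assumes the two artifact files produced by the ingester above are in the working directory):

```python
import pickle
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
index = faiss.read_index("congress_faiss.index")
with open("congress_bio_ids.pkl", "rb") as f:
    bio_ids = pickle.load(f)

# Encode the query the same way the index was built, then L2-normalize it
# so inner product on the IndexFlatIP index equals cosine similarity.
query = model.encode(["lawyers who became judges"]).astype("float32")
faiss.normalize_L2(query)
scores, idxs = index.search(query, 5)
for score, i in zip(scores[0], idxs[0]):
    print(bio_ids[i], f"{score:.3f}")
```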
mcp_config_example.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "mcpServers": {
+     "congressional-bioguide": {
+       "command": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/venv/bin/python",
+       "args": [
+         "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/server.py"
+       ],
+       "cwd": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP"
+     }
+   }
+ }
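
This is the `mcpServers` config shape read by MCP hosts such as Claude Desktop; the absolute paths are specific to the author's machine and must be adapted. A small sketch for sanity-checking an edited copy before pointing a client at it (the assertion messages are illustrative):

```python
import json
from pathlib import Path

cfg = json.loads(Path("mcp_config_example.json").read_text())
srv = cfg["mcpServers"]["congressional-bioguide"]
# Both paths should exist after running setup.sh in the repo checkout.
assert Path(srv["command"]).exists(), "venv python not found - run setup.sh"
assert Path(srv["args"][0]).exists(), "server.py path is wrong"
print("config paths resolve")
```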
requirements-minimal.txt ADDED
@@ -0,0 +1,3 @@
+ # Minimal requirements for database-only mode (no semantic search)
+ # Works with any Python version including 3.14+
+ mcp>=0.9.0
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ # Requires Python 3.10-3.13 (NOT 3.14+ due to FAISS incompatibility)
+ mcp>=1.0.0
+ numpy>=1.24.0
+ sentence-transformers>=2.2.0
+ torch>=2.0.0
+ faiss-cpu>=1.7.4
+ gradio>=5.0.0
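
A quick interpreter check that mirrors the version window stated in the comment above (the 3.14+ FAISS limitation is as the repo reports it; this sketch just enforces that constraint before installing):

```python
import sys

# requirements.txt declares a 3.10-3.13 window because faiss-cpu is
# reported to be incompatible with Python 3.14+.
if not ((3, 10) <= sys.version_info[:2] <= (3, 13)):
    raise SystemExit(f"Unsupported Python {sys.version.split()[0]}; use 3.10-3.13")
```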
server.py ADDED
@@ -0,0 +1,1219 @@
+ #!/usr/bin/env python3
+ """
+ MCP Server for Congressional Bioguide profiles.
+ Provides SQL queries and semantic search capabilities.
+ """
+
+ import sys
+ import sqlite3
+ import json
+ import os
+ import pickle
+ import warnings
+ from typing import List, Dict, Any, Optional
+ from pathlib import Path
+
+ # Semantic-search dependencies are optional: with requirements-minimal.txt
+ # (database-only mode) these imports fail and semantic search is disabled,
+ # while the SQL tools keep working.
+ try:
+     import numpy as np
+     from sentence_transformers import SentenceTransformer
+     import faiss
+     SEARCH_DEPS_AVAILABLE = True
+ except ImportError:
+     SEARCH_DEPS_AVAILABLE = False
+
+ from mcp.server import Server
+ from mcp.types import Tool, TextContent
+ import mcp.server.stdio
+
+ # Suppress all warnings to prevent JSON protocol corruption
+ warnings.filterwarnings('ignore')
+ os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+
+
+ # Initialize global resources - use absolute paths
+ SCRIPT_DIR = Path(__file__).parent.absolute()
+ DB_PATH = str(SCRIPT_DIR / "congress.db")
+ FAISS_INDEX_PATH = str(SCRIPT_DIR / "congress_faiss.index")
+ BIO_IDS_PATH = str(SCRIPT_DIR / "congress_bio_ids.pkl")
+
+ # FAISS index and model, loaded by initialize_search_index()
+ model = None
+ faiss_index = None
+ bio_id_mapping = None
+
+
+ def initialize_search_index():
+     """Initialize the semantic search components."""
+     global model, faiss_index, bio_id_mapping
+
+     if not SEARCH_DEPS_AVAILABLE:
+         print("Semantic search dependencies not installed (database-only mode)", file=sys.stderr, flush=True)
+         return False
+
+     try:
+         if Path(FAISS_INDEX_PATH).exists() and Path(BIO_IDS_PATH).exists():
+             print(f"Loading FAISS index from: {FAISS_INDEX_PATH}", file=sys.stderr, flush=True)
+             model = SentenceTransformer('all-MiniLM-L6-v2')
+             faiss_index = faiss.read_index(FAISS_INDEX_PATH)
+             with open(BIO_IDS_PATH, "rb") as f:
+                 bio_id_mapping = pickle.load(f)
+             print(f"βœ“ Loaded {faiss_index.ntotal} embeddings", file=sys.stderr, flush=True)
+             return True
+         else:
+             print(f"FAISS index not found at: {FAISS_INDEX_PATH}", file=sys.stderr, flush=True)
+             print(f"Bio IDs not found at: {BIO_IDS_PATH}", file=sys.stderr, flush=True)
+             return False
+     except Exception as e:
+         print(f"Error loading search index: {e}", file=sys.stderr, flush=True)
+         return False
+
+
+ def get_db_connection():
+     """Get a database connection."""
+     return sqlite3.connect(DB_PATH)
+
+
+ def execute_query(query: str, params: tuple = ()) -> List[Dict[str, Any]]:
+     """Execute a SQL query and return results as a list of dicts."""
+     conn = get_db_connection()
+     conn.row_factory = sqlite3.Row
+     cursor = conn.cursor()
+     cursor.execute(query, params)
+     results = [dict(row) for row in cursor.fetchall()]
+     conn.close()
+     return results
+
+
+ def format_member_concise(member: Dict[str, Any]) -> Dict[str, Any]:
+     """Format member data to concise output with only essential fields."""
+     return {
+         'bio_id': member.get('bio_id'),
+         'name': f"{member.get('given_name', '')} {member.get('middle_name', '') + ' ' if member.get('middle_name') else ''}{member.get('family_name', '')}".strip(),
+         'birth_date': member.get('birth_date'),
+         'death_date': member.get('death_date'),
+         'party': member.get('party'),
+         'state': member.get('region_code'),
+         'position': member.get('job_name'),
+         'congress': member.get('congress_number')
+     }
+
+
+ def get_member_profile(bio_id: str) -> Optional[Dict[str, Any]]:
+     """Get the complete profile for a member, including all related data."""
+     conn = get_db_connection()
+     conn.row_factory = sqlite3.Row
+     cursor = conn.cursor()
+
+     # Get member data
+     cursor.execute("SELECT * FROM members WHERE bio_id = ?", (bio_id,))
+     member = cursor.fetchone()
+     if not member:
+         conn.close()
+         return None
+
+     profile = dict(member)
+
+     # Get images
+     cursor.execute("SELECT * FROM images WHERE bio_id = ?", (bio_id,))
+     profile['images'] = [dict(row) for row in cursor.fetchall()]
+
+     # Get job positions
+     cursor.execute("SELECT * FROM job_positions WHERE bio_id = ? ORDER BY start_date", (bio_id,))
+     profile['job_positions'] = [dict(row) for row in cursor.fetchall()]
+
+     # Get relationships
+     cursor.execute("SELECT * FROM relationships WHERE bio_id = ?", (bio_id,))
+     profile['relationships'] = [dict(row) for row in cursor.fetchall()]
+
+     # Get creative works
+     cursor.execute("SELECT * FROM creative_works WHERE bio_id = ?", (bio_id,))
+     profile['creative_works'] = [dict(row) for row in cursor.fetchall()]
+
+     # Get assets
+     cursor.execute("SELECT * FROM assets WHERE bio_id = ?", (bio_id,))
+     profile['assets'] = [dict(row) for row in cursor.fetchall()]
+
+     conn.close()
+     return profile
+
+
+ def semantic_search(query_text: str, top_k: int = 10) -> List[Dict[str, Any]]:
+     """Perform semantic search; return matching bio_ids with similarity scores."""
+     if not all([model, faiss_index, bio_id_mapping]):
+         raise ValueError("Search index not initialized. Run ingest_data.py first.")
+
+     # Encode query
+     query_embedding = model.encode([query_text])[0].astype('float32')
+     query_embedding = query_embedding.reshape(1, -1)
+
+     # Normalize for cosine similarity
+     faiss.normalize_L2(query_embedding)
+
+     # Search
+     scores, indices = faiss_index.search(query_embedding, top_k)
+
+     # Map indices to bio_ids
+     results = []
+     for idx, score in zip(indices[0], scores[0]):
+         if idx < len(bio_id_mapping):
+             results.append({
+                 'bio_id': bio_id_mapping[idx],
+                 'similarity_score': float(score)
+             })
+
+     return results
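+
+ # Illustrative call (hypothetical IDs and scores, shown only for result shape):
+ #   semantic_search("Civil War veterans who became judges", top_k=2)
+ #   -> [{'bio_id': '...', 'similarity_score': 0.61},
+ #       {'bio_id': '...', 'similarity_score': 0.58}]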
+
+
+ # Initialize MCP server
+ server = Server("congressional-bioguide")
+
+
+ @server.list_tools()
+ async def list_tools() -> List[Tool]:
+     """List all available tools."""
+     return [
+         Tool(
+             name="search_by_name",
+             description="Search for Congressional members by name. Returns concise results (name, dates, party, congress) by default.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "family_name": {
+                         "type": "string",
+                         "description": "Family/last name to search for (partial match)"
+                     },
+                     "given_name": {
+                         "type": "string",
+                         "description": "Given/first name to search for (partial match)"
+                     },
+                     "full_name": {
+                         "type": "string",
+                         "description": "Full name to search for (partial match in any name field)"
+                     },
+                     "limit": {
+                         "type": "integer",
+                         "description": "Maximum results to return (default: 50)",
+                         "default": 50
+                     },
+                     "return_full_profile": {
+                         "type": "boolean",
+                         "description": "Return full profile data including biography (default: false)",
+                         "default": False
+                     }
+                 }
+             }
+         ),
+         Tool(
+             name="search_by_party",
+             description="Search for Congressional members by political party affiliation.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "party": {
+                         "type": "string",
+                         "description": "Party name (e.g., 'Republican', 'Democrat', 'Whig')"
+                     },
+                     "congress_number": {
+                         "type": "integer",
+                         "description": "Optional: Filter by specific Congress number (e.g., 117)"
+                     }
+                 },
+                 "required": ["party"]
+             }
+         ),
+         Tool(
+             name="search_by_state",
+             description="Search for Congressional members by state or region they represented.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "state_code": {
+                         "type": "string",
+                         "description": "State code (e.g., 'CA', 'NY', 'TX')"
+                     },
+                     "congress_number": {
+                         "type": "integer",
+                         "description": "Optional: Filter by specific Congress number"
+                     }
+                 },
+                 "required": ["state_code"]
+             }
+         ),
+         Tool(
+             name="search_by_congress",
+             description="Get all members who served in a specific Congress.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "congress_number": {
+                         "type": "integer",
+                         "description": "Congress number (e.g., 117 for the 117th Congress)"
+                     },
+                     "chamber": {
+                         "type": "string",
+                         "description": "Optional: Filter by chamber ('Representative' or 'Senator')"
+                     }
+                 },
+                 "required": ["congress_number"]
+             }
+         ),
+         Tool(
+             name="search_by_date_range",
+             description="Search for members who served during a specific date range.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "start_date": {
+                         "type": "string",
+                         "description": "Start date in YYYY-MM-DD format"
+                     },
+                     "end_date": {
+                         "type": "string",
+                         "description": "End date in YYYY-MM-DD format"
+                     }
+                 },
+                 "required": ["start_date", "end_date"]
+             }
+         ),
+         Tool(
+             name="semantic_search_biography",
+             description="Perform semantic search on member biographies. Use natural language to find members based on career details, accomplishments, background, etc.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "query": {
+                         "type": "string",
+                         "description": "Natural language query to search biographies (e.g., 'lawyers who became judges', 'Civil War veterans')"
+                     },
+                     "top_k": {
+                         "type": "integer",
+                         "description": "Number of results to return (default: 10)",
+                         "default": 10
+                     }
+                 },
+                 "required": ["query"]
+             }
+         ),
+         Tool(
+             name="get_member_profile",
+             description="Get complete profile information for a specific member by their Bioguide ID.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "bio_id": {
+                         "type": "string",
+                         "description": "Bioguide ID (e.g., 'W000374', 'P000144')"
+                     }
+                 },
+                 "required": ["bio_id"]
+             }
+         ),
+         Tool(
+             name="execute_sql_query",
+             description="Execute a custom SQL query against the Congressional database. Use for complex queries not covered by other tools. READ-ONLY access.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "query": {
+                         "type": "string",
+                         "description": "SQL SELECT query to execute"
+                     }
+                 },
+                 "required": ["query"]
+             }
+         ),
+         Tool(
+             name="get_database_schema",
+             description="Get the database schema showing all tables and columns available for querying.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {}
+             }
+         ),
+         Tool(
+             name="search_by_relationship",
+             description="Find members who have family relationships with other members (e.g., father, son, spouse).",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "relationship_type": {
+                         "type": "string",
+                         "description": "Type of relationship (e.g., 'father', 'son', 'spouse', 'brother')"
+                     }
+                 }
+             }
+         ),
+         Tool(
+             name="search_biography_regex",
+             description="Search member biographies using regex patterns. Returns concise member info (name, dates, party, state) for matches. Use filters to narrow results.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "pattern": {
+                         "type": "string",
+                         "description": "Regex pattern to search for in biographies (e.g., 'Harvard', 'lawyer', 'served.*army', 'born in [0-9]{4}')"
+                     },
+                     "case_sensitive": {
+                         "type": "boolean",
+                         "description": "Whether search should be case-sensitive (default: false)",
+                         "default": False
+                     },
+                     "limit": {
+                         "type": "integer",
+                         "description": "Maximum number of results to return (default: 5)",
+                         "default": 5
+                     },
+                     "filter_party": {
+                         "type": "string",
+                         "description": "Optional: Filter results by party (e.g., 'Republican', 'Democrat')"
+                     },
+                     "filter_state": {
+                         "type": "string",
+                         "description": "Optional: Filter results by state code (e.g., 'CA', 'NY')"
+                     },
+                     "filter_congress": {
+                         "type": "integer",
+                         "description": "Optional: Filter results by Congress number (e.g., 117)"
+                     },
+                     "return_full_profile": {
+                         "type": "boolean",
+                         "description": "Return full profile including biography text (default: false)",
+                         "default": False
+                     }
+                 },
+                 "required": ["pattern"]
+             }
+         ),
+         Tool(
+             name="count_members",
+             description="Count members matching specific criteria. Returns aggregated counts by party, state, position, or custom grouping. Much more efficient than returning full member lists.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "group_by": {
+                         "type": "string",
+                         "description": "Field to group by: 'party', 'state', 'position', 'congress', or 'year'",
+                         "enum": ["party", "state", "position", "congress", "year"]
+                     },
+                     "filter_party": {
+                         "type": "string",
+                         "description": "Optional: Filter by party name"
+                     },
+                     "filter_state": {
+                         "type": "string",
+                         "description": "Optional: Filter by state code"
+                     },
+                     "filter_congress": {
+                         "type": "integer",
+                         "description": "Optional: Filter by Congress number"
+                     },
+                     "filter_position": {
+                         "type": "string",
+                         "description": "Optional: Filter by position (Representative, Senator)"
+                     },
+                     "date_range_start": {
+                         "type": "string",
+                         "description": "Optional: Start date (YYYY-MM-DD)"
+                     },
+                     "date_range_end": {
+                         "type": "string",
+                         "description": "Optional: End date (YYYY-MM-DD)"
+                     }
+                 },
+                 "required": ["group_by"]
+             }
+         ),
+         Tool(
+             name="temporal_analysis",
+             description="Analyze member trends over time. Shows how membership changed across years, decades, or congresses. Perfect for historical analysis.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "analysis_type": {
+                         "type": "string",
+                         "description": "Type of temporal analysis",
+                         "enum": ["party_over_time", "state_representation", "position_counts", "demographics"]
+                     },
+                     "time_unit": {
+                         "type": "string",
+                         "description": "Time granularity: 'congress', 'year', 'decade'",
+                         "enum": ["congress", "year", "decade"],
+                         "default": "congress"
+                     },
+                     "start_date": {
+                         "type": "string",
+                         "description": "Optional: Start date (YYYY-MM-DD)"
+                     },
+                     "end_date": {
+                         "type": "string",
+                         "description": "Optional: End date (YYYY-MM-DD)"
+                     },
+                     "filter_party": {
+                         "type": "string",
+                         "description": "Optional: Filter to specific party"
+                     },
+                     "filter_state": {
+                         "type": "string",
+                         "description": "Optional: Filter to specific state"
+                     }
+                 },
+                 "required": ["analysis_type"]
+             }
+         ),
+         Tool(
+             name="count_by_biography_content",
+             description="Count members whose biographies mention specific keywords or phrases (e.g., 'Harvard', 'lawyer', 'Civil War'). Much more efficient than searching when you only need counts.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "keywords": {
+                         "type": "array",
+                         "items": {"type": "string"},
+                         "description": "List of keywords or phrases to search for (case-insensitive)"
+                     },
+                     "match_all": {
+                         "type": "boolean",
+                         "description": "If true, count members matching ALL keywords. If false, count members matching ANY keyword (default: false)",
+                         "default": False
+                     },
+                     "breakdown_by": {
+                         "type": "string",
+                         "description": "Optional: Break down counts by party, state, position, or congress",
+                         "enum": ["party", "state", "position", "congress", "none"],
+                         "default": "none"
+                     },
+                     "filter_party": {
+                         "type": "string",
+                         "description": "Optional: Only count members from specific party"
+                     },
+                     "filter_state": {
+                         "type": "string",
+                         "description": "Optional: Only count members from specific state"
+                     }
+                 },
+                 "required": ["keywords"]
+             }
+         )
+     ]
+
+
+ @server.call_tool()
+ async def call_tool(name: str, arguments: Any) -> List[TextContent]:
+     """Handle tool calls."""
+
+     try:
+         if name == "search_by_name":
+             family_name = arguments.get("family_name")
+             given_name = arguments.get("given_name")
+             full_name = arguments.get("full_name")
+             limit = arguments.get("limit", 50)
+             return_full = arguments.get("return_full_profile", False)
+
+             conditions = []
+             params = []
+
+             if family_name:
+                 conditions.append("LOWER(m.unaccented_family_name) LIKE LOWER(?)")
+                 params.append(f"%{family_name}%")
+             if given_name:
+                 conditions.append("LOWER(m.unaccented_given_name) LIKE LOWER(?)")
+                 params.append(f"%{given_name}%")
+             if full_name:
+                 conditions.append("""(LOWER(m.unaccented_family_name) LIKE LOWER(?)
+                     OR LOWER(m.unaccented_given_name) LIKE LOWER(?)
+                     OR LOWER(m.unaccented_middle_name) LIKE LOWER(?))""")
+                 params.extend([f"%{full_name}%"] * 3)
+
+             if not conditions:
+                 return [TextContent(type="text", text="Please provide at least one name parameter.")]
+
+             if return_full:
+                 query = f"SELECT * FROM members m WHERE {' AND '.join(conditions)} ORDER BY m.family_name, m.given_name LIMIT ?"
+                 params.append(limit)
+                 results = execute_query(query, tuple(params))
+             else:
+                 # Return concise results with job info
+                 query = f"""
+                     SELECT DISTINCT m.bio_id, m.given_name, m.middle_name, m.family_name,
+                            m.birth_date, m.death_date,
+                            j.party, j.region_code, j.job_name, j.congress_number
+                     FROM members m
+                     LEFT JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE {' AND '.join(conditions)}
+                     ORDER BY m.family_name, m.given_name
+                     LIMIT ?
+                 """
+                 params.append(limit)
+                 results = execute_query(query, tuple(params))
+                 results = [format_member_concise(r) for r in results]
+
+             response = {
+                 "count": len(results),
+                 "limit": limit,
+                 "results": results
+             }
+             return [TextContent(type="text", text=json.dumps(response, indent=2))]
+
+         elif name == "search_by_party":
+             party = arguments["party"]
+             congress_number = arguments.get("congress_number")
+
+             if congress_number:
+                 query = """
+                     SELECT DISTINCT m.* FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE j.party = ? AND j.congress_number = ?
+                     ORDER BY m.family_name, m.given_name
+                 """
+                 results = execute_query(query, (party, congress_number))
+             else:
+                 query = """
+                     SELECT DISTINCT m.* FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE j.party = ?
+                     ORDER BY m.family_name, m.given_name
+                 """
+                 results = execute_query(query, (party,))
+
+             return [TextContent(type="text", text=json.dumps(results, indent=2))]
+
+         elif name == "search_by_state":
+             state_code = arguments["state_code"].upper()
+             congress_number = arguments.get("congress_number")
+
+             if congress_number:
+                 query = """
+                     SELECT DISTINCT m.*, j.job_name, j.party, j.congress_number
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE j.region_code = ? AND j.congress_number = ?
+                     ORDER BY m.family_name, m.given_name
+                 """
+                 results = execute_query(query, (state_code, congress_number))
+             else:
+                 query = """
+                     SELECT DISTINCT m.*, j.job_name, j.party, j.congress_number
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE j.region_code = ?
+                     ORDER BY m.family_name, m.given_name
+                 """
+                 results = execute_query(query, (state_code,))
+
+             return [TextContent(type="text", text=json.dumps(results, indent=2))]
+
+         elif name == "search_by_congress":
+             congress_number = arguments["congress_number"]
+             chamber = arguments.get("chamber")
+
+             if chamber:
+                 query = """
+                     SELECT DISTINCT m.*, j.job_name, j.party, j.region_code
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE j.congress_number = ? AND j.job_name = ?
+                     ORDER BY m.family_name, m.given_name
+                 """
+                 results = execute_query(query, (congress_number, chamber))
+             else:
+                 query = """
+                     SELECT DISTINCT m.*, j.job_name, j.party, j.region_code
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE j.congress_number = ?
+                     ORDER BY m.family_name, m.given_name
+                 """
+                 results = execute_query(query, (congress_number,))
+
+             return [TextContent(type="text", text=json.dumps(results, indent=2))]
+
+         elif name == "search_by_date_range":
+             start_date = arguments["start_date"]
+             end_date = arguments["end_date"]
+
+             query = """
+                 SELECT DISTINCT m.*, j.job_name, j.start_date, j.end_date
+                 FROM members m
+                 JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE (j.start_date <= ? AND (j.end_date >= ? OR j.end_date IS NULL))
+                 ORDER BY j.start_date, m.family_name, m.given_name
+             """
+             results = execute_query(query, (end_date, start_date))
+
+             return [TextContent(type="text", text=json.dumps(results, indent=2))]
+
+         elif name == "semantic_search_biography":
+             query_text = arguments["query"]
+             top_k = arguments.get("top_k", 10)
+
+             # Perform semantic search
+             search_results = semantic_search(query_text, top_k)
+
+             # Get full profiles for top results
+             profiles = []
+             for result in search_results:
+                 profile = get_member_profile(result['bio_id'])
+                 if profile:
+                     profile['similarity_score'] = result['similarity_score']
+                     profiles.append(profile)
+
+             return [TextContent(type="text", text=json.dumps(profiles, indent=2))]
+
+         elif name == "get_member_profile":
+             bio_id = arguments["bio_id"]
+             profile = get_member_profile(bio_id)
+
+             if profile:
+                 return [TextContent(type="text", text=json.dumps(profile, indent=2))]
+             else:
+                 return [TextContent(type="text", text=f"No profile found for bio_id: {bio_id}")]
+
+         elif name == "execute_sql_query":
+             query = arguments["query"]
+
+             # Basic security: only allow SELECT queries
+             if not query.strip().upper().startswith("SELECT"):
+                 return [TextContent(type="text", text="Error: Only SELECT queries are allowed.")]
+
+             results = execute_query(query)
+             return [TextContent(type="text", text=json.dumps(results, indent=2))]
+
+         elif name == "get_database_schema":
+             schema_info = {
+                 "tables": {
+                     "members": {
+                         "description": "Main table with member biographical information",
+                         "columns": [
+                             "bio_id (PRIMARY KEY) - Bioguide ID",
+                             "family_name - Last name",
+                             "given_name - First name",
+                             "middle_name - Middle name",
+                             "honorific_prefix - Title (Mr., Mrs., etc.)",
+                             "unaccented_family_name - Family name without accents",
+                             "unaccented_given_name - Given name without accents",
+                             "unaccented_middle_name - Middle name without accents",
+                             "birth_date - Birth date (YYYY-MM-DD)",
+                             "birth_circa - Whether birth date is approximate (0/1)",
+                             "death_date - Death date (YYYY-MM-DD)",
+                             "death_circa - Whether death date is approximate (0/1)",
+                             "profile_text - Full biography text",
+                             "full_name - Generated full name column"
+                         ]
+                     },
+                     "job_positions": {
+                         "description": "Congressional positions held by members",
+                         "columns": [
+                             "id (PRIMARY KEY)",
+                             "bio_id (FOREIGN KEY) - References members",
+                             "job_name - Position title (Representative, Senator)",
+                             "job_type - Type of position",
+                             "start_date - Start date of position",
+                             "start_circa - Whether start date is approximate (0/1)",
+                             "end_date - End date of position",
+                             "end_circa - Whether end date is approximate (0/1)",
+                             "congress_number - Congress number (e.g., 117)",
+                             "congress_name - Full Congress name",
+                             "party - Party affiliation",
+                             "caucus - Caucus affiliation",
+                             "region_type - Type of region represented",
+                             "region_code - State/region code (e.g., 'CA', 'NY')",
+                             "note - Additional notes"
+                         ]
+                     },
+                     "images": {
+                         "description": "Profile images",
+                         "columns": ["id", "bio_id", "content_url", "caption"]
+                     },
+                     "relationships": {
+                         "description": "Family relationships between members",
+                         "columns": ["id", "bio_id", "related_bio_id", "relationship_type"]
+                     },
+                     "creative_works": {
+                         "description": "Publications and creative works by members",
+                         "columns": ["id", "bio_id", "citation_text"]
+                     },
+                     "assets": {
+                         "description": "Additional assets (images, documents)",
+                         "columns": ["id", "bio_id", "name", "asset_type", "content_url",
+                                     "credit_line", "accession_number", "upload_date"]
+                     }
+                 },
+                 "indexes": [
+                     "idx_family_name - Index on unaccented_family_name",
+                     "idx_given_name - Index on unaccented_given_name",
+                     "idx_birth_date - Index on birth_date",
+                     "idx_death_date - Index on death_date",
+                     "idx_job_congress - Index on congress_number",
+                     "idx_job_party - Index on party",
+                     "idx_job_region - Index on region_code",
+                     "idx_job_type - Index on job_name"
+                 ]
+             }
+
+             return [TextContent(type="text", text=json.dumps(schema_info, indent=2))]
+
+         elif name == "search_by_relationship":
+             relationship_type = arguments.get("relationship_type")
+
+             if relationship_type:
+                 query = """
+                     SELECT m1.bio_id, m1.family_name, m1.given_name,
+                            r.relationship_type, r.related_bio_id,
+                            m2.family_name as related_family_name,
+                            m2.given_name as related_given_name
+                     FROM members m1
+                     JOIN relationships r ON m1.bio_id = r.bio_id
+                     JOIN members m2 ON r.related_bio_id = m2.bio_id
+                     WHERE r.relationship_type = ?
+                     ORDER BY m1.family_name, m1.given_name
+                 """
+                 results = execute_query(query, (relationship_type,))
+             else:
+                 query = """
+                     SELECT m1.bio_id, m1.family_name, m1.given_name,
+                            r.relationship_type, r.related_bio_id,
+                            m2.family_name as related_family_name,
+                            m2.given_name as related_given_name
+                     FROM members m1
+                     JOIN relationships r ON m1.bio_id = r.bio_id
+                     JOIN members m2 ON r.related_bio_id = m2.bio_id
+                     ORDER BY m1.family_name, m1.given_name
+                 """
+                 results = execute_query(query)
+
+             return [TextContent(type="text", text=json.dumps(results, indent=2))]
+
+         elif name == "search_biography_regex":
+             import re
+
+             pattern = arguments["pattern"]
+             case_sensitive = arguments.get("case_sensitive", False)
+             limit = arguments.get("limit", 5)
+             filter_party = arguments.get("filter_party")
+             filter_state = arguments.get("filter_state")
+             filter_congress = arguments.get("filter_congress")
+             return_full = arguments.get("return_full_profile", False)
+
+             try:
+                 # Compile regex pattern
+                 flags = 0 if case_sensitive else re.IGNORECASE
+                 regex = re.compile(pattern, flags)
+
+                 # Build query with optional filters
+                 conn = get_db_connection()
+                 conn.row_factory = sqlite3.Row
+                 cursor = conn.cursor()
+
+                 # Base query - join with job_positions for filtering
+                 query = """
+                     SELECT DISTINCT m.bio_id, m.family_name, m.given_name, m.middle_name,
+                            m.birth_date, m.death_date, m.profile_text,
+                            j.party, j.region_code, j.job_name, j.congress_number
+                     FROM members m
+                     LEFT JOIN job_positions j ON m.bio_id = j.bio_id
+                     WHERE m.profile_text IS NOT NULL
+                 """
+
+                 where_conditions = []
+                 params = []
+
+                 if filter_party:
+                     where_conditions.append("j.party = ?")
+                     params.append(filter_party)
+                 if filter_state:
+                     where_conditions.append("j.region_code = ?")
+                     params.append(filter_state)
+                 if filter_congress:
+                     where_conditions.append("j.congress_number = ?")
+                     params.append(filter_congress)
+
+                 if where_conditions:
+                     query += " AND " + " AND ".join(where_conditions)
+
+                 cursor.execute(query, tuple(params))
+
+                 # Filter using regex
+                 matches = []
+                 for row in cursor:
+                     if regex.search(row['profile_text']):
+                         if return_full:
+                             # Return full profile
+                             matches.append(dict(row))
+                         else:
+                             # Return concise info only
+                             match_result = {
+                                 "bio_id": row['bio_id'],
+                                 "name": f"{row['given_name']} {row['middle_name'] or ''} {row['family_name']}".strip(),
+                                 "birth_date": row['birth_date'],
+                                 "death_date": row['death_date'],
+                                 "party": row['party'],
+                                 "state": row['region_code'],
+                                 "position": row['job_name'],
+                                 "congress": row['congress_number']
+                             }
+                             matches.append(match_result)
+
+                         if len(matches) >= limit:
+                             break
+
+                 conn.close()
+
+                 result = {
+                     "pattern": pattern,
+                     "case_sensitive": case_sensitive,
+                     "total_members_found": len(matches),
+                     "limit": limit,
+                     "filters_applied": {
+                         "party": filter_party,
+                         "state": filter_state,
+                         "congress": filter_congress
+                     },
+                     "results": matches
+                 }
+
+                 return [TextContent(type="text", text=json.dumps(result, indent=2))]
+
+             except re.error as e:
+                 return [TextContent(type="text", text=f"Invalid regex pattern: {str(e)}")]
+
+         elif name == "count_members":
+             group_by = arguments["group_by"]
+             filter_party = arguments.get("filter_party")
+             filter_state = arguments.get("filter_state")
+             filter_congress = arguments.get("filter_congress")
+             filter_position = arguments.get("filter_position")
+             date_start = arguments.get("date_range_start")
+             date_end = arguments.get("date_range_end")
+
+             # Build WHERE clause
+             where_conditions = []
+             params = []
+
+             if filter_party:
+                 where_conditions.append("j.party = ?")
+                 params.append(filter_party)
+             if filter_state:
+                 where_conditions.append("j.region_code = ?")
+                 params.append(filter_state)
+             if filter_congress:
+                 where_conditions.append("j.congress_number = ?")
+                 params.append(filter_congress)
+             if filter_position:
+                 where_conditions.append("j.job_name = ?")
+                 params.append(filter_position)
+             if date_start and date_end:
+                 where_conditions.append("(j.start_date <= ? AND (j.end_date >= ? OR j.end_date IS NULL))")
+                 params.extend([date_end, date_start])
+
+             where_clause = "WHERE " + " AND ".join(where_conditions) if where_conditions else ""
+
+             # Build GROUP BY query
+             if group_by == "party":
+                 query = f"""
+                     SELECT j.party as group_key, COUNT(DISTINCT m.bio_id) as count
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     {where_clause}
+                     GROUP BY j.party
+                     ORDER BY count DESC
+                 """
+             elif group_by == "state":
+                 query = f"""
+                     SELECT j.region_code as group_key, COUNT(DISTINCT m.bio_id) as count
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     {where_clause}
+                     GROUP BY j.region_code
+                     ORDER BY count DESC
+                 """
+             elif group_by == "position":
+                 query = f"""
+                     SELECT j.job_name as group_key, COUNT(DISTINCT m.bio_id) as count
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     {where_clause}
+                     GROUP BY j.job_name
+                     ORDER BY count DESC
+                 """
+             elif group_by == "congress":
+                 query = f"""
+                     SELECT j.congress_number as group_key, COUNT(DISTINCT m.bio_id) as count
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     {where_clause}
+                     GROUP BY j.congress_number
+                     ORDER BY j.congress_number
+                 """
+             elif group_by == "year":
+                 query = f"""
+                     SELECT SUBSTR(j.start_date, 1, 4) as group_key, COUNT(DISTINCT m.bio_id) as count
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     {where_clause}
+                     GROUP BY SUBSTR(j.start_date, 1, 4)
+                     ORDER BY group_key
+                 """
+
+             results = execute_query(query, tuple(params))
+             total = sum(r['count'] for r in results)
+
+             response = {
+                 "group_by": group_by,
+                 "total_unique_members": total,
+                 "groups": results,
+                 "filters_applied": {
+                     "party": filter_party,
+                     "state": filter_state,
+                     "congress": filter_congress,
+                     "position": filter_position,
+                     "date_range": [date_start, date_end] if date_start and date_end else None
+                 }
+             }
+
+             return [TextContent(type="text", text=json.dumps(response, indent=2))]
+
+         elif name == "temporal_analysis":
+             analysis_type = arguments["analysis_type"]
+             time_unit = arguments.get("time_unit", "congress")
+             start_date = arguments.get("start_date")
+             end_date = arguments.get("end_date")
+             filter_party = arguments.get("filter_party")
+             filter_state = arguments.get("filter_state")
+
+             # Build WHERE clause
+             where_conditions = []
+             params = []
+
+             if start_date:
+                 where_conditions.append("j.start_date >= ?")
+                 params.append(start_date)
+             if end_date:
+                 where_conditions.append("j.start_date <= ?")
+                 params.append(end_date)
+             if filter_party:
+                 where_conditions.append("j.party = ?")
+                 params.append(filter_party)
+             if filter_state:
+                 where_conditions.append("j.region_code = ?")
+                 params.append(filter_state)
+
+             where_clause = "WHERE " + " AND ".join(where_conditions) if where_conditions else ""
+
+             if analysis_type == "party_over_time":
+                 if time_unit == "congress":
+                     query = f"""
+                         SELECT j.congress_number, j.party, COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY j.congress_number, j.party
+                         ORDER BY j.congress_number, j.party
+                     """
+                 elif time_unit == "year":
+                     query = f"""
+                         SELECT SUBSTR(j.start_date, 1, 4) as year, j.party, COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY year, j.party
+                         ORDER BY year, j.party
+                     """
+                 elif time_unit == "decade":
+                     query = f"""
+                         SELECT (CAST(SUBSTR(j.start_date, 1, 4) AS INTEGER) / 10) * 10 as decade,
+                                j.party, COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY decade, j.party
+                         ORDER BY decade, j.party
+                     """
+
+             elif analysis_type == "state_representation":
+                 if time_unit == "congress":
+                     query = f"""
+                         SELECT j.congress_number, j.region_code, COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY j.congress_number, j.region_code
+                         ORDER BY j.congress_number, count DESC
+                     """
+                 else:
+                     query = f"""
+                         SELECT SUBSTR(j.start_date, 1, 4) as year, j.region_code, COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY year, j.region_code
+                         ORDER BY year, count DESC
+                     """
+
+             elif analysis_type == "position_counts":
+                 query = f"""
+                     SELECT j.congress_number, j.job_name, COUNT(DISTINCT m.bio_id) as count
+                     FROM members m
+                     JOIN job_positions j ON m.bio_id = j.bio_id
+                     {where_clause}
+                     GROUP BY j.congress_number, j.job_name
+                     ORDER BY j.congress_number
+                 """
+
+             elif analysis_type == "demographics":
+                 # Analyze birth year distribution over time
+                 if time_unit == "congress":
+                     query = f"""
+                         SELECT j.congress_number,
+                                AVG(CAST(SUBSTR(m.birth_date, 1, 4) AS INTEGER)) as avg_birth_year,
+                                COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY j.congress_number
+                         ORDER BY j.congress_number
+                     """
+                 else:
+                     query = f"""
+                         SELECT SUBSTR(j.start_date, 1, 4) as year,
+                                AVG(CAST(SUBSTR(m.birth_date, 1, 4) AS INTEGER)) as avg_birth_year,
+                                COUNT(DISTINCT m.bio_id) as count
+                         FROM members m
+                         JOIN job_positions j ON m.bio_id = j.bio_id
+                         {where_clause}
+                         GROUP BY year
+                         ORDER BY year
+                     """
+
+             results = execute_query(query, tuple(params))
+
+             response = {
+                 "analysis_type": analysis_type,
+                 "time_unit": time_unit,
+                 "data_points": len(results),
+                 "results": results,
+                 "filters_applied": {
+                     "start_date": start_date,
+                     "end_date": end_date,
+                     "party": filter_party,
+                     "state": filter_state
+                 }
+             }
+
+             return [TextContent(type="text", text=json.dumps(response, indent=2))]
+
+         elif name == "count_by_biography_content":
+             keywords = arguments["keywords"]
+             match_all = arguments.get("match_all", False)
+             breakdown_by = arguments.get("breakdown_by", "none")
+             filter_party = arguments.get("filter_party")
+             filter_state = arguments.get("filter_state")
+
+             # Build the query to find matching members
+             conn = get_db_connection()
+             conn.row_factory = sqlite3.Row
+             cursor = conn.cursor()
+
+             # Get all members with their job info
+             base_query = """
+                 SELECT DISTINCT m.bio_id, m.profile_text,
+                        j.party, j.region_code, j.job_name, j.congress_number
+                 FROM members m
+                 LEFT JOIN job_positions j ON m.bio_id = j.bio_id
+                 WHERE m.profile_text IS NOT NULL
+             """
+
+             where_conditions = []
+             params = []
+
+             if filter_party:
+                 where_conditions.append("j.party = ?")
+                 params.append(filter_party)
+             if filter_state:
+                 where_conditions.append("j.region_code = ?")
+                 params.append(filter_state)
+
+             if where_conditions:
+                 base_query += " AND " + " AND ".join(where_conditions)
+
+             cursor.execute(base_query, tuple(params))
+             all_members = cursor.fetchall()
+
+             # Filter members by keywords
+             matching_members = []
+             for member in all_members:
+                 profile_text_lower = member['profile_text'].lower() if member['profile_text'] else ""
+
+                 if match_all:
+                     # ALL keywords must be present
+                     if all(keyword.lower() in profile_text_lower for keyword in keywords):
+                         matching_members.append(dict(member))
+                 else:
+                     # ANY keyword must be present
+                     if any(keyword.lower() in profile_text_lower for keyword in keywords):
+                         matching_members.append(dict(member))
+
+             conn.close()
+
+             # Count total unique members
+             unique_bio_ids = set(m['bio_id'] for m in matching_members)
+             total_count = len(unique_bio_ids)
+
+             # Breakdown if requested
+             breakdown = None
+             if breakdown_by != "none" and matching_members:
+                 breakdown_counts = {}
+
+                 for member in matching_members:
+                     if breakdown_by == "party":
+                         key = member.get('party', 'Unknown')
+                     elif breakdown_by == "state":
+                         key = member.get('region_code', 'Unknown')
+                     elif breakdown_by == "position":
+                         key = member.get('job_name', 'Unknown')
+                     elif breakdown_by == "congress":
+                         key = member.get('congress_number', 'Unknown')
+                     else:
+                         key = 'Unknown'
+
+                     if key not in breakdown_counts:
+                         breakdown_counts[key] = set()
+                     breakdown_counts[key].add(member['bio_id'])
+
+                 # Convert sets to counts
+                 breakdown = [
+                     {"group": k, "count": len(v)}
+                     for k, v in sorted(breakdown_counts.items(), key=lambda x: len(x[1]), reverse=True)
+                 ]
+
+             response = {
+                 "keywords": keywords,
+                 "match_all": match_all,
+                 "total_members_matching": total_count,
+                 "breakdown_by": breakdown_by,
+                 "breakdown": breakdown,
+                 "filters_applied": {
+                     "party": filter_party,
+                     "state": filter_state
+                 }
+             }
+
+             return [TextContent(type="text", text=json.dumps(response, indent=2))]
+
+         else:
+             return [TextContent(type="text", text=f"Unknown tool: {name}")]
+
+     except Exception as e:
+         return [TextContent(type="text", text=f"Error executing tool {name}: {str(e)}")]
+
+
+ async def main():
+     """Main entry point for the MCP server."""
+     # Initialize search index (log to stderr to not interfere with stdio JSON protocol)
+     if initialize_search_index():
+         print("Search index loaded successfully", file=sys.stderr, flush=True)
+     else:
+         print("Warning: Search index not found. Run ingest_data.py to create it.", file=sys.stderr, flush=True)
+
+     # Run the server
+     async with mcp.server.stdio.stdio_server() as (read_stream, write_stream):
+         await server.run(
+             read_stream,
+             write_stream,
+             server.create_initialization_options()
+         )
+
+
+ if __name__ == "__main__":
+     import asyncio
+     asyncio.run(main())
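
To exercise the stdio server outside an MCP host, the `mcp` SDK's client primitives can drive it directly. A rough sketch, assuming the SDK's `stdio_client`/`ClientSession` API, a built `congress.db` in the working directory, and an illustrative name query:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Spawn server.py as a subprocess and speak MCP over its stdio.
    params = StdioServerParameters(command="python3", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            result = await session.call_tool("search_by_name", {"family_name": "Lincoln"})
            print(result.content[0].text)


asyncio.run(main())
```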
setup.sh ADDED
@@ -0,0 +1,86 @@
+ #!/bin/bash
+ # Setup script for Congressional Bioguide MCP Server
+
+ set -e
+
+ echo "Setting up Congressional Bioguide MCP Server..."
+ echo "=============================================="
+
+ # Check for compatible Python versions
+ PYTHON_CMD=""
+
+ # Try to find a compatible Python version (3.10-3.13)
+ for version in python3.13 python3.12 python3.11 python3.10; do
+     if command -v $version &> /dev/null; then
+         PYTHON_CMD=$version
+         echo "βœ“ Found compatible Python: $($PYTHON_CMD --version)"
+         break
+     fi
+ done
+
+ # Fall back to python3 if no specific version found
+ if [ -z "$PYTHON_CMD" ]; then
+     if command -v python3 &> /dev/null; then
+         PYTHON_CMD=python3
+         PYTHON_VERSION=$($PYTHON_CMD --version 2>&1 | awk '{print $2}')
+         MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
+         MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)
+
+         echo "⚠️ Found Python $PYTHON_VERSION"
+
+         if [ "$MAJOR" -eq 3 ] && [ "$MINOR" -ge 14 ]; then
+             echo ""
+             echo "ERROR: Python 3.14+ is not compatible with the FAISS library"
+             echo ""
+             echo "Please install Python 3.13 or 3.12 using pyenv:"
+             echo "  brew install pyenv"
+             echo "  pyenv install 3.13"
+             echo "  pyenv local 3.13"
+             echo "  ./setup.sh"
+             echo ""
+             exit 1
+         elif [ "$MAJOR" -eq 3 ] && [ "$MINOR" -lt 10 ]; then
+             echo "ERROR: Python 3.10 or higher required (found $PYTHON_VERSION)"
+             exit 1
+         fi
+     else
+         echo "ERROR: Python 3 not found"
+         exit 1
+     fi
+ fi
+
+ # Create virtual environment if it doesn't exist
+ if [ ! -d "venv" ]; then
+     echo "Creating virtual environment with $PYTHON_CMD..."
+     $PYTHON_CMD -m venv venv
+     echo "βœ“ Virtual environment created"
+ else
+     echo "βœ“ Virtual environment already exists"
+ fi
+
+ # Activate virtual environment
+ source venv/bin/activate
+
+ # Verify we're using the venv python
+ echo "Using Python: $(which python3)"
+ echo "Version: $(python3 --version)"
+
+ # Install dependencies
+ echo ""
+ echo "Installing dependencies..."
+ pip install --upgrade pip
+ pip install -r requirements.txt
+ echo "βœ“ Dependencies installed"
+
+ # Run ingestion
+ echo ""
+ echo "Running data ingestion..."
+ python3 ingest_data.py
+
+ echo ""
+ echo "=============================================="
+ echo "βœ“ Setup complete!"
+ echo ""
+ echo "To run the server:"
+ echo "  source venv/bin/activate"
+ echo "  python3 server.py"
test_embeddings_data.py ADDED
@@ -0,0 +1,143 @@
+ #!/usr/bin/env python3
+ """
+ Test the embeddings data to check for issues before FAISS operations.
+ """
+
+ import sys
+ import os
+ import sqlite3
+ import numpy as np
+
+ print("=" * 60)
+ print("EMBEDDINGS DATA VALIDATION TEST")
+ print("=" * 60)
+ print(f"Python version: {sys.version}")
+ print()
+
+ # Load model
+ print("Loading sentence transformer...")
+ os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+ from sentence_transformers import SentenceTransformer
+ model = SentenceTransformer('all-MiniLM-L6-v2')
+ print("βœ“ Model loaded\n")
+
+ # Load ALL biographies
+ print("Loading ALL biographies from database...")
+ conn = sqlite3.connect("congress.db")
+ cursor = conn.cursor()
+ cursor.execute("""
+     SELECT bio_id, profile_text
+     FROM members
+     WHERE profile_text IS NOT NULL AND profile_text != ''
+ """)
+ rows = cursor.fetchall()
+ conn.close()
+
+ bio_ids = [r[0] for r in rows]
+ texts = [r[1] for r in rows]
+ print(f"βœ“ Loaded {len(texts)} biographies\n")
+
+ # Encode ALL
+ print("Encoding all biographies...")
+ print("(This will take a few minutes...)")
+ embeddings = []
+ batch_size = 32
+
+ for i in range(0, len(texts), batch_size):
+     batch = texts[i:i + batch_size]
+     batch_embeddings = model.encode(
+         batch,
+         show_progress_bar=False,
+         convert_to_numpy=True,
+         normalize_embeddings=False,
+         device='cpu'
+     )
+     embeddings.extend(batch_embeddings)
+
+     if (i // batch_size + 1) % 100 == 0:
+         print(f" Encoded {i + len(batch)}/{len(texts)}...")
+
+ embeddings = np.array(embeddings, dtype=np.float32)
+ print(f"βœ“ Encoded all, shape: {embeddings.shape}\n")
+
+ # Validate embeddings
+ print("Validating embeddings data...")
+ print(f" Shape: {embeddings.shape}")
+ print(f" Dtype: {embeddings.dtype}")
+ print(f" Min value: {np.min(embeddings)}")
+ print(f" Max value: {np.max(embeddings)}")
+ print(f" Mean: {np.mean(embeddings)}")
+ print(f" Has NaN: {np.any(np.isnan(embeddings))}")
+ print(f" Has Inf: {np.any(np.isinf(embeddings))}")
+ print(f" Is C-contiguous: {embeddings.flags['C_CONTIGUOUS']}")
+ print(f" Memory usage: {embeddings.nbytes / (1024**2):.2f} MB")
+
+ if np.any(np.isnan(embeddings)):
+     print("\n❌ ERROR: Embeddings contain NaN values!")
+     sys.exit(1)
+
+ if np.any(np.isinf(embeddings)):
+     print("\n❌ ERROR: Embeddings contain Inf values!")
+     sys.exit(1)
+
+ print("\nβœ“ Embeddings data looks good")
+
+ # Now test FAISS operations one by one
+ print("\n" + "=" * 60)
+ print("Testing FAISS operations...")
+ print("=" * 60)
+
+ import faiss
+
+ dimension = embeddings.shape[1]
+ print(f"\n1. Creating IndexFlatIP with dimension {dimension}...")
+ try:
+     index = faiss.IndexFlatIP(dimension)
+     print(" βœ“ Index created")
+ except Exception as e:
+     print(f" ❌ FAILED at index creation: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ print(f"\n2. Normalizing {len(embeddings)} embeddings...")
+ try:
+     # Make a copy to preserve original
+     embeddings_norm = embeddings.copy()
+     print(f" Before normalize - sample norm: {np.linalg.norm(embeddings_norm[0]):.4f}")
+
+     faiss.normalize_L2(embeddings_norm)
+
+     print(f" After normalize - sample norm: {np.linalg.norm(embeddings_norm[0]):.4f}")
+     print(" βœ“ Normalized")
+ except Exception as e:
+     print(f" ❌ FAILED at normalize: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ print(f"\n3. Adding {len(embeddings_norm)} vectors to index...")
+ try:
+     index.add(embeddings_norm)
+     print(f" βœ“ Added {index.ntotal} vectors")
+ except Exception as e:
+     print(f" ❌ FAILED at add: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ print("\n4. Writing index to disk...")
+ try:
+     faiss.write_index(index, "test_full.faiss")
+     print(" βœ“ Index written")
+ except Exception as e:
+     print(f" ❌ FAILED at write: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ print("\n" + "=" * 60)
+ print("βœ… SUCCESS! Full pipeline works!")
+ print("=" * 60)
+ print(f"\nProcessed {len(embeddings)} embeddings successfully")
+ print("The index has been created: test_full.faiss")
test_faiss_minimal.py ADDED
@@ -0,0 +1,142 @@
+ #!/usr/bin/env python3
+ """
+ Minimal FAISS test to isolate segfault issue.
+ Tests each step individually to find the exact failure point.
+ """
+
+ import sys
+
+ print("=" * 60)
+ print("MINIMAL FAISS TEST - Step by step debugging")
+ print("=" * 60)
+ print(f"Python version: {sys.version}")
+ print()
+
+ # Test 1: Import numpy
+ print("Test 1: Import numpy...")
+ try:
+     import numpy as np
+     print(f" βœ“ numpy imported successfully (version {np.__version__})")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 2: Import faiss
+ print("\nTest 2: Import faiss...")
+ try:
+     import faiss
+     print(" βœ“ faiss imported successfully")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 3: Create simple numpy array
+ print("\nTest 3: Create numpy array...")
+ try:
+     test_data = np.random.rand(10, 128).astype('float32')
+     print(f" βœ“ Created array with shape {test_data.shape}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 4: Create FAISS index
+ print("\nTest 4: Create FAISS index...")
+ try:
+     dimension = 128
+     index = faiss.IndexFlatL2(dimension)
+     print(f" βœ“ Created IndexFlatL2 with dimension {dimension}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 5: Add vectors to index
+ print("\nTest 5: Add vectors to FAISS index...")
+ try:
+     index.add(test_data)
+     print(f" βœ“ Added {index.ntotal} vectors to index")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 6: Search index
+ print("\nTest 6: Search FAISS index...")
+ try:
+     query = np.random.rand(1, 128).astype('float32')
+     distances, indices = index.search(query, 5)
+     print(f" βœ“ Search completed, found {len(indices[0])} results")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 7: Test with IndexFlatIP (what we actually use)
+ print("\nTest 7: Create IndexFlatIP...")
+ try:
+     index_ip = faiss.IndexFlatIP(dimension)
+     print(" βœ“ Created IndexFlatIP")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 8: Normalize vectors (critical step)
+ print("\nTest 8: Normalize vectors with faiss.normalize_L2...")
+ try:
+     test_data_copy = test_data.copy()
+     faiss.normalize_L2(test_data_copy)
+     print(" βœ“ Normalized vectors")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 9: Add normalized vectors to IndexFlatIP
+ print("\nTest 9: Add normalized vectors to IndexFlatIP...")
+ try:
+     index_ip.add(test_data_copy)
+     print(f" βœ“ Added {index_ip.ntotal} normalized vectors")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 10: Write index to disk
+ print("\nTest 10: Write index to disk...")
+ try:
+     faiss.write_index(index_ip, "test_index.faiss")
+     print(" βœ“ Index written to test_index.faiss")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 11: Read index from disk
+ print("\nTest 11: Read index from disk...")
+ try:
+     loaded_index = faiss.read_index("test_index.faiss")
+     print(f" βœ“ Index loaded, contains {loaded_index.ntotal} vectors")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Clean up
+ print("\nTest 12: Clean up test file...")
+ try:
+     import os
+     os.remove("test_index.faiss")
+     print(" βœ“ Test file removed")
+ except Exception as e:
+     print(f" ⚠️ Could not remove test file: {e}")
+
+ print("\n" + "=" * 60)
+ print("βœ… ALL TESTS PASSED!")
+ print("=" * 60)
+ print("\nFAISS is working correctly on your system.")
+ print("The issue may be with:")
+ print(" - Specific data from the database")
+ print(" - Memory/size of actual embeddings")
+ print(" - Sentence transformers interaction")
test_queries.py ADDED
@@ -0,0 +1,252 @@
+ #!/usr/bin/env python3
+ """
+ Test script to validate the Congressional Bioguide database and search functionality.
+ """
+
+ import sqlite3
+ import json
+ from pathlib import Path
+
+
+ def test_database():
+     """Test database structure and basic queries."""
+     print("Testing Database...")
+     print("=" * 60)
+
+     if not Path("congress.db").exists():
+         print("❌ Database not found. Run ingest_data.py first.")
+         return False
+
+     conn = sqlite3.connect("congress.db")
+     cursor = conn.cursor()
+
+     # Test 1: Count members
+     cursor.execute("SELECT COUNT(*) FROM members")
+     member_count = cursor.fetchone()[0]
+     print(f"βœ“ Members in database: {member_count}")
+
+     # Test 2: Count job positions
+     cursor.execute("SELECT COUNT(*) FROM job_positions")
+     job_count = cursor.fetchone()[0]
+     print(f"βœ“ Job positions recorded: {job_count}")
+
+     # Test 3: Search by name
+     cursor.execute("""
+         SELECT bio_id, family_name, given_name, birth_date, death_date
+         FROM members
+         WHERE unaccented_family_name = 'Lincoln'
+         ORDER BY birth_date
+     """)
+     lincolns = cursor.fetchall()
+     print(f"\nβœ“ Found {len(lincolns)} member(s) with family name 'Lincoln':")
+     for bio_id, family, given, birth, death in lincolns:
+         print(f" - {given} {family} ({bio_id}): {birth} - {death or 'present'}")
+
+     # Test 4: Party breakdown
+     cursor.execute("""
+         SELECT party, COUNT(DISTINCT bio_id) AS count
+         FROM job_positions
+         WHERE party IS NOT NULL
+         GROUP BY party
+         ORDER BY count DESC
+         LIMIT 10
+     """)
+     parties = cursor.fetchall()
+     print("\nβœ“ Top parties by member count:")
+     for party, count in parties:
+         print(f" - {party}: {count} members")
+
+     # Test 5: State representation
+     cursor.execute("""
+         SELECT region_code, COUNT(DISTINCT bio_id) AS count
+         FROM job_positions
+         WHERE region_code IS NOT NULL AND region_type = 'StateRegion'
+         GROUP BY region_code
+         ORDER BY count DESC
+         LIMIT 10
+     """)
+     states = cursor.fetchall()
+     print("\nβœ“ Top states by member count:")
+     for state, count in states:
+         print(f" - {state}: {count} members")
+
+     # Test 6: Relationships
+     cursor.execute("SELECT COUNT(*) FROM relationships")
+     rel_count = cursor.fetchone()[0]
+     print(f"\nβœ“ Family relationships recorded: {rel_count}")
+
+     if rel_count > 0:
+         cursor.execute("""
+             SELECT m1.given_name, m1.family_name, r.relationship_type,
+                    m2.given_name, m2.family_name
+             FROM relationships r
+             JOIN members m1 ON r.bio_id = m1.bio_id
+             JOIN members m2 ON r.related_bio_id = m2.bio_id
+             LIMIT 5
+         """)
+         relationships = cursor.fetchall()
+         print(" Sample relationships:")
+         for given1, family1, rel_type, given2, family2 in relationships:
+             print(f" - {given1} {family1} is {rel_type} of {given2} {family2}")
+
+     # Test 7: Profile text
+     cursor.execute("""
+         SELECT bio_id, given_name, family_name, LENGTH(profile_text) AS text_len
+         FROM members
+         WHERE profile_text IS NOT NULL
+         ORDER BY text_len DESC
+         LIMIT 5
+     """)
+     longest_profiles = cursor.fetchall()
+     print("\nβœ“ Longest biography profiles:")
+     for bio_id, given, family, length in longest_profiles:
+         print(f" - {given} {family} ({bio_id}): {length} characters")
+
+     conn.close()
+     return True
+
+
+ def test_faiss_index():
+     """Test FAISS index."""
+     print("\n\nTesting FAISS Index...")
+     print("=" * 60)
+
+     if not Path("congress_faiss.index").exists():
+         print("❌ FAISS index not found. Run ingest_data.py first.")
+         return False
+
+     if not Path("congress_bio_ids.pkl").exists():
+         print("❌ Bio ID mapping not found. Run ingest_data.py first.")
+         return False
+
+     try:
+         import faiss
+         import pickle
+         from sentence_transformers import SentenceTransformer
+
+         # Load index
+         index = faiss.read_index("congress_faiss.index")
+         with open("congress_bio_ids.pkl", "rb") as f:
+             bio_ids = pickle.load(f)
+
+         print(f"βœ“ FAISS index loaded: {index.ntotal} vectors")
+         print(f"βœ“ Dimension: {index.d}")
+
+         # Load model
+         model = SentenceTransformer('all-MiniLM-L6-v2')
+         print("βœ“ Sentence transformer model loaded")
+
+         # Test search
+         test_queries = [
+             "lawyers who became judges",
+             "Civil War veterans",
+             "served in the military",
+             "teachers and educators"
+         ]
+
+         for query in test_queries:
+             print(f"\nβœ“ Testing query: '{query}'")
+             query_embedding = model.encode([query])[0].reshape(1, -1).astype('float32')
+             faiss.normalize_L2(query_embedding)
+
+             scores, indices = index.search(query_embedding, 3)
+
+             # Load database to get names
+             conn = sqlite3.connect("congress.db")
+             cursor = conn.cursor()
+
+             print(" Top 3 results:")
+             for idx, score in zip(indices[0], scores[0]):
+                 if idx < len(bio_ids):
+                     bio_id = bio_ids[idx]
+                     cursor.execute(
+                         "SELECT given_name, family_name FROM members WHERE bio_id = ?",
+                         (bio_id,)
+                     )
+                     result = cursor.fetchone()
+                     if result:
+                         given, family = result
+                         print(f" - {given} {family} ({bio_id}): score={score:.4f}")
+
+             conn.close()
+
+         return True
+
+     except ImportError as e:
+         print(f"❌ Missing dependency: {e}")
+         print(" Run: pip install -r requirements.txt")
+         return False
+     except Exception as e:
+         print(f"❌ Error testing FAISS: {e}")
+         return False
+
+
+ def test_sample_profile():
+     """Display a sample profile."""
+     print("\n\nSample Profile...")
+     print("=" * 60)
+
+     conn = sqlite3.connect("congress.db")
+     conn.row_factory = sqlite3.Row
+     cursor = conn.cursor()
+
+     # Get a well-known member
+     cursor.execute("""
+         SELECT * FROM members
+         WHERE unaccented_family_name = 'Lincoln' AND unaccented_given_name = 'Abraham'
+         LIMIT 1
+     """)
+     member = cursor.fetchone()
+
+     if member:
+         bio_id = member['bio_id']
+         print(f"Profile: {member['given_name']} {member['family_name']} ({bio_id})")
+         print(f"Birth: {member['birth_date']}")
+         print(f"Death: {member['death_date']}")
+         print("\nBiography excerpt:")
+         profile_text = member['profile_text'] or ""
+         print(f" {profile_text[:300]}...")
+
+         # Get positions
+         cursor.execute("""
+             SELECT job_name, party, congress_number, region_code, start_date, end_date
+             FROM job_positions
+             WHERE bio_id = ?
+             ORDER BY start_date
+         """, (bio_id,))
+         positions = cursor.fetchall()
+
+         if positions:
+             print(f"\nPositions held ({len(positions)}):")
+             for pos in positions:
+                 print(f" - {pos['job_name']} ({pos['party']}), {pos['region_code']}")
+                 print(f"   Congress {pos['congress_number']}: {pos['start_date']} - {pos['end_date']}")
+
+     conn.close()
+
+
+ def main():
+     """Run all tests."""
+     print("Congressional Bioguide Database Test Suite")
+     print("=" * 60)
+     print()
+
+     db_ok = test_database()
+     faiss_ok = test_faiss_index()
+
+     if db_ok:
+         test_sample_profile()
+
+     print("\n" + "=" * 60)
+     if db_ok and faiss_ok:
+         print("βœ“ All tests passed!")
+         print("\nThe system is ready to use. Start the MCP server with:")
+         print("  python3 server.py")
+     else:
+         print("❌ Some tests failed. Please check the errors above.")
+         if not db_ok:
+             print("  Run: python3 ingest_data.py")
+
+
+ if __name__ == "__main__":
+     main()
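
The FAISS portion of this suite follows a two-step lookup: the index returns row positions, which map through the pickled `bio_ids` list back to database keys. A condensed sketch of that pattern, assuming the artifacts produced by `ingest_data.py` are present (the query string is illustrative):

```python
import pickle
import sqlite3

import faiss
from sentence_transformers import SentenceTransformer

# Load the index and the position -> bio_id mapping built by ingest_data.py
index = faiss.read_index("congress_faiss.index")
with open("congress_bio_ids.pkl", "rb") as f:
    bio_ids = pickle.load(f)

model = SentenceTransformer('all-MiniLM-L6-v2')
query_vec = model.encode(["abolitionist newspaper editors"]).astype('float32')
faiss.normalize_L2(query_vec)  # match the normalization used at index time

scores, positions = index.search(query_vec, 5)

# Resolve FAISS row positions to member names via bio_ids -> SQLite
conn = sqlite3.connect("congress.db")
for pos, score in zip(positions[0], scores[0]):
    row = conn.execute(
        "SELECT given_name, family_name FROM members WHERE bio_id = ?",
        (bio_ids[pos],),
    ).fetchone()
    if row:
        print(f"{row[0]} {row[1]}  (score {score:.3f})")
conn.close()
```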
test_sentence_transformers.py ADDED
@@ -0,0 +1,118 @@
+ #!/usr/bin/env python3
+ """
+ Test sentence-transformers to isolate the segfault.
+ """
+
+ import sys
+ import os
+
+ print("=" * 60)
+ print("SENTENCE TRANSFORMERS TEST")
+ print("=" * 60)
+ print(f"Python version: {sys.version}")
+ print()
+
+ # Test 1: Import sentence_transformers
+ print("Test 1: Import sentence_transformers...")
+ try:
+     from sentence_transformers import SentenceTransformer
+     print(" βœ“ sentence_transformers imported")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     sys.exit(1)
+
+ # Test 2: Load model
+ print("\nTest 2: Load model (this downloads ~90MB on first run)...")
+ try:
+     os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+     model = SentenceTransformer('all-MiniLM-L6-v2')
+     print(" βœ“ Model loaded")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 3: Encode simple text
+ print("\nTest 3: Encode simple text...")
+ try:
+     text = "This is a test sentence."
+     embedding = model.encode([text])
+     print(f" βœ“ Encoded text, embedding shape: {embedding.shape}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 4: Encode batch
+ print("\nTest 4: Encode batch of texts...")
+ try:
+     texts = ["First sentence", "Second sentence", "Third sentence"]
+     embeddings = model.encode(texts, show_progress_bar=False)
+     print(f" βœ“ Encoded {len(texts)} texts, shape: {embeddings.shape}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 5: Encode with explicit parameters
+ print("\nTest 5: Encode with explicit parameters (like in our script)...")
+ try:
+     embeddings = model.encode(
+         texts,
+         show_progress_bar=False,
+         convert_to_numpy=True,
+         normalize_embeddings=False,
+         device='cpu'
+     )
+     print(f" βœ“ Encoded with explicit params, shape: {embeddings.shape}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 6: Encode larger batch
+ print("\nTest 6: Encode larger batch (100 texts)...")
+ try:
+     large_texts = [f"This is test sentence number {i}" for i in range(100)]
+     embeddings = model.encode(
+         large_texts,
+         show_progress_bar=False,
+         convert_to_numpy=True,
+         normalize_embeddings=False,
+         device='cpu'
+     )
+     print(f" βœ“ Encoded {len(large_texts)} texts, shape: {embeddings.shape}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ # Test 7: Test with actual biography-like text
+ print("\nTest 7: Encode biography-like text...")
+ try:
+     bio = """A Representative from Illinois and 16th President of the United States;
+     born in Hardin County, Ky., February 12, 1809; moved with his parents to a tract
+     on Little Pigeon Creek, Ind., in 1816; attended a log-cabin school at short intervals
+     and was self-instructed in elementary branches."""
+
+     embedding = model.encode([bio], show_progress_bar=False, device='cpu')
+     print(f" βœ“ Encoded biography, shape: {embedding.shape}")
+ except Exception as e:
+     print(f" ❌ Failed: {e}")
+     import traceback
+     traceback.print_exc()
+     sys.exit(1)
+
+ print("\n" + "=" * 60)
+ print("βœ… ALL TESTS PASSED!")
+ print("=" * 60)
+ print("\nSentence transformers is working correctly.")
+ print("The issue may be with the combination of:")
+ print(" - Very large batch processing")
+ print(" - Integration with FAISS normalize")
+ print(" - Memory management with 13k+ texts")