
title: BioGuideMCP
emoji: πŸ‘
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.49.1
app_file: gradio_app.py
pinned: false
license: mit
short_description: 'An MCP allowing users to analyze congressional biographies. '

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Congressional Bioguide MCP Server

A Model Context Protocol (MCP) server that provides access to Congressional member profiles through both structured SQL queries and semantic search.

Deployment Options

1. Gradio MCP (Hugging Face Spaces)

Run this MCP server as a Gradio app that serves both a web interface and an MCP endpoint:

python gradio_app.py

This launches a web interface at http://localhost:7860 and exposes the same 9 tools both through the web UI and as MCP tools.

Deploy to Hugging Face Spaces:

  1. Create a new Space on Hugging Face
  2. Set SDK to gradio (version 5.49.1+)
  3. Upload all files including gradio_app.py, congress.db, congress_faiss.index, and congress_bio_ids.pkl
  4. The app will automatically launch with mcp_server=True

2. Traditional MCP Server

Use the original MCP server for integration with Claude Desktop or other MCP clients:

python server.py

To test the server backend, run npx @modelcontextprotocol/inspector python server.py, or register the server with Claude Desktop as described under MCP Configuration.

Features

Gradio MCP Tools (9 Tools)

The Gradio app (gradio_app.py) exposes these 9 MCP tools:

  1. search_by_name - Search members by name (first/last name)
  2. search_by_party - Find by political party affiliation
  3. search_by_state - Search by state/region representation
  4. semantic_search_biography - AI-powered natural language search of biographies
  5. get_member_profile - Get complete profile by Bioguide ID
  6. count_members_by_party - Count members grouped by party
  7. count_members_by_state - Count members grouped by state
  8. execute_sql_query - Execute custom SQL queries (read-only)
  9. get_database_schema - View database structure
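The execute_sql_query tool is described as read-only. One way such a guarantee can be enforced is at the connection level, by opening SQLite with a read-only URI. The sketch below is an assumption about how this could be done, not a description of what gradio_app.py or server.py actually does; the throwaway table and the X000001 ID are made up for the demonstration.

```python
import os
import sqlite3
import tempfile


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite database so that SQLite itself rejects writes."""
    # The file: URI with mode=ro requests a read-only connection.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def demo() -> str:
    # Build a throwaway database so the sketch does not depend on congress.db.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE members (bio_id TEXT PRIMARY KEY, family_name TEXT)")
        writer.execute("INSERT INTO members VALUES ('X000001', 'Example')")
        writer.commit()
        writer.close()

        reader = open_read_only(path)
        try:
            # Reads work as usual.
            rows = reader.execute("SELECT family_name FROM members").fetchall()
            try:
                # Any write attempt fails on a read-only connection.
                reader.execute("DELETE FROM members")
                outcome = "write succeeded"
            except sqlite3.OperationalError:
                outcome = "write rejected"
        finally:
            reader.close()
        return f"{rows[0][0]}: {outcome}"


if __name__ == "__main__":
    print(demo())
```

Enforcing read-only access at the connection level avoids having to parse or filter the SQL text itself.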

Traditional MCP Server Tools (14 Tools)

The traditional server (server.py) provides all 14 tools:

Search Tools (return concise results by default):

  1. search_by_name - Search members by name (returns: name, dates, party, congress)
  2. search_by_party - Find by political party affiliation
  3. search_by_state - Search by state/region representation
  4. search_by_congress - Get all members from specific Congress
  5. search_by_date_range - Find members who served during specific dates
  6. semantic_search_biography - Natural language AI search of biographies
  7. search_biography_regex - Regex pattern search (keywords, phrases)
  8. search_by_relationship - Find members with family relationships

Aggregation & Analysis Tools (efficient for large datasets):

  9. count_members - Count members by party, state, position, congress, or year
  10. temporal_analysis - Analyze trends over time (party shifts, demographics, etc.)
  11. count_by_biography_content - Count members mentioning specific keywords (e.g., "Harvard", "lawyer")

Profile & Query Tools:

  12. get_member_profile - Get complete profile by Bioguide ID
  13. execute_sql_query - Execute custom SQL queries (read-only)
  14. get_database_schema - View database structure

Database Schema

  • members - Core biographical data (13,047+ profiles)
  • job_positions - Congressional positions and affiliations
  • images - Profile images
  • relationships - Family relationships between members
  • creative_works - Publications by members
  • assets - Additional media assets
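To explore the tables listed above directly, SQLite's own catalog can be queried. The sketch below uses an in-memory database with two stand-in tables; their columns are taken from the SQL examples later in this README and are only a subset assumed for illustration, so the real congress.db will contain more. Point the connection at congress.db to inspect the actual schema.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table_name: [column names]} for every table in the database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


# Stand-in tables (assumed subset of columns, not the full real schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (bio_id TEXT PRIMARY KEY, family_name TEXT, given_name TEXT);
    CREATE TABLE job_positions (bio_id TEXT, congress_number INTEGER, party TEXT);
    """
)
print(describe_schema(conn))
```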

Requirements

  • Python 3.10 or newer
  • ✅ Python 3.14 is supported (FAISS runs in single-threaded mode)

Setup

Quick Start

./setup.sh

This automated script will:

  1. Create a Python virtual environment
  2. Install all dependencies
  3. Ingest all Congressional profiles into SQLite
  4. Build the FAISS semantic search index

Manual Setup

If you prefer manual setup:

1. Install Dependencies

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Ingest Data

Run the ingestion script to create the SQLite database and FAISS index:

python3 ingest_data.py

This will:

  • Create congress.db SQLite database (13,047+ members)
  • Build congress_faiss.index for semantic search
  • Generate congress_bio_ids.pkl for ID mapping

Expected output:

```
Starting Congressional Bioguide ingestion...

✓ Database schema created
Ingesting 13047 profiles...
Processed 1000/13047 profiles...
...
✓ Ingested 13047 profiles into database
Building FAISS index for semantic search...
Encoding 13047 biographies...
Encoded 3200/13047 biographies...
...
✓ FAISS index created with 13047 vectors
Index dimension: 384

✓ Ingestion complete!
```

Note: Ingestion takes approximately 5-10 minutes depending on your system.

3. Test the System (Optional)

python3 test_queries.py

4. Run the Server

python3 server.py
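The server depends on the three files produced by ingestion (congress.db, congress_faiss.index, and congress_bio_ids.pkl), so a quick pre-flight check can catch a missing artifact before startup. This is a hypothetical helper, not part of the repository, and it assumes the files sit in the project root.

```python
from pathlib import Path

# File names as listed in the ingestion step of this README.
REQUIRED_FILES = ("congress.db", "congress_faiss.index", "congress_bio_ids.pkl")


def missing_artifacts(project_dir: str) -> list[str]:
    """Return the required files that are absent from project_dir."""
    base = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        print("Missing files, run ingest_data.py first:", ", ".join(missing))
    else:
        print("All ingestion artifacts found.")
```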

Usage Examples

Name Search

{
  "name": "search_by_name",
  "arguments": {
    "family_name": "Lincoln"
  }
}
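The examples in this section show only the tool name and its arguments. On the wire, MCP clients send these inside a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for the name search; build_tool_call is a hypothetical helper and the request ID is arbitrary. MCP clients such as Claude Desktop or the inspector construct this envelope for you.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Wrap a tool name and its arguments in an MCP tools/call request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request, indent=2)


print(build_tool_call(1, "search_by_name", {"family_name": "Lincoln"}))
```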

Party Search

{
  "name": "search_by_party",
  "arguments": {
    "party": "Republican",
    "congress_number": 117
  }
}

State Search

{
  "name": "search_by_state",
  "arguments": {
    "state_code": "CA",
    "congress_number": 117
  }
}

Semantic Search

{
  "name": "semantic_search_biography",
  "arguments": {
    "query": "Civil War veterans who became lawyers",
    "top_k": 5
  }
}

Regex Search - Find Keywords

{
  "name": "search_biography_regex",
  "arguments": {
    "pattern": "Harvard",
    "limit": 5
  }
}

Regex Search - Filter by Party

{
  "name": "search_biography_regex",
  "arguments": {
    "pattern": "lawyer",
    "filter_party": "Republican",
    "limit": 10
  }
}

Regex Search - Filter by State

{
  "name": "search_biography_regex",
  "arguments": {
    "pattern": "served.*Confederate Army",
    "filter_state": "VA",
    "limit": 5
  }
}

Note: Regex search returns concise results (name, dates, party, state) by default. Set return_full_profile: true to get biography text.
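To get a feel for how a pattern such as served.*Confederate Army behaves before sending it to the server, it can be tried locally with Python's re module. The biographies and IDs below are invented, and the case-insensitive matching is an assumption; the server's regex engine and flags may differ.

```python
import re

# Invented sample biographies keyed by made-up IDs.
SAMPLE_BIOGRAPHIES = {
    "X000001": "studied law; served in the Confederate Army as a captain",
    "X000002": "graduated from Harvard University; served in the Union Army",
    "X000003": "Confederate Army officer who later served as governor",
}


def search_biographies(pattern: str, biographies: dict[str, str], limit: int = 5) -> list[str]:
    """Return up to `limit` IDs whose biography text matches the pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)
    matches = [bio_id for bio_id, text in biographies.items() if compiled.search(text)]
    return matches[:limit]


print(search_biographies(r"served.*Confederate Army", SAMPLE_BIOGRAPHIES))
```

Only the first sample matches: the pattern is order-sensitive, so a biography that mentions "Confederate Army" before "served" is not returned.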

Count Members by Party

{
  "name": "count_members",
  "arguments": {
    "group_by": "party"
  }
}

Count Republicans by State in 117th Congress

{
  "name": "count_members",
  "arguments": {
    "group_by": "state",
    "filter_party": "Republican",
    "filter_congress": 117
  }
}

Temporal Analysis - Party Changes Over Time

{
  "name": "temporal_analysis",
  "arguments": {
    "analysis_type": "party_over_time",
    "time_unit": "congress",
    "start_date": "1900-01-01",
    "end_date": "2000-12-31"
  }
}

Demographics Analysis - Average Age by Congress

{
  "name": "temporal_analysis",
  "arguments": {
    "analysis_type": "demographics",
    "time_unit": "congress"
  }
}

Count Members Who Attended Harvard

{
  "name": "count_by_biography_content",
  "arguments": {
    "keywords": ["Harvard"]
  }
}

Count Lawyers by Party

{
  "name": "count_by_biography_content",
  "arguments": {
    "keywords": ["lawyer", "attorney"],
    "breakdown_by": "party"
  }
}

Count Members Mentioning Any Lawyer or Military Keyword, by State

{
  "name": "count_by_biography_content",
  "arguments": {
    "keywords": ["lawyer", "military", "army"],
    "match_all": false,
    "breakdown_by": "state"
  }
}
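The match_all flag changes which biographies are counted. The sketch below assumes that match_all: false counts a biography if it mentions any keyword and match_all: true only if it mentions every keyword; this is inferred from the parameter name, not confirmed against server.py. The sample biographies are invented.

```python
def mentions(biography: str, keywords: list[str], match_all: bool = False) -> bool:
    """Case-insensitive substring test for any (default) or all keywords."""
    text = biography.lower()
    hits = [keyword.lower() in text for keyword in keywords]
    return all(hits) if match_all else any(hits)


def count_matches(biographies: list[str], keywords: list[str], match_all: bool = False) -> int:
    """Count the biographies that satisfy the keyword condition."""
    return sum(mentions(bio, keywords, match_all) for bio in biographies)


# Invented sample data.
SAMPLE = [
    "lawyer; served in the army",
    "lawyer and judge",
    "farmer; military officer",
]

keywords = ["lawyer", "military", "army"]
print(count_matches(SAMPLE, keywords, match_all=False))  # any keyword
print(count_matches(SAMPLE, keywords, match_all=True))   # every keyword
```

Under this assumption, requiring all of "lawyer", "military", and "army" is stricter than "lawyer and veteran", because a biography would need both "military" and "army". A narrower keyword list such as ["lawyer", "army"] with match_all: true is closer to that intent.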

SQL Query - Find Longest Serving Members

{
  "name": "execute_sql_query",
  "arguments": {
    "query": "SELECT family_name, given_name, COUNT(DISTINCT congress_number) as congresses FROM members m JOIN job_positions j ON m.bio_id = j.bio_id GROUP BY m.bio_id HAVING congresses > 5 ORDER BY congresses DESC LIMIT 10"
  }
}

Get Full Member Profile

{
  "name": "get_member_profile",
  "arguments": {
    "bio_id": "L000313"
  }
}

Search by Congress Number

{
  "name": "search_by_congress",
  "arguments": {
    "congress_number": 117,
    "chamber": "Senator"
  }
}

Search by Date Range

{
  "name": "search_by_date_range",
  "arguments": {
    "start_date": "1861-03-04",
    "end_date": "1865-03-04"
  }
}
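Date ranges and Congress numbers can be related with simple arithmetic: the 1st Congress convened in 1789 and each Congress lasts two years. The helpers below are hypothetical and work at year granularity only; they ignore the exact start day, so the first weeks of an odd-numbered year still belong to the previous Congress.

```python
def congress_start_year(congress_number: int) -> int:
    """Year in which the given Congress convened (the 1st Congress began in 1789)."""
    if congress_number < 1:
        raise ValueError("congress_number must be 1 or greater")
    return 1789 + 2 * (congress_number - 1)


def congress_for_year(year: int) -> int:
    """Congress that convened in, or in the year before, the given year."""
    if year < 1789:
        raise ValueError("year must be 1789 or later")
    return (year - 1789) // 2 + 1


# The date range above starts in 1861; the party and state examples use the 117th Congress.
print(congress_for_year(1861), congress_start_year(117))
```

By this arithmetic, the range 1861-03-04 to 1865-03-04 spans the 37th and 38th Congresses.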

Find Family Relationships

{
  "name": "search_by_relationship",
  "arguments": {
    "relationship_type": "father"
  }
}

Complex SQL - Party Transitions

{
  "name": "execute_sql_query",
  "arguments": {
    "query": "SELECT m.bio_id, m.family_name, m.given_name, GROUP_CONCAT(DISTINCT j.party) as parties FROM members m JOIN job_positions j ON m.bio_id = j.bio_id WHERE j.party IS NOT NULL GROUP BY m.bio_id HAVING COUNT(DISTINCT j.party) > 1 LIMIT 20"
  }
}
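The party-transition query can be tried on a small in-memory database to see what it returns. The tables below contain only the columns the query touches, and the rows are invented; the real congress.db has more columns and real data.

```python
import sqlite3

# Same query as in the example above.
PARTY_TRANSITIONS_SQL = (
    "SELECT m.bio_id, m.family_name, m.given_name, "
    "GROUP_CONCAT(DISTINCT j.party) as parties "
    "FROM members m JOIN job_positions j ON m.bio_id = j.bio_id "
    "WHERE j.party IS NOT NULL GROUP BY m.bio_id "
    "HAVING COUNT(DISTINCT j.party) > 1 LIMIT 20"
)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (bio_id TEXT PRIMARY KEY, family_name TEXT, given_name TEXT);
    CREATE TABLE job_positions (bio_id TEXT, congress_number INTEGER, party TEXT);
    -- Invented rows: one member who changed party, one who did not.
    INSERT INTO members VALUES ('X000001', 'Alpha', 'Ann'), ('X000002', 'Beta', 'Bob');
    INSERT INTO job_positions VALUES
        ('X000001', 30, 'Whig'), ('X000001', 35, 'Republican'),
        ('X000002', 30, 'Democrat'), ('X000002', 31, 'Democrat');
    """
)

rows = conn.execute(PARTY_TRANSITIONS_SQL).fetchall()
for bio_id, family_name, given_name, parties in rows:
    # GROUP_CONCAT does not guarantee an order, so sort for stable display.
    print(bio_id, family_name, given_name, sorted(parties.split(",")))
```

Only the member with more than one distinct party is returned; repeated terms under the same party do not count as a transition.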

Data Source

Data comes from the US Congressional Bioguide, containing biographical information for all members of Congress throughout history.

Technical Details

  • Database: SQLite for structured queries
  • Semantic Search: FAISS with sentence-transformers (all-MiniLM-L6-v2)
  • Embedding Dimension: 384
  • Index Type: Flat IP (Inner Product) with L2 normalization for cosine similarity
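The reason a flat inner-product index gives cosine similarity is that, once every vector is scaled to unit length, the inner product of two vectors equals the cosine of the angle between them. The pure-Python sketch below illustrates this with toy 3-dimensional vectors and made-up IDs. It does not use FAISS or sentence-transformers, and the parallel ID list only stands in for the kind of position-to-ID mapping that congress_bio_ids.pkl is described as providing.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], index_vectors: list[list[float]],
          bio_ids: list[str], k: int) -> list[tuple[str, float]]:
    """Exhaustive (flat) search: score every stored vector, return the k best."""
    q = l2_normalize(query)
    scored = [(inner_product(q, v), bio_id) for v, bio_id in zip(index_vectors, bio_ids)]
    scored.sort(reverse=True)
    return [(bio_id, score) for score, bio_id in scored[:k]]


# Toy index: vectors are normalized once, at build time; IDs are made up.
BIO_IDS = ["X000001", "X000002", "X000003"]
INDEX = [l2_normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0])]

print(top_k([2.0, 0.0, 0.0], INDEX, BIO_IDS, k=2))
```

The query's length does not affect the ranking: [2, 0, 0] and [1, 0, 0] produce the same scores after normalization.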

MCP Configuration

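Add the following to your MCP settings file. For Claude Desktop this is usually ~/Library/Application Support/Claude/claude_desktop_config.json on macOS, ~/.config/claude/claude_desktop_config.json on Linux, or %APPDATA%\Claude\claude_desktop_config.json on Windows: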

{
  "mcpServers": {
    "congressional-bioguide": {
      "command": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/venv/bin/python",
      "args": [
        "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/server.py"
      ],
      "cwd": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP"
    }
  }
}

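Note: This uses the virtual environment's Python, which has all the required dependencies installed. The paths shown are from the author's machine; replace them with the absolute path to your own checkout.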
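Because the configuration needs absolute paths, it can be convenient to generate it from the location of your checkout. The helper below is a hypothetical sketch: it assumes the POSIX virtual environment layout (venv/bin/python) created in the setup steps, and /path/to/BioGuideMCP is a placeholder. On Windows the interpreter would instead be at venv\Scripts\python.exe.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir: str) -> dict:
    """Build the mcpServers entry for a checkout located at project_dir."""
    root = Path(project_dir)
    return {
        "mcpServers": {
            "congressional-bioguide": {
                # Assumes a POSIX venv layout; adjust for Windows.
                "command": str(root / "venv" / "bin" / "python"),
                "args": [str(root / "server.py")],
                "cwd": str(root),
            }
        }
    }


# Placeholder path; substitute the absolute path to your checkout.
print(json.dumps(build_mcp_config("/path/to/BioGuideMCP"), indent=2))
```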

License

The data comes from the US Congressional Bioguide and is in the public domain.