---
title: BioGuideMCP
emoji: πŸ‘
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.49.1
app_file: gradio_app.py
pinned: false
license: mit
short_description: 'An MCP allowing users to analyze congressional biographies. '
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Congressional Bioguide MCP Server
A Model Context Protocol (MCP) server that provides access to Congressional member profiles with both structured SQL queries and semantic search capabilities.
## Deployment Options
### 1. Gradio MCP (Hugging Face Spaces)
Run the project as a Gradio app that serves both a web interface and an MCP server:
```bash
python gradio_app.py
```
This launches a web interface at `http://localhost:7860` and exposes the same 9 tools through both the web UI and MCP.
**Deploy to Hugging Face Spaces:**
1. Create a new Space on Hugging Face
2. Set SDK to `gradio` (version 5.49.1+)
3. Upload all files including `gradio_app.py`, `congress.db`, `congress_faiss.index`, and `congress_bio_ids.pkl`
4. The app will automatically launch with `mcp_server=True`
### 2. Traditional MCP Server
Use the original MCP server for integration with Claude Desktop or other MCP clients:
```bash
python server.py
```
Test the server backend with `npx @modelcontextprotocol/inspector python server.py` or integrate it into your Claude setup.
## Features
### Gradio MCP Tools (9 Tools)
The Gradio app (`gradio_app.py`) exposes these 9 MCP tools:
1. **search_by_name** - Search members by name (first/last name)
2. **search_by_party** - Find by political party affiliation
3. **search_by_state** - Search by state/region representation
4. **semantic_search_biography** - AI-powered natural language search of biographies
5. **get_member_profile** - Get complete profile by Bioguide ID
6. **count_members_by_party** - Count members grouped by party
7. **count_members_by_state** - Count members grouped by state
8. **execute_sql_query** - Execute custom SQL queries (read-only)
9. **get_database_schema** - View database structure
### Traditional MCP Server Tools (14 Tools)
The traditional server (`server.py`) provides all 14 tools:
**Search Tools** (return concise results by default):
1. **search_by_name** - Search members by name (returns: name, dates, party, congress)
2. **search_by_party** - Find by political party affiliation
3. **search_by_state** - Search by state/region representation
4. **search_by_congress** - Get all members from specific Congress
5. **search_by_date_range** - Find members who served during specific dates
6. **semantic_search_biography** - Natural language AI search of biographies
7. **search_biography_regex** - Regex pattern search (keywords, phrases)
8. **search_by_relationship** - Find members with family relationships
**Aggregation & Analysis Tools** (efficient for large datasets):
9. **count_members** - Count members by party, state, position, congress, or year
10. **temporal_analysis** - Analyze trends over time (party shifts, demographics, etc.)
11. **count_by_biography_content** - Count members mentioning specific keywords (e.g., "Harvard", "lawyer")
**Profile & Query Tools**:
12. **get_member_profile** - Get complete profile by Bioguide ID
13. **execute_sql_query** - Execute custom SQL queries (read-only)
14. **get_database_schema** - View database structure
### Database Schema
- **members** - Core biographical data (13,047+ profiles)
- **job_positions** - Congressional positions and affiliations
- **images** - Profile images
- **relationships** - Family relationships between members
- **creative_works** - Publications by members
- **assets** - Additional media assets
## Requirements
- **Python 3.10+**
- ✅ **Python 3.14 is supported** (FAISS runs in single-threaded mode)
## Setup
### Quick Start
```bash
./setup.sh
```
This automated script will:
1. Create a Python virtual environment
2. Install all dependencies
3. Ingest all Congressional profiles into SQLite
4. Build the FAISS semantic search index
### Manual Setup
If you prefer manual setup:
#### 1. Install Dependencies
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
#### 2. Ingest Data
Run the ingestion script to create the SQLite database and FAISS index:
```bash
python3 ingest_data.py
```
This will:
- Create `congress.db` SQLite database (13,047+ members)
- Build `congress_faiss.index` for semantic search
- Generate `congress_bio_ids.pkl` for ID mapping
Expected output:
```
Starting Congressional Bioguide ingestion...
============================================================
✓ Database schema created
Ingesting 13047 profiles...
Processed 1000/13047 profiles...
...
✓ Ingested 13047 profiles into database
Building FAISS index for semantic search...
Encoding 13047 biographies...
Encoded 3200/13047 biographies...
...
✓ FAISS index created with 13047 vectors
Index dimension: 384
============================================================
✓ Ingestion complete!
```
**Note**: Ingestion takes approximately 5-10 minutes depending on your system.
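After ingestion, it can be useful to confirm that the three files listed above exist before starting the server. The following is a minimal sketch; the helper `missing_artifacts` is hypothetical and not part of the repository, and it only checks that each file is present and non-empty.

```python
from pathlib import Path

# File names taken from the ingestion step described above.
EXPECTED_ARTIFACTS = ("congress.db", "congress_faiss.index", "congress_bio_ids.pkl")


def missing_artifacts(directory: str = ".") -> list:
    """Return the expected files that are absent or empty in directory."""
    base = Path(directory)
    return [
        name for name in EXPECTED_ARTIFACTS
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_artifacts()
    print("All artifacts present" if not missing else f"Missing: {missing}")
```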
#### 3. Test the System (Optional)
```bash
python3 test_queries.py
```
#### 4. Run the Server
```bash
python3 server.py
```
## Usage Examples
### Name Search
```json
{
"name": "search_by_name",
"arguments": {
"family_name": "Lincoln"
}
}
```
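The `name`/`arguments` payloads in this section correspond to the `params` of an MCP `tools/call` request, which travels as a JSON-RPC 2.0 message. MCP clients normally build this envelope for you; the sketch below only shows what goes over the wire. The helper `tools_call_request` is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tools_call_request(payload: dict) -> str:
    """Wrap a {"name": ..., "arguments": ...} payload in a tools/call request."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": payload["name"],
            "arguments": payload.get("arguments", {}),
        },
    }
    return json.dumps(request)


message = tools_call_request(
    {"name": "search_by_name", "arguments": {"family_name": "Lincoln"}}
)
print(message)
```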
### Party Search
```json
{
"name": "search_by_party",
"arguments": {
"party": "Republican",
"congress_number": 117
}
}
```
### State Search
```json
{
"name": "search_by_state",
"arguments": {
"state_code": "CA",
"congress_number": 117
}
}
```
### Semantic Search
```json
{
"name": "semantic_search_biography",
"arguments": {
"query": "Civil War veterans who became lawyers",
"top_k": 5
}
}
```
### Regex Search - Find Keywords
```json
{
"name": "search_biography_regex",
"arguments": {
"pattern": "Harvard",
"limit": 5
}
}
```
### Regex Search - Filter by Party
```json
{
"name": "search_biography_regex",
"arguments": {
"pattern": "lawyer",
"filter_party": "Republican",
"limit": 10
}
}
```
### Regex Search - Filter by State
```json
{
"name": "search_biography_regex",
"arguments": {
"pattern": "served.*Confederate Army",
"filter_state": "VA",
"limit": 5
}
}
```
**Note**: Regex search returns concise results (name, dates, party, state) by default. Set `return_full_profile: true` to get biography text.
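To make the interplay of `pattern`, the filters, `limit`, and `return_full_profile` concrete, here is a small sketch over in-memory sample rows. The data, the function `regex_search`, and the choice of case-insensitive matching are all assumptions for illustration; the server's actual implementation may differ.

```python
import re

# Hypothetical rows; the real data lives in congress.db.
SAMPLE_MEMBERS = [
    {"name": "Doe, Jane", "party": "Republican", "state": "VA",
     "biography": "Studied law; lawyer; served in the Confederate Army."},
    {"name": "Roe, Richard", "party": "Democrat", "state": "VA",
     "biography": "Graduated from Harvard; practiced as a lawyer."},
    {"name": "Poe, Mary", "party": "Republican", "state": "OH",
     "biography": "Teacher and newspaper editor."},
]


def regex_search(members, pattern, filter_party=None, filter_state=None,
                 limit=10, return_full_profile=False):
    """Return members whose biography matches pattern, after optional filters."""
    compiled = re.compile(pattern, re.IGNORECASE)  # case-insensitivity is an assumption
    results = []
    for member in members:
        if len(results) >= limit:
            break
        if filter_party and member["party"] != filter_party:
            continue
        if filter_state and member["state"] != filter_state:
            continue
        if compiled.search(member["biography"]):
            # Concise results omit the biography unless the full profile is requested.
            keys = list(member) if return_full_profile else ["name", "party", "state"]
            results.append({key: member[key] for key in keys})
    return results


by_party = regex_search(SAMPLE_MEMBERS, "lawyer", filter_party="Republican")
by_state = regex_search(SAMPLE_MEMBERS, r"served.*Confederate Army",
                        filter_state="VA", limit=5)
print(by_party)
print(by_state)
```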
### Count Members by Party
```json
{
"name": "count_members",
"arguments": {
"group_by": "party"
}
}
```
### Count Republicans by State in 117th Congress
```json
{
"name": "count_members",
"arguments": {
"group_by": "state",
"filter_party": "Republican",
"filter_congress": 117
}
}
```
### Temporal Analysis - Party Changes Over Time
```json
{
"name": "temporal_analysis",
"arguments": {
"analysis_type": "party_over_time",
"time_unit": "congress",
"start_date": "1900-01-01",
"end_date": "2000-12-31"
}
}
```
### Demographics Analysis - Average Age by Congress
```json
{
"name": "temporal_analysis",
"arguments": {
"analysis_type": "demographics",
"time_unit": "congress"
}
}
```
### Count Members Who Attended Harvard
```json
{
"name": "count_by_biography_content",
"arguments": {
"keywords": ["Harvard"]
}
}
```
### Count Lawyers by Party
```json
{
"name": "count_by_biography_content",
"arguments": {
"keywords": ["lawyer", "attorney"],
"breakdown_by": "party"
}
}
```
### Count Members Who Were Lawyers OR Veterans, by State
```json
{
"name": "count_by_biography_content",
"arguments": {
"keywords": ["lawyer", "military", "army"],
"match_all": false,
"breakdown_by": "state"
}
}
```
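The `match_all` flag presumably switches between "any keyword" and "every keyword" matching; that reading is inferred from the parameter name rather than confirmed by the source. Under that assumption, and assuming case-insensitive substring matching, the difference can be sketched as follows (`biography_matches` is a hypothetical helper):

```python
def biography_matches(biography: str, keywords: list, match_all: bool = False) -> bool:
    """True if the biography contains all (match_all) or any of the keywords."""
    text = biography.lower()
    hits = [keyword.lower() in text for keyword in keywords]
    return all(hits) if match_all else any(hits)


bio = "Studied law and practiced as a lawyer; served in the Union Army."
keywords = ["lawyer", "military", "army"]

any_match = biography_matches(bio, keywords, match_all=False)
all_match = biography_matches(bio, keywords, match_all=True)
print(any_match, all_match)  # prints: True False
```

The sample biography never contains "military", so requiring every keyword fails even though the member was both a lawyer and a soldier. When combining concepts with synonyms, narrower keyword lists are likely to behave more predictably.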
### SQL Query - Find Longest Serving Members
```json
{
"name": "execute_sql_query",
"arguments": {
"query": "SELECT family_name, given_name, COUNT(DISTINCT congress_number) as congresses FROM members m JOIN job_positions j ON m.bio_id = j.bio_id GROUP BY m.bio_id HAVING congresses > 5 ORDER BY congresses DESC LIMIT 10"
}
}
```
### Get Full Member Profile
```json
{
"name": "get_member_profile",
"arguments": {
"bio_id": "L000313"
}
}
```
### Search by Congress Number
```json
{
"name": "search_by_congress",
"arguments": {
"congress_number": 117,
"chamber": "Senator"
}
}
```
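Congress numbers map to calendar years by simple arithmetic: the 1st Congress convened in 1789 and each Congress lasts two years. The helpers below are a hypothetical convenience for translating between the two; they work at year granularity and ignore the exact start day (March 4 before 1935, January 3 since).

```python
def congress_start_year(congress_number: int) -> int:
    """Year in which the given Congress convened (the 1st met in 1789)."""
    if congress_number < 1:
        raise ValueError("congress_number must be >= 1")
    return 1789 + 2 * (congress_number - 1)


def congress_for_year(year: int) -> int:
    """Congress whose two-year term covers most of the given year."""
    if year < 1789:
        raise ValueError("the 1st Congress convened in 1789")
    return (year - 1789) // 2 + 1


print(congress_start_year(117))  # prints: 2021
print(congress_for_year(1861))   # prints: 37
```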
### Search by Date Range
```json
{
"name": "search_by_date_range",
"arguments": {
"start_date": "1861-03-04",
"end_date": "1865-03-04"
}
}
```
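A date-range search of this kind typically reduces to an interval-overlap test: a term counts if it starts on or before the end of the range and ends on or after its start. The sketch below shows that test in isolation; the function `served_during` and the treatment of open-ended terms are assumptions, not the server's confirmed logic.

```python
from datetime import date
from typing import Optional


def served_during(term_start: date, term_end: Optional[date],
                  range_start: date, range_end: date) -> bool:
    """True if a term overlaps the inclusive range; term_end=None means still serving."""
    effective_end = term_end or date.max
    return term_start <= range_end and effective_end >= range_start


civil_war = (date(1861, 3, 4), date(1865, 3, 4))

# A term that began before the range but extends into it still counts.
overlapping = served_during(date(1859, 3, 4), date(1863, 3, 3), *civil_war)
# A term that begins the day after the range ends does not.
later = served_during(date(1865, 3, 5), date(1867, 3, 3), *civil_war)
print(overlapping, later)  # prints: True False
```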
### Find Family Relationships
```json
{
"name": "search_by_relationship",
"arguments": {
"relationship_type": "father"
}
}
```
### Complex SQL - Party Transitions
```json
{
"name": "execute_sql_query",
"arguments": {
"query": "SELECT m.bio_id, m.family_name, m.given_name, GROUP_CONCAT(DISTINCT j.party) as parties FROM members m JOIN job_positions j ON m.bio_id = j.bio_id WHERE j.party IS NOT NULL GROUP BY m.bio_id HAVING COUNT(DISTINCT j.party) > 1 LIMIT 20"
}
}
```
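To experiment with queries like this without the full dataset, a stand-in schema can be built in an in-memory SQLite database. The sketch below includes only the columns that the queries in this README reference (`bio_id`, `family_name`, `given_name`, `congress_number`, `party`); the real tables contain more, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal stand-in schema: only the columns the README queries reference.
    CREATE TABLE members (bio_id TEXT PRIMARY KEY, family_name TEXT, given_name TEXT);
    CREATE TABLE job_positions (bio_id TEXT, congress_number INTEGER, party TEXT);
    INSERT INTO members VALUES ('A000001', 'Alpha', 'Ann'), ('B000002', 'Beta', 'Bob');
    INSERT INTO job_positions VALUES
        ('A000001', 50, 'Independent'), ('A000001', 51, 'Republican'),
        ('B000002', 50, 'Democrat'), ('B000002', 51, 'Democrat');
""")

# Members recorded under more than one party across their positions.
PARTY_TRANSITIONS = """
    SELECT m.bio_id, m.family_name, COUNT(DISTINCT j.party) AS party_count
    FROM members m
    JOIN job_positions j ON m.bio_id = j.bio_id
    WHERE j.party IS NOT NULL
    GROUP BY m.bio_id
    HAVING COUNT(DISTINCT j.party) > 1
"""
switchers = conn.execute(PARTY_TRANSITIONS).fetchall()
print(switchers)  # prints: [('A000001', 'Alpha', 2)]
```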
## Data Source
Data comes from the US Congressional Bioguide, containing biographical information for all members of Congress throughout history.
## Technical Details
- **Database**: SQLite for structured queries
- **Semantic Search**: FAISS with sentence-transformers (all-MiniLM-L6-v2)
- **Embedding Dimension**: 384
- **Index Type**: Flat IP (Inner Product) with L2 normalization for cosine similarity
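
The last point relies on a standard identity: once vectors are scaled to unit length, their inner product equals their cosine similarity. The pure-Python sketch below demonstrates the idea with an exhaustive ("flat") scan over toy 3-dimensional vectors. It is an illustration only and does not use FAISS or the project's actual embeddings; all function names are hypothetical.

```python
import math


def l2_normalize(vector: list) -> list:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def inner_product(a: list, b: list) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list, corpus: list, k: int) -> list:
    """Flat search: score every vector, return the k best (index, score) pairs."""
    q = l2_normalize(query)
    scores = [(i, inner_product(q, l2_normalize(v))) for i, v in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for the 384-dimensional embeddings.
corpus = [[1.0, 0.0, 0.0], [2.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
results = top_k([3.0, 0.0, 0.0], corpus, k=2)
print(results)
```

Vector magnitudes do not affect the ranking: the query `[3, 0, 0]` scores a perfect 1.0 against `[1, 0, 0]`, and `[2, 2, 0]` scores cos 45° regardless of its length.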
## MCP Configuration
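Add to your MCP settings file (for Claude Desktop, usually `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS or `%APPDATA%\Claude\claude_desktop_config.json` on Windows; the location on Linux may differ, for example `~/.config/claude/claude_desktop_config.json`):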
```json
{
"mcpServers": {
"congressional-bioguide": {
"command": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/venv/bin/python",
"args": [
"/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP/server.py"
],
"cwd": "/Users/electron/workspace/Nanocentury AI/NIO/BioGuideMCP"
}
}
}
```
**Note**: The paths above are examples; replace them with the absolute path to your own checkout. `command` points at the virtual environment's Python, which has all the required dependencies installed.
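Because the configuration needs absolute paths, it can be convenient to generate the entry from the location of your checkout instead of typing it by hand. The helper `build_mcp_config` below is a hypothetical sketch that assumes the `venv/bin/python` layout produced by the setup steps on macOS/Linux.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir: str) -> dict:
    """Build the mcpServers entry for a checkout located at project_dir."""
    root = Path(project_dir).expanduser().resolve()
    return {
        "mcpServers": {
            "congressional-bioguide": {
                # venv/bin/python is the macOS/Linux layout;
                # on Windows the interpreter is venv\Scripts\python.exe.
                "command": str(root / "venv" / "bin" / "python"),
                "args": [str(root / "server.py")],
                "cwd": str(root),
            }
        }
    }


if __name__ == "__main__":
    # Run from the repository root and paste the output into the settings file.
    print(json.dumps(build_mcp_config("."), indent=2))
```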
## License
The biographical data comes from the US Congressional Bioguide and is in the public domain.