
	---
	title: BioGuideMCP
	emoji: 👁
	colorFrom: purple
	colorTo: yellow
	sdk: gradio
	sdk_version: 5.49.1
	app_file: gradio_app.py
	pinned: false
	license: mit
	short_description: 'An MCP allowing users to analyze congressional biographies. '
	---


	# Congressional Bioguide MCP Server

	A Model Context Protocol (MCP) server that provides access to Congressional member profiles with both structured SQL queries and semantic search capabilities.

	## Deployment Options

	### 1. Gradio MCP (Hugging Face Spaces)

	Run this MCP server as a Gradio app that provides both a web interface and an MCP endpoint:

	```bash
	python gradio_app.py
	```

	This launches a web interface at `http://localhost:7860` (Gradio's default port) and exposes the same 9 tools through both the web UI and the MCP server.

	Deploy to Hugging Face Spaces:
	1. Create a new Space on Hugging Face
	2. Set SDK to `gradio` (version 5.49.1+)
	3. Upload all files including `gradio_app.py`, `congress.db`, `congress_faiss.index`, and `congress_bio_ids.pkl`
	4. The app will automatically launch with `mcp_server=True`

	### 2. Traditional MCP Server

	Use the original MCP server for integration with Claude Desktop or other MCP clients:

	```bash
	python server.py
	```

	Test the server backend with `npx @modelcontextprotocol/inspector python server.py` or integrate it into your Claude setup.

	## Features

	### Gradio MCP Tools (9 Tools)

	The Gradio app (`gradio_app.py`) exposes these 9 MCP tools:

	1. `search_by_name` - Search members by name (first/last name)
	2. `search_by_party` - Find by political party affiliation
	3. `search_by_state` - Search by state/region representation
	4. `semantic_search_biography` - AI-powered natural language search of biographies
	5. `get_member_profile` - Get complete profile by Bioguide ID
	6. `count_members_by_party` - Count members grouped by party
	7. `count_members_by_state` - Count members grouped by state
	8. `execute_sql_query` - Execute custom SQL queries (read-only)
	9. `get_database_schema` - View database structure

	### Traditional MCP Server Tools (14 Tools)

	The traditional server (`server.py`) provides 14 tools:

	Search Tools (return concise results by default):

	1. `search_by_name` - Search members by name (returns: name, dates, party, congress)
	2. `search_by_party` - Find by political party affiliation
	3. `search_by_state` - Search by state/region representation
	4. `search_by_congress` - Get all members from a specific Congress
	5. `search_by_date_range` - Find members who served during specific dates
	6. `semantic_search_biography` - Natural language AI search of biographies
	7. `search_biography_regex` - Regex pattern search (keywords, phrases)
	8. `search_by_relationship` - Find members with family relationships

	Aggregation & Analysis Tools (efficient for large datasets):

	9. `count_members` - Count members by party, state, position, congress, or year
	10. `temporal_analysis` - Analyze trends over time (party shifts, demographics, etc.)
	11. `count_by_biography_content` - Count members mentioning specific keywords (e.g., "Harvard", "lawyer")

	Profile & Query Tools:

	12. `get_member_profile` - Get complete profile by Bioguide ID
	13. `execute_sql_query` - Execute custom SQL queries (read-only)
	14. `get_database_schema` - View database structure
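
The README describes `execute_sql_query` as read-only but does not say how that is enforced. One way to get that behavior, shown here as a minimal sketch and not as the server's actual implementation, is SQLite's `PRAGMA query_only`. The helper name `run_read_only` and the sample row are assumptions made for illustration.

```python
import sqlite3


def run_read_only(conn: sqlite3.Connection, query: str) -> list:
    """Run a query while SQLite itself rejects any write attempt."""
    conn.execute("PRAGMA query_only = ON")  # writes now raise an error
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")


# Autocommit mode keeps the example free of implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE members (bio_id TEXT PRIMARY KEY, family_name TEXT)")
conn.execute("INSERT INTO members VALUES ('X000001', 'Example')")  # made-up row

rows = run_read_only(conn, "SELECT family_name FROM members")

try:
    run_read_only(conn, "DELETE FROM members")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

remaining = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
print(rows, write_blocked, remaining)
```

Letting the database refuse writes avoids having to parse the SQL text to decide whether a statement is safe.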

	### Database Schema

	- `members` - Core biographical data (13,047+ profiles)
	- `job_positions` - Congressional positions and affiliations
	- `images` - Profile images
	- `relationships` - Family relationships between members
	- `creative_works` - Publications by members
	- `assets` - Additional media assets
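
To make the relationship between `members` and `job_positions` concrete, here is a minimal in-memory sketch. Only the column names `bio_id`, `family_name`, `given_name`, `congress_number`, and `party` are taken from the SQL examples in this README. The column types, the foreign key, and the sample rows are assumptions, and the real schema (see `get_database_schema`) has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (
        bio_id      TEXT PRIMARY KEY,
        family_name TEXT,
        given_name  TEXT
    );
    CREATE TABLE job_positions (
        bio_id          TEXT REFERENCES members(bio_id),
        congress_number INTEGER,
        party           TEXT
    );
    """
)

# Made-up members: one row in job_positions per Congress served.
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [("X000001", "Doe", "Jane"), ("X000002", "Roe", "Richard")],
)
conn.executemany(
    "INSERT INTO job_positions VALUES (?, ?, ?)",
    [
        ("X000001", 116, "Democrat"),
        ("X000001", 117, "Democrat"),
        ("X000002", 117, "Republican"),
    ],
)

# Count each member once per party, even if they served several terms.
party_counts = conn.execute(
    """
    SELECT party, COUNT(DISTINCT bio_id) AS members
    FROM job_positions
    GROUP BY party
    ORDER BY party
    """
).fetchall()
print(party_counts)
```

Because a member has one `job_positions` row per term in this sketch, counts use `COUNT(DISTINCT bio_id)` so that multi-term members are not counted twice.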

	## Requirements

	- Python 3.10 or later
	- Python 3.14 is supported; FAISS runs in single-threaded mode on that version

	## Setup

	### Quick Start

	```bash
	./setup.sh
	```

	This automated script will:
	1. Create a Python virtual environment
	2. Install all dependencies
	3. Ingest all Congressional profiles into SQLite
	4. Build the FAISS semantic search index

	### Manual Setup

	If you prefer manual setup:

	#### 1. Install Dependencies

	```bash
	python3 -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate
	pip install -r requirements.txt
	```

	#### 2. Ingest Data

	Run the ingestion script to create the SQLite database and FAISS index:

	```bash
	python3 ingest_data.py
	```

	This will:
	- Create `congress.db` SQLite database (13,047+ members)
	- Build `congress_faiss.index` for semantic search
	- Generate `congress_bio_ids.pkl` for ID mapping

	Expected output:
	```
	Starting Congressional Bioguide ingestion...
	============================================================
	✓ Database schema created
	Ingesting 13047 profiles...
	Processed 1000/13047 profiles...
	...
	✓ Ingested 13047 profiles into database
	Building FAISS index for semantic search...
	Encoding 13047 biographies...
	Encoded 3200/13047 biographies...
	...
	✓ FAISS index created with 13047 vectors
	Index dimension: 384
	============================================================
	✓ Ingestion complete!
	```

	Note: Ingestion takes approximately 5-10 minutes depending on your system.

	#### 3. Test the System (Optional)

	```bash
	python3 test_queries.py
	```

	#### 4. Run the Server

	```bash
	python3 server.py
	```

	## Usage Examples

	### Name Search
	```json
	{
	  "name": "search_by_name",
	  "arguments": {
	    "family_name": "Lincoln"
	  }
	}
	```
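
Each example in this section shows only the tool name and its arguments. MCP clients normally send these inside a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for the name search above; the helper `make_tool_call` is hypothetical, and in practice an MCP client library builds this envelope for you.

```python
import json


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Wrap a tool name and its arguments in a tools/call JSON-RPC request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


message = make_tool_call(1, "search_by_name", {"family_name": "Lincoln"})
print(message)
```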

	### Party Search
	```json
	{
	  "name": "search_by_party",
	  "arguments": {
	    "party": "Republican",
	    "congress_number": 117
	  }
	}
	```

	### State Search
	```json
	{
	  "name": "search_by_state",
	  "arguments": {
	    "state_code": "CA",
	    "congress_number": 117
	  }
	}
	```

	### Semantic Search
	```json
	{
	  "name": "semantic_search_biography",
	  "arguments": {
	    "query": "Civil War veterans who became lawyers",
	    "top_k": 5
	  }
	}
	```

	### Regex Search - Find Keywords
	```json
	{
	  "name": "search_biography_regex",
	  "arguments": {
	    "pattern": "Harvard",
	    "limit": 5
	  }
	}
	```

	### Regex Search - Filter by Party
	```json
	{
	  "name": "search_biography_regex",
	  "arguments": {
	    "pattern": "lawyer",
	    "filter_party": "Republican",
	    "limit": 10
	  }
	}
	```

	### Regex Search - Filter by State
	```json
	{
	  "name": "search_biography_regex",
	  "arguments": {
	    "pattern": "served.*Confederate Army",
	    "filter_state": "VA",
	    "limit": 5
	  }
	}
	```

	Note: Regex search returns concise results (name, dates, party, state) by default. Set `return_full_profile: true` to get biography text.
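
The regex examples above combine a pattern with optional filters, a result limit, and a concise-by-default result shape. A minimal sketch of that behavior over an in-memory list follows. The records are made up, the function `regex_search` is hypothetical, and case-insensitive matching is an assumption; the server's own implementation may differ.

```python
import re

# Made-up records for illustration; real rows come from congress.db.
MEMBERS = [
    {"name": "Doe, John", "party": "Democrat", "state": "VA",
     "biography": "Studied law; served as a captain in the Confederate Army."},
    {"name": "Roe, Mary", "party": "Republican", "state": "OH",
     "biography": "A lawyer who graduated from Harvard University."},
    {"name": "Poe, Alan", "party": "Democrat", "state": "VA",
     "biography": "Engaged in agricultural pursuits."},
]


def regex_search(pattern, filter_party=None, filter_state=None,
                 limit=5, return_full_profile=False):
    """Return members whose biography matches the pattern and the filters."""
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitivity is assumed
    results = []
    for member in MEMBERS:
        if filter_party and member["party"] != filter_party:
            continue
        if filter_state and member["state"] != filter_state:
            continue
        if not regex.search(member["biography"]):
            continue
        # Concise result by default; biography text only on request.
        row = {key: member[key] for key in ("name", "party", "state")}
        if return_full_profile:
            row["biography"] = member["biography"]
        results.append(row)
        if len(results) >= limit:
            break
    return results


hits = regex_search("served.*Confederate Army", filter_state="VA")
print(hits)
```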

	### Count Members by Party
	```json
	{
	  "name": "count_members",
	  "arguments": {
	    "group_by": "party"
	  }
	}
	```

	### Count Republicans by State in 117th Congress
	```json
	{
	  "name": "count_members",
	  "arguments": {
	    "group_by": "state",
	    "filter_party": "Republican",
	    "filter_congress": 117
	  }
	}
	```

	### Temporal Analysis - Party Changes Over Time
	```json
	{
	  "name": "temporal_analysis",
	  "arguments": {
	    "analysis_type": "party_over_time",
	    "time_unit": "congress",
	    "start_date": "1900-01-01",
	    "end_date": "2000-12-31"
	  }
	}
	```

	### Demographics Analysis - Average Age by Congress
	```json
	{
	  "name": "temporal_analysis",
	  "arguments": {
	    "analysis_type": "demographics",
	    "time_unit": "congress"
	  }
	}
	```

	### Count Members Who Attended Harvard
	```json
	{
	  "name": "count_by_biography_content",
	  "arguments": {
	    "keywords": ["Harvard"]
	  }
	}
	```

	### Count Lawyers by Party
	```json
	{
	  "name": "count_by_biography_content",
	  "arguments": {
	    "keywords": ["lawyer", "attorney"],
	    "breakdown_by": "party"
	  }
	}
	```

	### Count Members Mentioning Any Law or Military Keyword, by State
	```json
	{
	  "name": "count_by_biography_content",
	  "arguments": {
	    "keywords": ["lawyer", "military", "army"],
	    "match_all": false,
	    "breakdown_by": "state"
	  }
	}
	```
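
The `match_all` flag controls how several keywords combine. The sketch below assumes that `false` counts a biography when any keyword appears and `true` only when every keyword appears, using case-insensitive substring matching. The helper `matches` and the sample biography are hypothetical, and the server's matching rules may differ.

```python
def matches(biography: str, keywords: list[str], match_all: bool = False) -> bool:
    """Check a biography against keywords with any/all semantics (assumed)."""
    text = biography.lower()
    found = [keyword.lower() in text for keyword in keywords]
    return all(found) if match_all else any(found)


bio = "Served in the Union Army during the Civil War; practiced as a lawyer."
keywords = ["lawyer", "military", "army"]

any_match = matches(bio, keywords, match_all=False)  # "lawyer" and "army" appear
all_match = matches(bio, keywords, match_all=True)   # "military" does not appear
print(any_match, all_match)
```

Under these assumptions, requiring every keyword is stricter than it may look: a biography that mentions "army" but never the word "military" fails the `match_all` check.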

	### SQL Query - Find Longest Serving Members
	```json
	{
	  "name": "execute_sql_query",
	  "arguments": {
	    "query": "SELECT family_name, given_name, COUNT(DISTINCT congress_number) as congresses FROM members m JOIN job_positions j ON m.bio_id = j.bio_id GROUP BY m.bio_id HAVING congresses > 5 ORDER BY congresses DESC LIMIT 10"
	  }
	}
	```

	### Get Full Member Profile
	```json
	{
	  "name": "get_member_profile",
	  "arguments": {
	    "bio_id": "L000313"
	  }
	}
	```

	### Search by Congress Number
	```json
	{
	  "name": "search_by_congress",
	  "arguments": {
	    "congress_number": 117,
	    "chamber": "Senator"
	  }
	}
	```

	### Search by Date Range
	```json
	{
	  "name": "search_by_date_range",
	  "arguments": {
	    "start_date": "1861-03-04",
	    "end_date": "1865-03-04"
	  }
	}
	```
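
A date-range search has to decide what "served during" means. A common reading, assumed here and not confirmed by the README, is that a term counts when it overlaps the requested range at all. The helper `served_during` and the sample terms are hypothetical.

```python
from datetime import date


def served_during(term_start: date, term_end: date,
                  range_start: date, range_end: date) -> bool:
    """True when the term and the requested range share at least one day."""
    return term_start <= range_end and term_end >= range_start


range_start = date.fromisoformat("1861-03-04")
range_end = date.fromisoformat("1865-03-04")

# A term that began before the range and ended inside it still overlaps.
overlapping = served_during(date(1859, 3, 4), date(1863, 3, 3), range_start, range_end)

# A term that began after the range ended does not.
later = served_during(date(1867, 3, 4), date(1869, 3, 3), range_start, range_end)
print(overlapping, later)
```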

	### Find Family Relationships
	```json
	{
	  "name": "search_by_relationship",
	  "arguments": {
	    "relationship_type": "father"
	  }
	}
	```

	### Complex SQL - Party Transitions
	```json
	{
	  "name": "execute_sql_query",
	  "arguments": {
	    "query": "SELECT m.bio_id, m.family_name, m.given_name, GROUP_CONCAT(DISTINCT j.party) as parties FROM members m JOIN job_positions j ON m.bio_id = j.bio_id WHERE j.party IS NOT NULL GROUP BY m.bio_id HAVING COUNT(DISTINCT j.party) > 1 LIMIT 20"
	  }
	}
	```
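
To see what the party-transition query returns, the sketch below runs it unchanged against a tiny in-memory database. The table layout is reduced to the columns the query touches and the rows are made up, so this only illustrates the query logic, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (bio_id TEXT PRIMARY KEY, family_name TEXT, given_name TEXT);
    CREATE TABLE job_positions (bio_id TEXT, congress_number INTEGER, party TEXT);
    """
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [("X000001", "Doe", "Jane"), ("X000002", "Roe", "Richard")],
)
conn.executemany(
    "INSERT INTO job_positions VALUES (?, ?, ?)",
    [
        ("X000001", 100, "Democrat"),
        ("X000001", 101, "Republican"),  # changed party between terms
        ("X000002", 30, "Whig"),
        ("X000002", 31, "Whig"),         # same party in both terms
    ],
)

query = (
    "SELECT m.bio_id, m.family_name, m.given_name, "
    "GROUP_CONCAT(DISTINCT j.party) as parties "
    "FROM members m JOIN job_positions j ON m.bio_id = j.bio_id "
    "WHERE j.party IS NOT NULL GROUP BY m.bio_id "
    "HAVING COUNT(DISTINCT j.party) > 1 LIMIT 20"
)
rows = conn.execute(query).fetchall()
print(rows)
```

Only the member with two distinct parties is returned. SQLite does not guarantee the order of values inside `GROUP_CONCAT`, so the `parties` string should be treated as an unordered, comma-separated set.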

	## Data Source

	Data comes from the US Congressional Bioguide (the Biographical Directory of the United States Congress), which contains biographical information for all members of Congress throughout history.

	## Technical Details

	- Database: SQLite for structured queries
	- Semantic Search: FAISS with sentence-transformers (all-MiniLM-L6-v2)
	- Embedding Dimension: 384
	- Index Type: Flat IP (Inner Product) with L2 normalization for cosine similarity
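
The index type works because the inner product of two L2-normalized vectors equals their cosine similarity. The sketch below shows this in plain Python with toy 3-dimensional vectors in place of the real 384-dimensional embeddings. It does not use FAISS or sentence-transformers; the bio IDs and vectors are made up, and the ID list stands in for the position-to-ID mapping that the README says is stored in `congress_bio_ids.pkl`.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# Toy embeddings keyed by made-up bio IDs.
vectors = {
    "X000001": [1.0, 2.0, 2.0],
    "X000002": [3.0, 0.0, 4.0],
    "X000003": [0.0, 1.0, 0.0],
}
query = [2.0, 4.0, 4.0]

bio_ids = list(vectors)                               # row position -> bio ID
index = [l2_normalize(vectors[b]) for b in bio_ids]   # normalized once, at build time
q = l2_normalize(query)                               # normalized at query time

# A flat index scores every stored vector against the query.
scores = [inner_product(q, row) for row in index]
ranked = sorted(zip(scores, bio_ids), reverse=True)
top_k = [bio_id for _, bio_id in ranked[:2]]
print(top_k)
```

A flat index compares the query against every stored vector, which is exact and stays practical at the scale of roughly 13,000 biographies.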

	## MCP Configuration

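	Add the server to your MCP settings file. For Claude Desktop this is usually `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS or `%APPDATA%\Claude\claude_desktop_config.json` on Windows; some Linux setups use `~/.config/claude/claude_desktop_config.json`.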

	```json
	{
	  "mcpServers": {
	    "congressional-bioguide": {
	      "command": "/absolute/path/to/BioGuideMCP/venv/bin/python",
	      "args": [
	        "/absolute/path/to/BioGuideMCP/server.py"
	      ],
	      "cwd": "/absolute/path/to/BioGuideMCP"
	    }
	  }
	}
	```

	Note: Replace the paths with the absolute path to your own checkout. The `command` entry points at the virtual environment's Python, which has all the required dependencies installed.
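
Because all three paths in the configuration derive from the project directory, the entry can be generated instead of typed by hand. The sketch below does that; the helper `build_config` and the path `/path/to/BioGuideMCP` are placeholders, and POSIX-style paths are assumed (on Windows the interpreter would be `venv\Scripts\python.exe`).

```python
import json
from pathlib import PurePosixPath


def build_config(project_dir: str) -> dict:
    """Build the mcpServers entry from a single project directory."""
    root = PurePosixPath(project_dir)
    return {
        "mcpServers": {
            "congressional-bioguide": {
                "command": str(root / "venv" / "bin" / "python"),
                "args": [str(root / "server.py")],
                "cwd": str(root),
            }
        }
    }


config = build_config("/path/to/BioGuideMCP")
print(json.dumps(config, indent=2))
```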

	## License

	The Space metadata declares the MIT license. The data itself comes from the US Congressional Bioguide and is in the public domain.