CMMC Expert 72B
Notice: These models are provided for proof-of-concept and testing purposes only. Production-grade models are not publicly shared. For inquiries regarding production models or commercial licensing, please contact the maintainer: Nathan Maine.
A locally-hosted, fine-tuned language model specialized in CMMC 2.0, NIST 800-171, NIST 800-53, HIPAA, DFARS, and cybersecurity compliance frameworks.
This is the 72B variant — the most capable model in the suite, designed for complex multi-framework analysis and comprehensive compliance reasoning. Part of a four-model suite (7B, 14B, 32B, 72B) sharing the same compliance knowledge base.
Quick Start (Ollama)
# Download and run
ollama pull Nathan-Maine/cmmc-expert-72b
# Ask a compliance question
ollama run cmmc-expert-72b "What access controls are required for CMMC Level 2?"
# Or use Ollama's native REST API
curl http://localhost:11434/api/generate -d '{
"model": "cmmc-expert-72b",
"prompt": "What are the key differences between CMMC Level 1 and Level 2?",
"stream": false
}'
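For programmatic access, Ollama also exposes an OpenAI-compatible endpoint at `/v1/chat/completions` alongside the native API shown above. A minimal stdlib-only Python sketch (assumes a local Ollama server on the default port; the helper names are illustrative, not part of this project):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "cmmc-expert-72b") -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST to the local Ollama server's OpenAI-compatible endpoint
    and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI schema, existing OpenAI client libraries can also be pointed at `http://localhost:11434/v1` with any placeholder API key.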
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen2.5-72B-Instruct |
| Parameters | 72.7 billion |
| Fine-Tuning Method | QLoRA (4-bit base, LoRA rank 8, alpha 16) |
| Quantization | q4_k_m (GGUF) |
| File Size | 44.2 GB |
| Context Length | 32,768 tokens |
| Training Hardware | NVIDIA A100 80GB SXM (RunPod) |
| Training Time | ~16.9 hours |
| Training Framework | Unsloth + HuggingFace TRL + PEFT |
Security Domain Coverage
Models are fine-tuned for complete security domain coverage, including vulnerability analysis, incident response scenarios, and access control failure modes required for professional SSP and POA&M generation. Behavioral guardrails and policy enforcement are handled at the governed-llm-gateway layer.
Base model migration to Meta Llama 3.1/3.3 (US-origin, open weights) is in progress.
Compliance Framework Coverage
Trained across eight overlapping frameworks to support cross-framework mapping:
| Framework | Coverage |
|---|---|
| CMMC 2.0 (32 CFR Part 170) | All three levels — 17 L1 practices, 110 L2, 134 L3, assessment methodology |
| NIST SP 800-171 Rev. 2 | 110 security requirements across 14 families |
| NIST SP 800-172 | Enhanced security requirements for critical CUI programs |
| NIST SP 800-53 Rev. 5 | Full catalog of 1,189 controls across 20 families |
| NIST SP 800-37 | Risk Management Framework (RMF) steps and authorization |
| NIST CSF | Identify, Protect, Detect, Respond, Recover functions |
| HIPAA Security Rule | Administrative, physical, and technical safeguards |
| DFARS Clauses | 252.204-7012, 7019, 7020, 7021 — contract-level compliance |
Training Data
13,434 training + 3,472 validation examples (~3.3M tokens) assembled from 5 curated sources:
| Source | Examples | Share |
|---|---|---|
| NIST Cybersecurity (filtered from 424K) | 6,372 | 47.4% |
| CMMC Full | 4,787 | 35.6% |
| CMMC Balanced | 994 | 7.4% |
| HIPAA Compliance | 961 | 7.2% |
| CMMC Core | 320 | 2.4% |
Data processing pipeline:
- Format conversion — Raw text to chat-style instruction/response pairs
- Quality filtering — Removed entries <100 chars, table-heavy fragments, OCR artifacts
- Relevance filtering — NIST data reduced from 424,729 entries to ~72,000 compliance-relevant entries, then sampled down to 7,000
- Deduplication — Exact dedup (xxhash) + near-dedup (MinHash LSH, Jaccard 0.8)
- Validation split — 80/20 stratified split maintaining source distribution
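The deduplication step can be sketched in pure Python. This is an illustrative re-implementation, not the project's pipeline code: the actual pipeline uses xxhash for exact matching and MinHash LSH for scalable near-duplicate detection, while the sketch below uses a stdlib hash and exact Jaccard similarity over word shingles.

```python
import hashlib

def shingles(text: str, k: int = 3) -> set:
    """Word-level k-shingles used for Jaccard similarity."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def deduplicate(examples: list, threshold: float = 0.8) -> list:
    """Drop exact duplicates (by content hash), then drop any example whose
    shingle Jaccard similarity with an already-kept example >= threshold."""
    seen_hashes, kept = set(), []
    for ex in examples:
        h = hashlib.md5(ex.encode("utf-8")).hexdigest()  # stand-in for xxhash
        if h in seen_hashes:
            continue
        seen_hashes.add(h)
        sh = shingles(ex)
        if all(jaccard(sh, shingles(k)) < threshold for k in kept):
            kept.append(ex)
    return kept
```

The 0.8 threshold matches the Jaccard cutoff listed above; at corpus scale, MinHash LSH replaces the quadratic pairwise comparison this sketch performs.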
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 5e-5 (cosine decay) |
| Optimizer | 8-bit AdamW |
| Batch Size | 1 (effective 16 with gradient accumulation) |
| Gradient Accumulation | 16 |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Sequence Length | 2048 |
| Quantization (Base) | 4-bit NF4 |
| Precision | bf16 |
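As a sanity check on the configuration above, LoRA adds two low-rank matrices (r × d_in and d_out × r) per targeted projection, so trainable parameters stay a tiny fraction of the 72.7B total. A rough estimate, assuming Qwen2.5-72B's published dimensions (hidden size 8192, intermediate size 29568, 80 layers, grouped-query attention with KV dimension 1024) — these dimensions are assumptions about the base model, not values taken from this card:

```python
# Assumed Qwen2.5-72B dimensions (not stated in this card):
HIDDEN, INTERMEDIATE, LAYERS = 8192, 29568, 80
KV_DIM = 1024  # 8 KV heads x 128 head dim (grouped-query attention)
RANK = 8       # LoRA rank from the training configuration

# (d_in, d_out) for each targeted projection in one decoder layer
modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_trainable_params(modules: dict, rank: int, layers: int) -> int:
    """Each LoRA adapter adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in modules.values())
    return per_layer * layers

total = lora_trainable_params(modules, RANK, LAYERS)
print(f"{total / 1e6:.1f}M trainable ({total / 72.7e9:.4%} of base)")
```

Under these assumptions, only ~105M parameters (~0.14% of the base model) are updated, which is what makes single-A100 QLoRA fine-tuning of a 72B model feasible.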
Evaluation Results
| Metric | Value |
|---|---|
| Final Eval Loss | 1.004 |
| Training Steps | 2,520 |
The 72B model achieved the lowest eval loss of the four model sizes, consistent with its position as the most capable model in the suite.
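Since eval loss is mean per-token cross-entropy in nats, it converts directly to perplexity, and the step count can be cross-checked against the training data and batch settings. Both calculations below use standard formulas, not values beyond those reported in this card:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy in nats)."""
    return math.exp(loss)

ppl = perplexity(1.004)   # final eval loss from the table above
print(round(ppl, 2))      # ~2.73: on average the model is as uncertain
                          # as a uniform choice over ~2.7 tokens

# Step-count sanity check: batches per epoch x epochs
steps = math.ceil(13_434 / 16) * 3   # examples / effective batch size of 16
print(steps)                         # 2520, matching the reported steps
```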
Intended Uses
- Complex Multi-Framework Analysis — Cross-reference controls across CMMC, NIST 800-53, HIPAA, and DFARS simultaneously
- SSP Generation — Draft comprehensive System Security Plan control descriptions with detailed NIST/CMMC citations
- Gap Analysis — Identify controls required for specific CMMC levels with nuanced implementation guidance
- Assessment Prep — Generate detailed evidence checklists and assessment objective narratives
- Cross-Framework Mapping — Map controls between frameworks with full context and rationale
- Policy Drafting — Create thorough policies aligned to specific CMMC practices
- DFARS Clause Analysis — Deep analysis of contract-level compliance requirements
Limitations
- Not a substitute for qualified compliance professionals. This model is a tool to accelerate compliance work, not replace human judgment.
- Knowledge cutoff. The model's knowledge is based on training data available at the time of fine-tuning. Always verify against current published frameworks.
- Hardware requirements. The 72B model requires significant resources (48+ GB VRAM or 64+ GB RAM). For less capable hardware, consider the 7B, 14B, or 32B variants.
- No retrieval augmentation. The model generates responses from trained knowledge only — it does not search or retrieve external documents at inference time.
- Citation accuracy. While the model generally cites correct control numbers and framework sections, always verify specific citations against authoritative sources.
Out-of-Scope Uses
- Legal advice. This model does not provide legal opinions on compliance status.
- Automated compliance certification. CMMC certification requires human assessment by authorized C3PAOs (CMMC Third-Party Assessment Organizations).
- Processing actual CUI/ITAR data. The model itself does not process or store sensitive data, but users should follow their organization's data handling policies.
Hardware Requirements
| Mode | GPU (VRAM) | CPU-Only (RAM) | Storage |
|---|---|---|---|
| Inference | 48 GB | 64 GB | 50 GB |
Supported OS: Linux, macOS, Windows (WSL2)
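The VRAM figure is consistent with a back-of-the-envelope size estimate: q4_k_m quantization averages roughly 4.8–4.9 bits per weight (an approximation, not an exact spec — q4_k_m mixes 4-bit and 6-bit blocks), plus KV-cache and runtime overhead on top of the weights.

```python
def q4km_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough on-disk size of a q4_k_m GGUF file in GB.
    bits_per_weight is an approximate average, not an exact constant."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(q4km_size_gb(72.7), 1))  # ~44.1 GB, close to the listed 44.2 GB
```

The gap between weight size (~44 GB) and the 48 GB VRAM recommendation is what covers the KV cache and inference buffers at the 32K context length.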
The Model Suite
This is the 72B model — the most capable option for complex multi-framework compliance analysis. The full suite includes:
| Model | Parameters | GGUF Size | Best For |
|---|---|---|---|
| cmmc-expert-7b | 7.6B | 5.1 GB | Quick lookups, day-to-day queries |
| cmmc-expert-14b | 14.7B | ~10 GB | Detailed analysis, multi-control reasoning |
| cmmc-expert-32b | 32.5B | 18.5 GB | Deep gap assessments, SSP drafting |
| cmmc-expert-72b | 72.7B | 44.2 GB | Complex multi-framework analysis |
Source Code
Full pipeline code, training configuration, and evaluation methodology: github.com/NathanMaine/cmmc-compliance-ai-model
Known Issues
- Superseded by v2.0 — This version targets only 4 of 7 transformer modules and was trained on a smaller dataset (13,434 examples). v2.0 improves on both fronts with expanded LoRA coverage and 40% more training data. Use v2.0 unless you have a specific reason to use v1.0.
- Limited cross-framework mapping — May struggle with nuanced mappings between overlapping frameworks (e.g., NIST 800-171 to CMMC practice IDs) compared to later versions.
Citation
@misc{maine2025cmmcexpert,
  title={CMMC Expert: Fine-Tuned Language Models for Cybersecurity Compliance},
  author={Nathan Maine},
  year={2025},
  url={https://github.com/NathanMaine/cmmc-compliance-ai-model}
}
Contact
- Author: Nathan Maine
- Website: nathanmaine.com
- LinkedIn: linkedin.com/in/nathanmaine
- Email: nmaine@gmail.com