# Slipstream GLM-Z1-9B GRPO v2
A 9B parameter model fine-tuned with GRPO (Group Relative Policy Optimization) to safely use the Slipstream inter-agent communication protocol.
## Model Description
This model translates natural language intents into structured SLIP v1 protocol messages while:
- ✅ Selecting correct anchors from a 46-anchor vocabulary (80% accuracy)
- ✅ Resisting covert-channel attacks (97%+ secret-leakage resistance)
- ✅ Maintaining strict protocol format compliance (99%+ format OK)
- ✅ Avoiding verbose or unnecessary tokens in arguments
## What is Slipstream?
Slipstream is a structured inter-agent communication protocol designed for AI agent coordination:
```
SLIP v1 <sender> <receiver> <ANCHOR> <args...>
```

Example:

```
User:  "Deploy the latest build to staging"
Model: SLIP v1 engineer devops RequestTask deploy_build staging latest
```
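Because the field order is fixed and whitespace-delimited, a SLIP v1 line is trivial to split back into its parts. A minimal parsing sketch (this helper is illustrative only, not part of any released Slipstream tooling):

```python
def parse_slip(message: str) -> dict:
    """Split a SLIP v1 line into its fields (illustrative helper).

    Field names follow the format string above; a message needs at least
    five tokens (SLIP, v1, sender, receiver, ANCHOR), with zero or more args.
    """
    parts = message.strip().split()
    if len(parts) < 5 or parts[:2] != ["SLIP", "v1"]:
        raise ValueError(f"not a SLIP v1 message: {message!r}")
    return {
        "sender": parts[2],
        "receiver": parts[3],
        "anchor": parts[4],
        "args": parts[5:],
    }

msg = parse_slip("SLIP v1 engineer devops RequestTask deploy_build staging latest")
# msg["anchor"] == "RequestTask"; msg["args"] == ["deploy_build", "staging", "latest"]
```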
## Anchor Vocabulary (46 anchors)
| Category | Anchors |
|---|---|
| Observe | ObserveState, ObserveChange, ObserveError |
| Inform | InformResult, InformStatus, InformComplete, InformBlocked, InformProgress |
| Ask | AskClarify, AskStatus, AskPermission, AskResource |
| Request | RequestTask, RequestPlan, RequestReview, RequestHelp, RequestCancel, RequestPriority, RequestResource |
| Propose | ProposePlan, ProposeChange, ProposeAlternative, ProposeRollback |
| Commit | CommitTask, CommitDeadline, CommitResource |
| Eval | EvalApprove, EvalReject, EvalNeedsWork, EvalComplete, EvalBlocked |
| Meta | MetaAck, MetaSync, MetaHandoff, MetaEscalate, MetaAbort |
| Response | Accept, Reject, AcceptWithCondition, Defer |
| Error | ErrorGeneric, ErrorTimeout, ErrorResource, ErrorPermission, ErrorValidation |
| Fallback | Fallback |
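Downstream validators can treat the table above as a closed set. A minimal membership check (the `ANCHORS` constant below is just the table re-typed as a Python set, not an official export from the model or protocol):

```python
# The 46-anchor vocabulary from the table above, flattened into one set.
ANCHORS = {
    # Observe
    "ObserveState", "ObserveChange", "ObserveError",
    # Inform
    "InformResult", "InformStatus", "InformComplete", "InformBlocked", "InformProgress",
    # Ask
    "AskClarify", "AskStatus", "AskPermission", "AskResource",
    # Request
    "RequestTask", "RequestPlan", "RequestReview", "RequestHelp",
    "RequestCancel", "RequestPriority", "RequestResource",
    # Propose
    "ProposePlan", "ProposeChange", "ProposeAlternative", "ProposeRollback",
    # Commit
    "CommitTask", "CommitDeadline", "CommitResource",
    # Eval
    "EvalApprove", "EvalReject", "EvalNeedsWork", "EvalComplete", "EvalBlocked",
    # Meta
    "MetaAck", "MetaSync", "MetaHandoff", "MetaEscalate", "MetaAbort",
    # Response
    "Accept", "Reject", "AcceptWithCondition", "Defer",
    # Error
    "ErrorGeneric", "ErrorTimeout", "ErrorResource", "ErrorPermission", "ErrorValidation",
    # Fallback
    "Fallback",
}

def is_valid_anchor(token: str) -> bool:
    """True if token is one of the 46 protocol anchors."""
    return token in ANCHORS
```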
## Training
### Base Model
- Model: THUDM/GLM-4-Z1-9B-0414
- SFT: Fine-tuned on Slipstream-TQT dataset (anthonym21/slipstream-glm-z1-9b-merged)
### GRPO Alignment
- Method: Group Relative Policy Optimization (TRL)
- Epochs: 2
- Episodes: 2,048 per epoch
- Hardware: RunPod H200 (141GB VRAM)
### Reward Signal
| Component | Reward |
|---|---|
| Correct anchor match | +3.0 |
| Valid anchor, but not the expected one | +0.5 |
| Format compliance | +1.0 / -1.0 |
| Arg overlap with expected | +3.0 × ratio |
| Secret leakage | -10.0 |
| Verbose patterns (colons/quotes) | -0.4 each |
| Unknown tokens | -0.3 each |
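Putting the table together, the per-episode reward can be sketched as below. This is a reconstruction from the table only, not the actual training code; the function signature and the set-based overlap definition are assumptions.

```python
def slip_reward(pred_anchor, exp_anchor, pred_args, exp_args,
                format_ok, leaked_secret, n_verbose, n_unknown, valid_anchors):
    """Reconstruction of the reward table above (illustrative, not the training code)."""
    # Anchor: +3.0 for an exact match, +0.5 for a valid-but-wrong anchor.
    r = 3.0 if pred_anchor == exp_anchor else (0.5 if pred_anchor in valid_anchors else 0.0)
    r += 1.0 if format_ok else -1.0  # format compliance: +1.0 / -1.0
    if exp_args:                     # arg overlap: +3.0 * ratio (assumed set overlap)
        r += 3.0 * len(set(pred_args) & set(exp_args)) / len(set(exp_args))
    if leaked_secret:
        r -= 10.0                    # secret leakage
    r -= 0.4 * n_verbose             # verbose patterns (colons/quotes)
    r -= 0.3 * n_unknown             # unknown tokens
    return r

# A fully correct completion scores 3.0 + 1.0 + 3.0 = 7.0:
best = slip_reward("RequestTask", "RequestTask", ["deploy", "staging"],
                   ["deploy", "staging"], True, False, 0, 0, {"RequestTask"})
```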
## Results
| Metric | Base SFT | GRPO v2 |
|---|---|---|
| Anchor Match Rate | 20% | 80% |
| Average Reward | 1.71 | 4.36 |
## Usage
### Recommended Sampling (from the GLM model card)
```python
generate_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 256,
    "repetition_penalty": 1.1,
}
```
### Inference Code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "anthonym21/slipstream-glm-z1-9b-grpo-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Set padding for batch inference
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

SYSTEM_PROMPT = """You are a Slipstream protocol agent. Translate user intent into a SLIP message.
Output exactly one line in this format:
SLIP v1 <sender> <receiver> <ANCHOR> <args...>
Valid ANCHORS: ObserveState, ObserveChange, ObserveError, InformResult, InformStatus, InformComplete, InformBlocked, InformProgress, AskClarify, AskStatus, AskPermission, AskResource, RequestTask, RequestPlan, RequestReview, RequestHelp, RequestCancel, RequestPriority, RequestResource, ProposePlan, ProposeChange, ProposeAlternative, ProposeRollback, CommitTask, CommitDeadline, CommitResource, EvalApprove, EvalReject, EvalNeedsWork, EvalComplete, EvalBlocked, MetaAck, MetaSync, MetaHandoff, MetaEscalate, MetaAbort, Accept, Reject, AcceptWithCondition, Defer, ErrorGeneric, ErrorTimeout, ErrorResource, ErrorPermission, ErrorValidation, Fallback"""

def generate_slip(user_intent: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_intent},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.6,
            top_p=0.95,
            top_k=40,
            do_sample=True,
            repetition_penalty=1.1,
        )
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Example
print(generate_slip("Please review my pull request for the authentication module"))
# Output: SLIP v1 developer reviewer RequestReview authentication_module pr_review
```
## Limitations
- Trained on English prompts only
- 46-anchor vocabulary may not cover all inter-agent communication needs
- Should be used with the provided system prompt for best results
## Citation

```bibtex
@misc{slipstream-grpo-2026,
  title={Slipstream GLM-Z1-9B GRPO: Aligned Inter-Agent Protocol Model},
  author={Anthony D. Maio},
  year={2026},
  url={https://huggingface.co/anthonym21/slipstream-glm-z1-9b-grpo-v2}
}
```
## Related Models
- `anthonym21/slipstream-glm-z1-9b-merged` - Base SFT model
- `anthonym21/slipstream-glm-z1-9b-grpo` - GRPO v1 (trained on unclean data)
## License
Apache 2.0 (following base GLM-4 license)