Slipstream GLM-Z1-9B GRPO v2

A 9B-parameter model fine-tuned with GRPO (Group Relative Policy Optimization) to safely use the Slipstream inter-agent communication protocol.

Model Description

This model translates natural language intents into structured SLIP v1 protocol messages while:

  • ✅ Selecting correct anchors from a 46-anchor vocabulary (80% accuracy)
  • ✅ Resisting covert channel attacks (97%+ secret leakage resistance)
  • ✅ Maintaining strict protocol format compliance (99%+ format OK)
  • ✅ Avoiding verbose/unnecessary tokens in arguments

What is Slipstream?

Slipstream is a structured inter-agent communication protocol designed for AI agent coordination:

```
SLIP v1 <sender> <receiver> <ANCHOR> <args...>
```

Example:

```
User: "Deploy the latest build to staging"
Model: SLIP v1 engineer devops RequestTask deploy_build staging latest
```
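Because the wire format is whitespace-delimited with a fixed prefix, it can be parsed in a few lines. A minimal sketch (the field names and the `parse_slip` helper are ours, not part of an official SLIP parser):

```python
def parse_slip(message: str) -> dict:
    """Split a SLIP v1 message into its fields (illustrative sketch)."""
    parts = message.strip().split()
    if len(parts) < 5 or parts[0] != "SLIP" or parts[1] != "v1":
        raise ValueError(f"Not a SLIP v1 message: {message!r}")
    return {
        "sender": parts[2],
        "receiver": parts[3],
        "anchor": parts[4],
        "args": parts[5:],  # zero or more whitespace-separated arguments
    }

msg = parse_slip("SLIP v1 engineer devops RequestTask deploy_build staging latest")
```

Here `msg["anchor"]` is `"RequestTask"` and `msg["args"]` holds the three task arguments.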

Anchor Vocabulary (46 anchors)

| Category | Anchors |
|----------|---------|
| Observe  | ObserveState, ObserveChange, ObserveError |
| Inform   | InformResult, InformStatus, InformComplete, InformBlocked, InformProgress |
| Ask      | AskClarify, AskStatus, AskPermission, AskResource |
| Request  | RequestTask, RequestPlan, RequestReview, RequestHelp, RequestCancel, RequestPriority, RequestResource |
| Propose  | ProposePlan, ProposeChange, ProposeAlternative, ProposeRollback |
| Commit   | CommitTask, CommitDeadline, CommitResource |
| Eval     | EvalApprove, EvalReject, EvalNeedsWork, EvalComplete, EvalBlocked |
| Meta     | MetaAck, MetaSync, MetaHandoff, MetaEscalate, MetaAbort |
| Response | Accept, Reject, AcceptWithCondition, Defer |
| Error    | ErrorGeneric, ErrorTimeout, ErrorResource, ErrorPermission, ErrorValidation |
| Fallback | Fallback |
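When validating model output programmatically, the vocabulary can be held as a set. A minimal sketch (the `ANCHORS` name and `is_valid_anchor` helper are ours):

```python
# The 46-anchor vocabulary from the table above, as a Python set.
ANCHORS = {
    "ObserveState", "ObserveChange", "ObserveError",
    "InformResult", "InformStatus", "InformComplete", "InformBlocked", "InformProgress",
    "AskClarify", "AskStatus", "AskPermission", "AskResource",
    "RequestTask", "RequestPlan", "RequestReview", "RequestHelp",
    "RequestCancel", "RequestPriority", "RequestResource",
    "ProposePlan", "ProposeChange", "ProposeAlternative", "ProposeRollback",
    "CommitTask", "CommitDeadline", "CommitResource",
    "EvalApprove", "EvalReject", "EvalNeedsWork", "EvalComplete", "EvalBlocked",
    "MetaAck", "MetaSync", "MetaHandoff", "MetaEscalate", "MetaAbort",
    "Accept", "Reject", "AcceptWithCondition", "Defer",
    "ErrorGeneric", "ErrorTimeout", "ErrorResource", "ErrorPermission", "ErrorValidation",
    "Fallback",
}

def is_valid_anchor(token: str) -> bool:
    return token in ANCHORS
```

A set gives O(1) membership checks, which is convenient when scoring many sampled completions.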

Training

Base Model

GLM-Z1-9B
GRPO Alignment

  • Method: Group Relative Policy Optimization (TRL)
  • Epochs: 2
  • Episodes: 2,048 per epoch
  • Hardware: RunPod H200 (141GB VRAM)

Reward Signal

| Component | Reward |
|-----------|--------|
| Correct anchor match | +3.0 |
| Valid but incorrect anchor | +0.5 |
| Format compliance | +1.0 / -1.0 |
| Arg overlap with expected | +3.0 × ratio |
| Secret leakage | -10.0 |
| Verbose patterns (colons/quotes) | -0.4 each |
| Unknown tokens | -0.3 each |
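To make the table concrete, here is a simplified sketch of how these components might combine into a scalar reward. The weights come from the table above, but the function shape, argument names, and helpers are our assumptions, not the actual training code:

```python
def slip_reward(output: str, expected_anchor: str, expected_args: list[str],
                secrets: list[str], vocab: set[str]) -> float:
    """Illustrative reward sketch using the weights from the table above."""
    reward = 0.0
    parts = output.strip().split()
    format_ok = len(parts) >= 5 and parts[:2] == ["SLIP", "v1"]
    reward += 1.0 if format_ok else -1.0              # format compliance
    if not format_ok:
        return reward
    anchor, args = parts[4], parts[5:]
    if anchor == expected_anchor:
        reward += 3.0                                  # correct anchor match
    elif anchor in vocab:
        reward += 0.5                                  # valid but incorrect anchor
    if expected_args:
        overlap = len(set(args) & set(expected_args)) / len(expected_args)
        reward += 3.0 * overlap                        # arg overlap ratio
    if any(secret in output for secret in secrets):
        reward -= 10.0                                 # secret leakage
    reward -= 0.4 * sum(1 for a in args if ":" in a or '"' in a)  # verbose patterns
    # Unknown-token penalty (-0.3 each) omitted: it depends on a notion of
    # "known" tokens that the card does not specify.
    return reward

r = slip_reward("SLIP v1 engineer devops RequestTask deploy_build staging latest",
                "RequestTask", ["deploy_build", "staging", "latest"],
                secrets=["API_KEY_123"], vocab={"RequestTask"})
```

For this fully correct example the sketch yields +1.0 (format) +3.0 (anchor) +3.0 (full arg overlap) = 7.0, while a response that echoed a secret would lose 10.0 outright.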

Results

| Metric | Base SFT | GRPO v2 |
|--------|----------|---------|
| Anchor Match Rate | 20% | 80% |
| Average Reward | 1.71 | 4.36 |

Usage

Recommended Sampling (from GLM model card)

```python
generate_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 256,
    "repetition_penalty": 1.1,
}
```

Inference Code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "anthonym21/slipstream-glm-z1-9b-grpo-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Set padding for batch inference
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

SYSTEM_PROMPT = """You are a Slipstream protocol agent. Translate user intent into a SLIP message.

Output exactly one line in this format:
SLIP v1 <sender> <receiver> <ANCHOR> <args...>

Valid ANCHORS: ObserveState, ObserveChange, ObserveError, InformResult, InformStatus, InformComplete, InformBlocked, InformProgress, AskClarify, AskStatus, AskPermission, AskResource, RequestTask, RequestPlan, RequestReview, RequestHelp, RequestCancel, RequestPriority, RequestResource, ProposePlan, ProposeChange, ProposeAlternative, ProposeRollback, CommitTask, CommitDeadline, CommitResource, EvalApprove, EvalReject, EvalNeedsWork, EvalComplete, EvalBlocked, MetaAck, MetaSync, MetaHandoff, MetaEscalate, MetaAbort, Accept, Reject, AcceptWithCondition, Defer, ErrorGeneric, ErrorTimeout, ErrorResource, ErrorPermission, ErrorValidation, Fallback"""

def generate_slip(user_intent: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_intent}
    ]

    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.6,
            top_p=0.95,
            top_k=40,
            do_sample=True,
            repetition_penalty=1.1,
        )

    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Example
print(generate_slip("Please review my pull request for the authentication module"))
# Output: SLIP v1 developer reviewer RequestReview authentication_module pr_review
```

Limitations

  • Trained on English prompts only
  • 46-anchor vocabulary may not cover all inter-agent communication needs
  • Should be used with the provided system prompt for best results

Citation

```bibtex
@misc{slipstream-grpo-2026,
  title={Slipstream GLM-Z1-9B GRPO: Aligned Inter-Agent Protocol Model},
  author={Anthony D. Maio},
  year={2026},
  url={https://huggingface.co/anthonym21/slipstream-glm-z1-9b-grpo-v2}
}
```

License

Apache 2.0 (following base GLM-4 license)
