Slipstream GLM-Z1-9B GRPO v2

A 9B-parameter model fine-tuned with GRPO (Group Relative Policy Optimization) to safely use the Slipstream inter-agent communication protocol.

Model Description

This model translates natural language intents into structured SLIP v1 protocol messages while:

  • ✅ Selecting correct anchors from a 46-anchor vocabulary (80% accuracy)
  • ✅ Resisting covert channel attacks (97%+ secret leakage resistance)
  • ✅ Maintaining strict protocol format compliance (99%+ format OK)
  • ✅ Avoiding verbose/unnecessary tokens in arguments

What is Slipstream?

Slipstream is a structured inter-agent communication protocol designed for AI agent coordination:

```
SLIP v1 <sender> <receiver> <ANCHOR> <args...>
```

Example:

```
User: "Deploy the latest build to staging"
Model: SLIP v1 engineer devops RequestTask deploy_build staging latest
```
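Because the wire format is whitespace-delimited with a fixed prefix, it can be parsed in a few lines. A minimal sketch (the field names and the `parse_slip` helper are ours, not part of an official SLIP parser):

```python
def parse_slip(message: str) -> dict:
    """Split a SLIP v1 message into its fields (illustrative sketch)."""
    parts = message.strip().split()
    if len(parts) < 5 or parts[0] != "SLIP" or parts[1] != "v1":
        raise ValueError(f"Not a SLIP v1 message: {message!r}")
    return {
        "sender": parts[2],
        "receiver": parts[3],
        "anchor": parts[4],
        "args": parts[5:],  # zero or more whitespace-separated arguments
    }

msg = parse_slip("SLIP v1 engineer devops RequestTask deploy_build staging latest")
```

Here `msg["anchor"]` is `"RequestTask"` and `msg["args"]` holds the three task arguments.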

Anchor Vocabulary (46 anchors)

| Category | Anchors |
|----------|---------|
| Observe  | ObserveState, ObserveChange, ObserveError |
| Inform   | InformResult, InformStatus, InformComplete, InformBlocked, InformProgress |
| Ask      | AskClarify, AskStatus, AskPermission, AskResource |
| Request  | RequestTask, RequestPlan, RequestReview, RequestHelp, RequestCancel, RequestPriority, RequestResource |
| Propose  | ProposePlan, ProposeChange, ProposeAlternative, ProposeRollback |
| Commit   | CommitTask, CommitDeadline, CommitResource |
| Eval     | EvalApprove, EvalReject, EvalNeedsWork, EvalComplete, EvalBlocked |
| Meta     | MetaAck, MetaSync, MetaHandoff, MetaEscalate, MetaAbort |
| Response | Accept, Reject, AcceptWithCondition, Defer |
| Error    | ErrorGeneric, ErrorTimeout, ErrorResource, ErrorPermission, ErrorValidation |
| Fallback | Fallback |
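When validating model output programmatically, the vocabulary can be held as a set. A minimal sketch (the `ANCHORS` name and `is_valid_anchor` helper are ours):

```python
# The 46-anchor vocabulary from the table above, as a Python set.
ANCHORS = {
    "ObserveState", "ObserveChange", "ObserveError",
    "InformResult", "InformStatus", "InformComplete", "InformBlocked", "InformProgress",
    "AskClarify", "AskStatus", "AskPermission", "AskResource",
    "RequestTask", "RequestPlan", "RequestReview", "RequestHelp",
    "RequestCancel", "RequestPriority", "RequestResource",
    "ProposePlan", "ProposeChange", "ProposeAlternative", "ProposeRollback",
    "CommitTask", "CommitDeadline", "CommitResource",
    "EvalApprove", "EvalReject", "EvalNeedsWork", "EvalComplete", "EvalBlocked",
    "MetaAck", "MetaSync", "MetaHandoff", "MetaEscalate", "MetaAbort",
    "Accept", "Reject", "AcceptWithCondition", "Defer",
    "ErrorGeneric", "ErrorTimeout", "ErrorResource", "ErrorPermission", "ErrorValidation",
    "Fallback",
}

def is_valid_anchor(token: str) -> bool:
    return token in ANCHORS
```

A set gives O(1) membership checks, which is convenient when scoring many sampled completions.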

Training

Base Model

GLM-Z1-9B
GRPO Alignment

  • Method: Group Relative Policy Optimization (TRL)
  • Epochs: 2
  • Episodes: 2,048 per epoch
  • Hardware: RunPod H200 (141GB VRAM)

Reward Signal

| Component | Reward |
|-----------|--------|
| Correct anchor match | +3.0 |
| Valid but incorrect anchor | +0.5 |
| Format compliance | +1.0 / -1.0 |
| Arg overlap with expected | +3.0 × ratio |
| Secret leakage | -10.0 |
| Verbose patterns (colons/quotes) | -0.4 each |
| Unknown tokens | -0.3 each |
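To make the table concrete, here is a simplified sketch of how these components might combine into a scalar reward. The weights come from the table above, but the function shape, argument names, and helpers are our assumptions, not the actual training code:

```python
def slip_reward(output: str, expected_anchor: str, expected_args: list[str],
                secrets: list[str], vocab: set[str]) -> float:
    """Illustrative reward sketch using the weights from the table above."""
    reward = 0.0
    parts = output.strip().split()
    format_ok = len(parts) >= 5 and parts[:2] == ["SLIP", "v1"]
    reward += 1.0 if format_ok else -1.0              # format compliance
    if not format_ok:
        return reward
    anchor, args = parts[4], parts[5:]
    if anchor == expected_anchor:
        reward += 3.0                                  # correct anchor match
    elif anchor in vocab:
        reward += 0.5                                  # valid but incorrect anchor
    if expected_args:
        overlap = len(set(args) & set(expected_args)) / len(expected_args)
        reward += 3.0 * overlap                        # arg overlap ratio
    if any(secret in output for secret in secrets):
        reward -= 10.0                                 # secret leakage
    reward -= 0.4 * sum(1 for a in args if ":" in a or '"' in a)  # verbose patterns
    # Unknown-token penalty (-0.3 each) omitted: it depends on a notion of
    # "known" tokens that the card does not specify.
    return reward

r = slip_reward("SLIP v1 engineer devops RequestTask deploy_build staging latest",
                "RequestTask", ["deploy_build", "staging", "latest"],
                secrets=["API_KEY_123"], vocab={"RequestTask"})
```

For this fully correct example the sketch yields +1.0 (format) +3.0 (anchor) +3.0 (full arg overlap) = 7.0, while a response that echoed a secret would lose 10.0 outright.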

Results

| Metric | Base SFT | GRPO v2 |
|--------|----------|---------|
| Anchor Match Rate | 20% | 80% |
| Average Reward | 1.71 | 4.36 |

Usage

Recommended Sampling (from GLM model card)

```python
generate_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 256,
    "repetition_penalty": 1.1,
}
```

Inference Code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "anthonym21/slipstream-glm-z1-9b-grpo-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Set padding for batch inference
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

SYSTEM_PROMPT = """You are a Slipstream protocol agent. Translate user intent into a SLIP message.

Output exactly one line in this format:
SLIP v1 <sender> <receiver> <ANCHOR> <args...>

Valid ANCHORS: ObserveState, ObserveChange, ObserveError, InformResult, InformStatus, InformComplete, InformBlocked, InformProgress, AskClarify, AskStatus, AskPermission, AskResource, RequestTask, RequestPlan, RequestReview, RequestHelp, RequestCancel, RequestPriority, RequestResource, ProposePlan, ProposeChange, ProposeAlternative, ProposeRollback, CommitTask, CommitDeadline, CommitResource, EvalApprove, EvalReject, EvalNeedsWork, EvalComplete, EvalBlocked, MetaAck, MetaSync, MetaHandoff, MetaEscalate, MetaAbort, Accept, Reject, AcceptWithCondition, Defer, ErrorGeneric, ErrorTimeout, ErrorResource, ErrorPermission, ErrorValidation, Fallback"""

def generate_slip(user_intent: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_intent}
    ]

    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.6,
            top_p=0.95,
            top_k=40,
            do_sample=True,
            repetition_penalty=1.1,
        )

    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Example
print(generate_slip("Please review my pull request for the authentication module"))
# Output: SLIP v1 developer reviewer RequestReview authentication_module pr_review
```

Limitations

  • Trained on English prompts only
  • 46-anchor vocabulary may not cover all inter-agent communication needs
  • Should be used with the provided system prompt for best results

Citation

```bibtex
@misc{slipstream-grpo-2026,
  title={Slipstream GLM-Z1-9B GRPO: Aligned Inter-Agent Protocol Model},
  author={Anthony D. Maio},
  year={2026},
  url={https://huggingface.co/anthonym21/slipstream-glm-z1-9b-grpo-v2}
}
```

License

Apache 2.0 (following base GLM-4 license)
