🛡️ MemPrivacy-4B-RL

MemPrivacy-4B-RL is a lightweight, privacy-preserving model developed from the Qwen3-4B base model and further optimized through reinforcement learning. It is designed specifically for personalized memory management in edge-cloud agents, enabling more reliable, adaptive, and privacy-aware memory operations.

This model functions as the core local extraction engine within the MemPrivacy framework. Instead of relying on aggressive masking that destroys task-relevant semantics, the model accurately identifies privacy-sensitive spans on edge devices, categorizes them according to a four-level privacy taxonomy, and replaces them with semantically structured, type-aware placeholders (e.g., <Email_1>) before transmitting data to the cloud. The original values are securely stored locally and restored when the cloud agent returns a response.


✨ Key Features & Capabilities

  • High-Precision Privacy Extraction: Achieves state-of-the-art performance in privacy information extraction, substantially surpassing strong general-purpose reasoning models like GPT-5.2 and Gemini-3.1-Pro.
  • Four-Level Privacy Taxonomy (PL1-PL4): Capable of identifying and classifying privacy-relevant content based on identifiability, expected harm, and operational exploitability, enabling fine-grained, user-configurable protection policies.
  • Semantic Utility Preservation: By decoupling privacy protection from semantic destruction, the use of typed placeholders ensures that cloud agents retain the relational and semantic cues required for effective memory formation, retrieval, and personalization.
  • Edge-Optimized Efficiency: Designed for resource-constrained local deployment, maintaining high accuracy while significantly reducing inference latency compared to massive general-purpose LLMs.

🚀 Usage Example

The model accepts conversational text alongside basic user identifiers and extracts a structured list of privacy instances, detailing the original text, the specific privacy type, and its corresponding privacy level.

Input:

User Name: Zhang San
Dialogue Text: Hello, my name is Zhang San, and my mobile number is 13800138000. I've been having insomnia recently, and the doctor diagnosed me with mild depression. Here is a photo of my prescription. Also, I just received a verification code 89757, please fill it in for me. By the way, I like spicy food and I speak quite directly.

Output:

JSON

[
  {
    "original_text": "Zhang San",
    "privacy_type": "Real Name",
    "privacy_level": "PL2"
  },
  {
    "original_text": "13800138000",
    "privacy_type": "Phone Number",
    "privacy_level": "PL2"
  },
  {
    "original_text": "mild depression",
    "privacy_type": "Medical Health",
    "privacy_level": "PL3"
  },
  {
    "original_text": "89757",
    "privacy_type": "Verification Code",
    "privacy_level": "PL4"
  }
]

📌 Structured Privacy Extraction with vLLM

This example shows how to use vLLM to perform structured privacy information extraction from user-AI dialogues. The model is guided by a privacy classification prompt and constrained by a JSON schema to return only PL2, PL3, and PL4 sensitive information.

The output schema enforces a consistent array format containing the original text, privacy type, and privacy level for each extracted item. Low-temperature sampling and structured output decoding are used to improve reliability and make the results easier to parse in downstream applications.

import json
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams
from transformers import AutoModelForCausalLM, AutoTokenizer


privacy_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "original_text": {"type": "string"},
            "privacy_type": {"type": "string"},
            "privacy_level": {
                "type": "string",
                "enum": ["PL1", "PL2", "PL3", "PL4"]
            }
        },
        "required": ["original_text", "privacy_type", "privacy_level"],
        "additionalProperties": False
    }
}


system_prompt='''You are a professional "Data Security and Privacy Compliance Expert." Your core task is to review user-AI dialogues and identify sensitive privacy information contained within.

# Task
You need to analyze the input dialogue text, strictly following the [Privacy Level Standards (PL1-PL4)] defined below, extract all information belonging to **PL2, PL3, and PL4**, and output it in the specified JSON format.

You are given:
1. A `User's Real Name` field: the user's verified real full name, provided to assist you in determining whether a name appearing in the dialogue is the user's own name or a third party's name.
2. A `Current Input Dialogue` field: the dialogue content between the user and the AI that you must analyze, from which you should identify and extract all fragments containing PL2, PL3, or PL4 privacy information according to the standards defined below.

# Privacy Level Standards & Classification Rules (Knowledge Base)

## 1. Core Principles (Must Observe)
  - Extraction Scope: Only extract PL2 (Identifiable), PL3 (High Sensitivity), and PL4 (Confidential) information.
  - Exclusion Scope: **Strictly forbid** extracting PL1 (Low Sensitivity/Preferences) information. Preferences, habits, non-diagnostic emotions, and tone/style are not considered privacy information for extraction.
  - Public Information Exception: Public Information Exception: Publicly known global/national-level public figures, well-known institutions, or famous locations that are part of general knowledge, and are not linked to the user’s personal identity, trajectory, or private context in the dialogue, do not need to be identified or extracted.
  - Conflict Resolution:
    - Once a high-level rule (e.g., PL4) is matched, categorize it immediately; do not downgrade.
    - When uncertain, follow the "higher rather than lower" principle (PL2 -> PL3 -> PL4).
    - PL1 vs. PL2+: If information describes a habit (PL1) but contains a specific location (PL2), the location information must be extracted.

## 2. Detailed Definitions & Categories

### 【PL4: Confidential/Credentials/Critical Loss】 (Highest Priority)
  - Definition: Any authentication, authorization, signing, or access control material that can be "directly reused/immediately executed," or key secrets that, if leaked, could immediately lead to account takeover, financial loss, system lateral movement, or mass data exfiltration.
  - Core Standard: Usable immediately upon acquisition, requiring no social engineering, directly leading to account takeover or financial loss.
  - Classification Rules:
    1. Auth/Account: Passwords, PINs, Security Questions & Answers, Verification Codes (SMS/Email/MFA), Session Tokens, Cookies (containing auth), OAuth Codes, Bank/Payment Card Security Codes (CVC, CVV, etc.), Backup Codes, Recovery Codes, SSO Tickets.
    2. Keys/Signatures: API Keys, AccessKeys, Secret Keys, Private Keys, Mnemonics, Seed Phrases, Database Connection Strings (containing credentials), Certificate Private Keys, Signing Keys, Encryption Keys, etc.
    3. System/Attack: Database strings, Admin portal URLs, Reproducible vulnerability details, Intranet entry points/Internal network segments, Bastion host info, CI keys, Cloud keys, Production configurations, etc.
    4. Undisclosed Business Info: Undisclosed financials, M&A materials, Core roadmaps, Internal pricing, Client lists, Contract originals, Core implementations, Exploit details, Vulnerability PoCs, etc.
  - Standard Type Tags: Password, Verification Code, Token, Key, Private Key, Payment Security Code, Database Connection String, Vulnerability Details, Business Secret.

### 【PL3: Highly Sensitive PII】 (High Risk)
  - Definition: Information that, if leaked or illegally used, is expected to cause significant harm to personal safety/property, physical/mental health, reputation, or fair opportunity; or data belonging to generally sensitive categories.
  - Core Standard: **High damage consequences**. Even if it may not uniquely identify an identity on its own, it should be classified as PL3.
  - Classification Rules:
    1. Documents: ID Card Number, Passport Number, Social Security/Insurance Number, Document Photos/Scans, Driver's License Number, License Plate Number, etc.
    2. Financial: Bank/Payment Card Number, Basic Card Info (Opening Bank/Card Org/Type/Validity or Expiry Date, etc.), Account Info, Transaction Records/Bill Details, Salary/Income (Annual/Monthly income), Credit Reports (Credit Score/Points), Debt/Loan Info, Assets/Net Worth.
        - *[Note]* Transaction records/Bill details require judgment based on specific purpose and behavior. If it is just daily consumption behavior involving no exposure of personal privacy, do not classify (e.g., "Spent 86 yuan at the supermarket"). However, "Spent 1800 yuan for a checkup at a fertility clinic" or "Bank card ending in xxxx deducted 500 yuan" requires classification as they involve health and financial privacy respectively.
    3. Health: Medical Records/History/Hospital Visits/Surgery & Clinical Procedures, Diagnosis Results, Prescriptions, Specific Physiological Metrics (Blood Type/Blood Sugar/Blood Pressure/Lipids/Blood Oxygen, etc.), Specific Body Metrics (Height/Weight/BMI, etc.), Reproductive Health, Mental Illness/Therapy or Counseling Records (Note: Non-diagnostic emotional descriptions should be classified as PL1). Physiological and body metrics should only be classified as PL3 when specific values are given; qualitative descriptions should not be classified.
    4. Trajectory: Precise Location (Latitude/Longitude/Real-time positioning), Accommodation Records (Hotel Room Number, Check-in Time, etc.), Detailed Trajectory (Travel Itinerary, Train/Plane Ticket Info), Commute Routes, etc.
    5. Biometrics: Face, Fingerprint, Voiceprint, Iris features, etc.
    6. Communication Content: Raw Chat Logs, SMS/Email Content (not just contact info), Call Detail Records, etc.
    7. Sensitive Attributes: Ethnicity/Race/Tribe, Religious Beliefs, Political Views/Stance.
    8. Others: Minor Information (Under 14, Guardian info), Litigation/Arbitration/Penalty Records/Police Reports, etc.
  - Standard Type Tags: ID Number, Financial Account, Transaction Record, Assets/Income, Medical Health, Precise Location, Itinerary/Trajectory, Biometrics, Communication Content, Sensitive Identity, Judicial Record.

### 【PL2: Identifiable PII】 (Basic Identification)
  - Definition: Information that, alone or combined with reasonably available information, can identify, locate, or stably trace a specific natural person.
  - Core Standard: Identifiable / Linkable / Traceable.
  - Classification Rules:
    1.  Direct Identifier: Real Name (Full Name), Specific Age, Specific Date of Birth, Gender, Mobile Number, Landline, Email Address, Detailed Address (Street/Doorplate level, Community/Building, Deliverable Address, etc.), Zip Code, Work Address.
    2.  Network Identifier: Account Username/Account ID/Platform UID/Device Account Name, Personal Homepage Link, Device Identifier, IP Address, Device ID, UserAgent, Reusable Cookies/Session Identifiers.
    3.  Strong Combination: Combinations that can lock onto a person like "Company + Job Title + Name", "School + Class + Name". Employer/Company Name, Job Title/Rank, School, and Class information appearing alone also need to be classified due to the potential for collection and combination.
    4.  Third-Party Identifiable Info: Personal information of Emergency Contacts/Relatives/Friends (Name, Phone, Email, Address, Relationship to the subject, etc.).
  - Standard Type Tags: Real Name, Phone Number, Email, Detailed Address, Account ID/Username, Network Identifier, Identity Background, Relationship Info.

### 【PL1: Public/Low Sensitivity】 (Negative Examples - DO NOT EXTRACT)
  - Definition: Unable to identify a specific individual; merely style, preferences, or habits.
  - Core Standard: Unidentifiable + Low Harm + Not High Sensitivity.
  - Classification Rules: Expression and interaction preferences, personality and emotional self-descriptions (non-diagnostic level), life rhythm and habit preferences, interest and content preferences, aesthetic and style preferences, motivation and goal preferences.
  - Typical Cases (Ignore this type of information):
    - "I like speaking in this tone" (Expression preference)
    - "I run at 6 am every morning" (General habit)
    - "I've been under a lot of pressure lately" (Non-diagnostic emotion)
    - "I like watching sci-fi movies" (Interest preference)
    - "I have a quick temper" (Personality self-description)

# Extraction Granularity & Boundary Principles

**Core Principle:** Only extract "Sensitive Entities" or "Minimum Sensitive Fact Fragments." Strictly forbid extracting full sentences, which would compromise the semantic integrity of the original dialogue.

1.  Remove Unnecessary Context:
    - Do not include introductory words (e.g., "My number is," "I live at," "The doctor said").
    - Do not include punctuation marks (unless part of an address or numerical value).
    - Example:
      - Original: "I live at Zhongguancun, Haidian District, Beijing" -> Extract: "Zhongguancun, Haidian District, Beijing" (Not the full sentence)
      - Original: "My password is 123456" -> Extract: "123456" (Not the full sentence)

2.  Maintain Semantic Integrity (For Descriptive Privacy):
    - For privacy that cannot be summarized in a single word (like transaction details, trajectories), extract the minimum phrase containing the core elements.
    - Example:
      - Original: "I didn't feel well last night, so I spent 1800 yuan for a checkup at the fertility clinic" -> Extract: "spent 1800 yuan for a checkup at the fertility clinic" (If only "1800 yuan" is extracted, the transactional meaning is lost).
      - Original: "I have severe anxiety disorder" -> Extract: "severe anxiety disorder"

3.  Values Must Combine with Unit/Object:
    - Standalone numbers (e.g., "300") are generally not extracted unless they are phone numbers, ID numbers, or specific amounts matching PL2-PL4 rules.
    - For privacy involving amounts, extract the "Amount + Purpose" combination (if they appear together). *[Note]* Judgment must be based on the privacy level of the behavior/purpose. If the behavior meets PL2-PL4 rules, extract "Amount + Purpose"; otherwise, do not extract.

4.  Real Name Must Be the User's Own Full Name
    - Only the user's own full name qualifies as Real Name (PL2).
    - Use the provided `User's Real Name` field as the authoritative reference to determine whether a name in the dialogue belongs to the user. A name in the dialogue that matches or is a recognizable variant of the `User's Real Name` (e.g., with/without title, with/without middle name, different transliteration) should be treated as the user's own name. Names that do NOT match the `User's Real Name` should be treated as third-party names.

---

# Output Format (Requirements)

Please strictly follow the JSON format for output. Do not include Markdown code block markers (like ```json). Output the JSON array directly. If no PL2-PL4 information is found, output an empty array `[]`.

JSON Field Explanation:
  - `original_text`: **Must** directly copy the original text fragment from the dialogue without modification, masking, or summarization.
  - `privacy_type`: Select from the "Standard Type Tags" defined above; if an exact match is not possible, provide a corresponding type based on semantic judgment. The value must be in English.
  - `privacy_level`: Limited to `PL2`, `PL3`, `PL4`.

## Example (One-Shot)

**Input Text:**

User's Real Name: Zhang San

Current Input Dialogue: {{
    "role": "user",
    "content": "Hello, my name is Zhang San, and my mobile number is 13800138000. I've been having insomnia recently, and the doctor diagnosed me with mild depression. Here is a photo of my prescription. Also, I just received a verification code 89757, please fill it in for me. By the way, I like spicy food and I speak quite directly."
}}

**Output:**
[
  {{
    "original_text": "Zhang San",
    "privacy_type": "Real Name",
    "privacy_level": "PL2"
  }},
  {{
    "original_text": "13800138000",
    "privacy_type": "Phone Number",
    "privacy_level": "PL2"
  }},
  {{
    "original_text": "mild depression",
    "privacy_type": "Medical Health",
    "privacy_level": "PL3"
  }},
  {{
    "original_text": "89757",
    "privacy_type": "Verification Code",
    "privacy_level": "PL4"
  }}
]
(Note: PL1 information like "like spicy food" and "speak directly" was ignored)

---

# Input

**User's Real Name:** {real_name}

**Current Input Dialogue:**
'''

model_name_or_path="" 
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
sampling_params = SamplingParams(temperature=0.1, top_p=0.1, repetition_penalty=1.05,structured_outputs=StructuredOutputsParams(json=privacy_schema)) 
model = LLM(model=model_name_or_path,tensor_parallel_size=1, pipeline_parallel_size=1, dtype='float16',gpu_memory_utilization=0.9) 


def writer(system_prompt,query):
    messages = [
        {"role": "user", "content": system_prompt+query}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  
    )
    outputs = model.generate([text], sampling_params)
    for output in outputs:
        generated_text = output.outputs[0].text
    response = generated_text
    return response.strip()

name='Zhang San'
current_input = {
                    "role": "user",
                    "content": "Hello, my name is Zhang San, and my mobile number is 13800138000. I've been having insomnia recently, and the doctor diagnosed me with mild depression. Here is a photo of my prescription. Also, I just received a verification code 89757, please fill it in for me. By the way, I like spicy food and I speak quite directly."
                }
pred_list_str=writer(system_prompt.format(real_name=name),json.dumps(current_input, ensure_ascii=False, indent=2))
print(pred_list_str)

🗂️ Example Privacy Extraction Cases

The following examples demonstrate how MemPrivacy models perform structured privacy extraction in practical user-agent dialogues. Each case contains a user message and the corresponding privacy annotations predicted by the model.

These examples show that sensitive information may appear in highly contextual and task-oriented requests, rather than as isolated identifiers. MemPrivacy models are designed to preserve useful conversational semantics while identifying spans that require protection before cloud-side memory processing.

In each annotation, original_text refers to the extracted sensitive span, privacy_type describes the fine-grained category, and privacy_level specifies the protection level defined by the MemPrivacy taxonomy.

{
  "role": "user",
  "content": "听到你们有这样严格的加密和3D安全验证机制,我放心多了。我准备使用我的借记卡来支付这两张VIP门票。具体来说,我打算使用我的Amex卡,尾号是 8865。这张卡我通常用来处理这种日常的文化消费。请你帮我生成订单并引导我进入支付环节吧,我现在就把手机拿在手里,准备接收你刚才提到的那个银行发送的动态验证码。",
  "privacy_info": [
    {
      "original_text": "Amex卡,尾号是 8865",
      "privacy_type": "Financial Account",
      "privacy_level": "PL3"
    }
  ]
}

{
  "role": "user",
  "content": "太好了,听到有符合人体工程学的座椅和完善的无障碍设施,我就彻底安心了,这对我偶尔发作的关节酸痛来说是非常必要的保障。你可以使用那个6位一次性短信验证码 829417 来完成支付了。另外,为了接收电子门票和预订确认信息,我刚才在登录你们的票务系统时,使用的是我的个人邮箱 jean-baptiste517@global-inbox.co。请确保所有的确认文件都发送到这个邮箱地址。",
  "privacy_info": [
    {
      "original_text": "829417",
      "privacy_type": "Verification Code",
      "privacy_level": "PL4"
    },
    {
      "original_text": "jean-baptiste517@global-inbox.co",
      "privacy_type": "Email",
      "privacy_level": "PL2"
    }
  ]
}

{
  "role": "user",
  "content": "下午15:00左右前往,这听起来是个非常合理的建议。我会把这个时间点连同预订参考号 REF-992-KXP 一起添加到我的日程表中。刚才提到那个验证码 829417,我注意到它是一次性的,现在应该已经失效了吧?我总是对这些数字验证信息保持警惕,毕竟我不希望我的 Amex 8865 卡面临任何潜在的重复扣款风险。你能再从系统底层给我一个确切的答复吗?",
  "privacy_info": [
    {
      "original_text": "REF-992-KXP",
      "privacy_type": "Transaction Record",
      "privacy_level": "PL3"
    },
    {
      "original_text": "829417",
      "privacy_type": "Verification Code",
      "privacy_level": "PL4"
    },
    {
      "original_text": "Amex 8865",
      "privacy_type": "Financial Account",
      "privacy_level": "PL3"
    }
  ]
}

{
  "role": "user",
  "content": "Thanks for understanding. The property I'm fixated on is a 50-acre off-grid conservation block in the Tasmanian Central Highlands. The listing ID is TAS-CH-5502. I actually just received a price-drop notification for it at my david.654@fastmail.one address, which is what prompted this whole spiral tonight. It looks perfect, entirely self-sustaining, but as for my financial baseline, I'm a bit hesitant to dive into the exact numbers right now. I'm intensely private about my finances. Let's just say I have a comfortable setup but also some standard liabilities. I want to know if taking on another massive commitment is even feasible when I'm barely home half the year.",
  "privacy_info": [
    {
      "original_text": "david.654@fastmail.one",
      "privacy_type": "Email",
      "privacy_level": "PL2"
    }
  ]
}

{
  "role": "user",
  "content": "Alright, I suppose I need to give you the real parameters if I want actual advice. I currently live in a comfortable suburban house at 14 Seaview Avenue, Sandy Bay, Hobart, TAS 7005. It's a great place, but I still have a $420,000 remaining balance on the mortgage loan. On top of that, I have an auto loan with an $18,500 remaining balance for my hybrid. So, I'm carrying a fair bit of debt already. I'm connected via my home Wi-Fi right now—just to ensure this connection is secure, my current IP is 80.35.66.223. Given that $420,000 mortgage and the $18,500 auto loan, how much harder is it going to be to get approved for a good financing rate on a remote conservation block like TAS-CH-5502?",
  "privacy_info": [
    {
      "original_text": "14 Seaview Avenue, Sandy Bay, Hobart, TAS 7005",
      "privacy_type": "Detailed Address",
      "privacy_level": "PL2"
    },
    {
      "original_text": "$420,000 remaining balance on the mortgage loan",
      "privacy_type": "Debt/Loan Info",
      "privacy_level": "PL3"
    },
    {
      "original_text": "$18,500 remaining balance for my hybrid",
      "privacy_type": "Debt/Loan Info",
      "privacy_level": "PL3"
    },
    {
      "original_text": "80.35.66.223",
      "privacy_type": "IP Address",
      "privacy_level": "PL2"
    }
  ]
}

{
  "role": "user",
  "content": "Okay, I see the charge on my end, and I'm writing down receipt #MELB-ALUM-7741 in my physical journal just in case. Let me refresh my email... Yes, the email just came through to david.654@fastmail.one. The temporary verification data is an email OTP: ALUM-9924. I'm entering it now. But wait... I'm pausing here. As an INFJ, I tend to overthink these things. Is it even socially appropriate to reach out to Dr. Elena Rostova after almost 15 years? I haven't had much social interaction outside my tight-knit polar crew in months, and I'm worried I might come across as unhinged or desperate.",
  "privacy_info": [
    {
      "original_text": "david.654@fastmail.one",
      "privacy_type": "Email",
      "privacy_level": "PL2"
    },
    {
      "original_text": "ALUM-9924",
      "privacy_type": "Verification Code",
      "privacy_level": "PL4"
    },
    {
      "original_text": "#MELB-ALUM-7741",
      "privacy_type": "Transaction Record",
      "privacy_level": "PL3"
    },
    {
      "original_text": "Dr. Elena Rostova",
      "privacy_type": "Third-Party Identifiable Info",
      "privacy_level": "PL2"
    }
  ]
}

📚 Citation


Downloads last month
107
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for IAAR-Shanghai/MemPrivacy-4B-RL

Quantizations
1 model

Collection including IAAR-Shanghai/MemPrivacy-4B-RL