RWKV-ScaleLens / test_sample.txt
BEGIN TEST
Leading spaces (2) + trailing spaces (2)··
TAB_LITERAL: [START]	[END] (there is a real TAB between the two markers)
Raw escape-like text: \n \r \t \\n \\r \\t \\x00 \\x1f \\x7f \\xff \u0000 \u202E \u200F \u200E
Bytes-ish hex: e5 bd 93 e7 84 b6 | 00 1f 7f ff | 0x00 0x1F 0x7F 0xFF | b"\x00\x1f\x7f\xff"
HTML tags (should render as text, not tags):
<think></think> <think>inner</think> <script>alert('x')</script> <style>body{color:red}</style>
<div class="x" data-x="1 & 2">Hello</div> <span>Span</span> <a href="https://example.com?q=1&x=<tag>">link</a>
<img src=x onerror=alert(1)> <br> <hr> <p>para</p> <table><tr><td>cell</td></tr></table>
Nested-ish: </span><span data-x="</span>">confuse</span>
HTML entities:
&lt;think&gt; &lt;/think&gt; &amp; &quot; &#39; &nbsp; &#10; &#x3C; &#x3E; &#x26;
Markdown-ish:
# H1
## H2
- list item 1
- list item 2
> blockquote
--- (three hyphens)
Languages:
中文 简体/繁體 日本語 かな カタカナ 한국어 العربية עברית हिन्दी ไทย Русский Ελληνικά Español Français Português Türkçe Việt
RTL mix: العربية ABC עברית 123 (note the mixed text direction)
Combining vs composed:
é (e + combining acute) vs é (single codepoint)
Å (A + combining ring) vs Å
Emoji / ZWJ:
😀 😅 🧠 👩🏽‍💻 🏳️‍🌈 👨‍👩‍👧‍👦 🧑🏾‍🚀 🫠
Zero-width samples (the parentheses contain real invisible characters):
ZWS(​) ZWNJ(‌) ZWJ(‍) LRM(‎) RLM(‏)
Long line to test wrapping:
human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation; therefore human-should-not-wrap-weirdly-here.
Literal backslashes:
C:\path\to\file\name.txt and \\server\share\folder
End.