Exclude hex above max Unicode Scalar Value (#456)

* Exclude hex above max Unicode Scalar Value

simplify surrogate regex to use ranges

* allow leading 0s, but still limit max length to 6

* Add explicit regex-set rules to hex unicode

document {1,3} ranges

* add space-separators between sets

* Make test fail *only* for length limits

Previously it failed due to specifying a codepoint past max *as well*, obscuring the intended fail condition.

---------

Co-authored-by: Tab Atkins Jr. <jackalmage@gmail.com>
Evgeny 2025-01-22 00:16:45 +07:00 committed by GitHub
parent e9e6a844bd
commit d76063e8e9
GPG Key ID: B5690EEEBB952194
3 changed files with 9 additions and 4 deletions

@@ -983,10 +983,13 @@ string-character :=
[^\\"] - disallowed-literal-code-points
ws-escape := '\\' (unicode-space | newline)+
hex-digit := [0-9a-fA-F]
-hex-unicode := hex-digit{1, 6} - surrogates
-surrogates := [dD][8-9a-fA-F]hex-digit{2}
-// U+D800-DFFF: D 8 00
-// D F FF
+hex-unicode := hex-digit{1, 6} - surrogate - above-max-scalar // Unicode Scalar Value in hex₁₆, leading 0s allowed within length ≤ 6
+surrogate := [0]{0, 2} [dD] [8-9a-fA-F] hex-digit{2}
+// U+D800-DFFF: D 8 00
+// D F FF
+above-max-scalar = [2-9a-fA-F] hex-digit{5} | [1] [1-9a-fA-F] hex-digit{4}
+// >U+10FFFF: >1 _____ 1 >0 ____
raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
raw-string-quotes :=
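
The new hex-unicode rule is a set difference: any 1 to 6 hex digits, minus the surrogate forms, minus the six-digit forms above U+10FFFF. A minimal sketch of how that could be checked, assuming a direct translation of the three rules into Python regexes (the function name `is_hex_unicode` and the constants are hypothetical and not part of the spec or any KDL implementation):

```python
import re

# Hypothetical one-to-one translations of the grammar rules above.
HEX_UNICODE = re.compile(r"[0-9a-fA-F]{1,6}")
# Up to two leading zeros, then D, then 8-F, then two more hex digits.
SURROGATE = re.compile(r"0{0,2}[dD][8-9a-fA-F][0-9a-fA-F]{2}")
# Six digits starting with 2-F, or starting with 1 followed by 1-F.
ABOVE_MAX_SCALAR = re.compile(
    r"[2-9a-fA-F][0-9a-fA-F]{5}|1[1-9a-fA-F][0-9a-fA-F]{4}"
)


def is_hex_unicode(digits: str) -> bool:
    """Return True if digits matches hex-digit{1, 6} - surrogate - above-max-scalar."""
    if not HEX_UNICODE.fullmatch(digits):
        return False  # empty, non-hex, or longer than 6 digits
    if SURROGATE.fullmatch(digits):
        return False  # U+D800-DFFF, with or without leading zeros
    if ABOVE_MAX_SCALAR.fullmatch(digits):
        return False  # greater than U+10FFFF
    return True


if __name__ == "__main__":
    for sample in ["10FFFF", "11FFFF", "00D800", "0012345", "00E000"]:
        print(sample, is_hex_unicode(sample))
```

Because the length cap is applied first, a seven-digit input such as `0012345` is rejected for its length alone, even though its numeric value is a valid scalar.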

@@ -0,0 +1 @@
no "Higher than max Unicode Scalar Value \u{10FFFF} \u{11FFFF}"

@@ -0,0 +1 @@
no "Even with leading 0s Unicode Scalar Value escapes must ≤6: \u{0012345}"
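
Each of the two new test inputs is meant to fail for exactly one reason: the first for a value above U+10FFFF, the second for exceeding six digits. A rough sketch of that distinction, assuming a check by numeric value rather than by regex (the helper names are hypothetical, and the escape extraction is deliberately simplified and ignores the rest of KDL string parsing):

```python
import re

HEX_DIGITS = set("0123456789abcdefABCDEF")
# Simplified: captures whatever sits between \u{ and }.
ESCAPE = re.compile(r"\\u\{([^}]*)\}")


def is_scalar_escape(digits: str) -> bool:
    """Check length, hex-ness, and Unicode Scalar Value range by numeric value."""
    if not 1 <= len(digits) <= 6:
        return False  # length limit applies even when leading zeros pad the value
    if not set(digits) <= HEX_DIGITS:
        return False
    value = int(digits, 16)
    return value <= 0x10FFFF and not (0xD800 <= value <= 0xDFFF)


def check_escapes(text: str) -> list[tuple[str, bool]]:
    """Return each \\u{...} payload found in text with its validity."""
    return [(digits, is_scalar_escape(digits)) for digits in ESCAPE.findall(text)]


if __name__ == "__main__":
    # Shortened stand-ins for the two test inputs above.
    print(check_escapes(r'no "\u{10FFFF} \u{11FFFF}"'))
    # [('10FFFF', True), ('11FFFF', False)]
    print(check_escapes(r'no "\u{0012345}"'))
    # [('0012345', False)]
```

In the second input the value 0x12345 is itself a valid scalar, so the only failing condition is the seven-digit length.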