Make the raw-string productions non-greedy, and describe the infallibility (#430)

* Make the raw-string productions non-greedy, and describe the infallibility. Closes #415

* More explicitly use and reference a cut point, rather than infallibility.
This commit is contained in:
Tab Atkins Jr. 2024-12-12 11:08:50 -08:00 committed by GitHub
parent 717e86cb1c
commit d1ceb44f40
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed files with 17 additions and 6 deletions

23
SPEC.md
View File

@ -855,7 +855,7 @@ value := type? node-space* (string | number | keyword)
type := '(' node-space* string node-space* ')'
// Strings
string := identifier-string | quoted-string | raw-string
string := identifier-string | quoted-string | raw-string
identifier-string := unambiguous-ident | signed-ident | dotted-ident
unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings
@ -872,10 +872,10 @@ escape := ["\\bfnrts] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
hex-digit := [0-9a-fA-F]
raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
raw-string-quotes := '"' single-line-raw-string-body '"' | '"""' newline multi-line-raw-string-body newline unicode-space* '"""'
single-line-raw-string-body := '' | (single-line-raw-string-char - '"') single-line-raw-string-char* | '"' (single-line-raw-string-char - '"') single-line-raw-string-char*
raw-string-quotes := '"' single-line-raw-string-body '"' | '"""' newline multi-line-raw-string-body '"""'
single-line-raw-string-body := '' | (single-line-raw-string-char - '"') single-line-raw-string-char*? | '"' (single-line-raw-string-char - '"') single-line-raw-string-char*?
single-line-raw-string-char := unicode - newline - disallowed-literal-code-points
multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*
multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*?
// Numbers
number := keyword-number | hex | octal | binary | decimal
@ -927,9 +927,20 @@ Specifically:
characters using hex values (`\u{FEFF}`), and for escaping `\` itself
(`\\`).
* `*` is used for "zero or more", `+` is used for "one or more", and `?` is
used for "zero or one".
used for "zero or one". Per standard regex semantics, `*` and `+` are *greedy*;
they match as many instances as possible without failing the match.
* `*?` (used only in raw strings) indicates a *non-greedy* match;
it matches as *few* instances as possible without failing the match.
* `¶` is a *cut point*. It always matches and consumes no characters,
but once matched, the parser is not allowed to backtrack past that point in the source.
If a parser would rewind past the cut point, it must instead fail the overall parse,
as if it had run out of options.
(This is only used with the `raw-string` production,
to ensure the first instance of the appropriate closing quote sequence
is guaranteed to be the end of the raw string,
rather than allowing it to potentially consume more of the document unexpectedly.)
* `()` can be used to group matches that must be matched together.
* `a | b` means `a or b`, whichever matches first. If multipe items are before
* `a | b` means `a or b`, whichever matches first. If multiple items are before
a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
* `[]` are used for regex-style character matches, where any character between
the brackets will be a single match. `\` is used to escape `\`, `[`, and