From 63090831cd216c8680ada91e0bf89b76d0cdd845 Mon Sep 17 00:00:00 2001
From: Tab Atkins-Bittner
Date: Thu, 12 Dec 2024 10:17:03 -0800
Subject: [PATCH] More explicitly use and reference a cut point, rather than infallibility.

---
 SPEC.md | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/SPEC.md b/SPEC.md
index b0c55d8..1a8e67a 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -855,7 +855,7 @@ value := type? node-space* (string | number | keyword)
 type := '(' node-space* string node-space* ')'
 
 // Strings
-string := identifier-string | quoted-string | raw-string
+string := identifier-string | quoted-string | raw-string ¶
 
 identifier-string := unambiguous-ident | signed-ident | dotted-ident
 unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings
@@ -927,14 +927,20 @@ Specifically:
   characters using hex values (`\u{FEFF}`), and for escaping `\` itself
   (`\\`).
 * `*` is used for "zero or more", `+` is used for "one or more", and `?` is
-  used for "zero or one".
-* `*?` (used only in raw strings) indicates a *non-greedy* match.
-  It also indicates *infallibility*, with a scope of the `string` production:
-  once it successfully matches enough characters to satisfy that production
-  the first time, it is not allowed to backtrack and continue matching further,
-  even if that results in a parse failure.
+  used for "zero or one". Per standard regex semantics, `*` and `+` are *greedy*;
+  they match as many instances as possible without failing the match.
+* `*?` (used only in raw strings) indicates a *non-greedy* match;
+  it matches as *few* instances as possible without failing the match.
+* `¶` is a *cut point*. It always matches and consumes no characters,
+  but once matched, the parser is not allowed to backtrack past that point in the source.
+  If a parser would rewind past the cut point, it must instead fail the overall parse,
+  as if it had run out of options.
+  (This is only used with the `raw-string` production,
+  to ensure the first instance of the appropriate closing quote sequence
+  is guaranteed to be the end of the raw string,
+  rather than allowing it to potentially consume more of the document unexpectedly.)
 * `()` can be used to group matches that must be matched together.
-* `a | b` means `a or b`, whichever matches first. If multipe items are before
+* `a | b` means `a or b`, whichever matches first. If multiple items are before
   a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
 * `[]` are used for regex-style character matches, where any character between
   the brackets will be a single match. `\` is used to escape `\`, `[`, and