More explicitly use and reference a cut point, rather than infallibility.

Tab Atkins-Bittner 2024-12-12 10:17:03 -08:00
parent b5e8aaf035
commit 63090831cd
1 changed file with 14 additions and 8 deletions

SPEC.md

@@ -855,7 +855,7 @@ value := type? node-space* (string | number | keyword)
 type := '(' node-space* string node-space* ')'
 // Strings
 string := identifier-string | quoted-string | raw-string
 identifier-string := unambiguous-ident | signed-ident | dotted-ident
 unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings
@@ -927,14 +927,20 @@ Specifically:
   characters using hex values (`\u{FEFF}`), and for escaping `\` itself
   (`\\`).
 * `*` is used for "zero or more", `+` is used for "one or more", and `?` is
-  used for "zero or one".
-* `*?` (used only in raw strings) indicates a *non-greedy* match.
-  It also indicates *infallibility*, with a scope of the `string` production:
-  once it successfully matches enough characters to satisfy that production
-  the first time, it is not allowed to backtrack and continue matching further,
-  even if that results in a parse failure.
+  used for "zero or one". Per standard regex semantics, `*` and `+` are *greedy*;
+  they match as many instances as possible without failing the match.
+* `*?` (used only in raw strings) indicates a *non-greedy* match;
+  it matches as *few* instances as possible without failing the match.
+* `¶` is a *cut point*. It always matches and consumes no characters,
+  but once matched, the parser is not allowed to backtrack past that point in the source.
+  If a parser would rewind past the cut point, it must instead fail the overall parse,
+  as if it had run out of options.
+  (This is only used with the `raw-string` production,
+  to ensure the first instance of the appropriate closing quote sequence
+  is guaranteed to be the end of the raw string,
+  rather than allowing it to potentially consume more of the document unexpectedly.)
 * `()` can be used to group matches that must be matched together.
-* `a | b` means `a or b`, whichever matches first. If multipe items are before
+* `a | b` means `a or b`, whichever matches first. If multiple items are before
   a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
 * `[]` are used for regex-style character matches, where any character between
   the brackets will be a single match. `\` is used to escape `\`, `[`, and
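
To illustrate the difference between a plain non-greedy match and a non-greedy match followed by a cut point, here is a minimal sketch in Python. It does not implement the spec's actual `raw-string` production: the toy grammar (`#"` body `"#` followed by end of input), the function `match_raw_string`, the `use_cut` flag, and the `CutFailure` exception are all assumptions made up for this example.

```python
from typing import Optional


class CutFailure(Exception):
    """Hypothetical error: the parser would have to rewind past a cut point."""


def match_raw_string(src: str, use_cut: bool) -> Optional[str]:
    """Toy grammar (an assumption, not the real production):

        '#"' any-char*? '"#' [cut point] end-of-input

    Returns the raw string body, or None on an ordinary match failure.
    """
    open_seq, close_seq = '#"', '"#'
    if not src.startswith(open_seq):
        return None
    start = len(open_seq)

    # Non-greedy: try the shortest body first, i.e. the first closing sequence.
    pos = src.find(close_seq, start)
    while pos != -1:
        end = pos + len(close_seq)
        if end == len(src):
            # The rest of the grammar (end-of-input) matched.
            return src[start:pos]
        if use_cut:
            # The cut point was passed once the closing sequence matched.
            # Retrying with a longer body would rewind past it, so the
            # overall parse fails instead.
            raise CutFailure(f"unexpected input after raw string at offset {end}")
        # No cut: backtrack and let the non-greedy body grow to the next
        # closing sequence.
        pos = src.find(close_seq, pos + 1)
    return None
```

Without the cut, the input `#"a"# junk "#` is accepted with the body `a"# junk `, because the matcher backtracks and extends the body to the second closing sequence. With the cut, the same input fails as soon as the text after the first `"#` does not parse, which is the behavior the new `¶` wording asks for.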