mirror of https://github.com/kdl-org/kdl.git
Make the raw-string productions non-greedy, and describe the infallibility (#430)
* Make the raw-string productions non-greedy, and describe the infallibility. Closes #415
* More explicitly use and reference a cut point, rather than infallibility.
parent 717e86cb1c
commit d1ceb44f40
23
SPEC.md
@@ -855,7 +855,7 @@ value := type? node-space* (string | number | keyword)
 type := '(' node-space* string node-space* ')'
 
 // Strings
-string := identifier-string | quoted-string | raw-string
+string := identifier-string | quoted-string | raw-string ¶
 
 identifier-string := unambiguous-ident | signed-ident | dotted-ident
 unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings
@@ -872,10 +872,10 @@ escape := ["\\bfnrts] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
 hex-digit := [0-9a-fA-F]
 
 raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
-raw-string-quotes := '"' single-line-raw-string-body '"' | '"""' newline multi-line-raw-string-body newline unicode-space* '"""'
-single-line-raw-string-body := '' | (single-line-raw-string-char - '"') single-line-raw-string-char* | '"' (single-line-raw-string-char - '"') single-line-raw-string-char*
+raw-string-quotes := '"' single-line-raw-string-body '"' | '"""' newline multi-line-raw-string-body '"""'
+single-line-raw-string-body := '' | (single-line-raw-string-char - '"') single-line-raw-string-char*? | '"' (single-line-raw-string-char - '"') single-line-raw-string-char*?
 single-line-raw-string-char := unicode - newline - disallowed-literal-code-points
-multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*
+multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*?
 
 // Numbers
 number := keyword-number | hex | octal | binary | decimal
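Review note: the effect of switching the raw-string bodies from `*` to `*?` can be illustrated with an ordinary regex engine. The sketch below is not taken from the spec or from any KDL implementation. It models only a single-line raw string with a fixed number of `#` characters, ignores the rule about bodies starting with `"` and the disallowed code points, and the helper name `raw_string_pattern` is hypothetical.

```python
import re


def raw_string_pattern(hashes: int, greedy: bool) -> re.Pattern:
    """Build a simplified single-line raw-string regex (illustration only).

    `hashes` is the number of '#' characters on each side of the quotes.
    `greedy` selects `*` (old grammar) or `*?` (new grammar) for the body.
    """
    fence = re.escape("#" * hashes)
    body = "[^\n]*" if greedy else "[^\n]*?"
    return re.compile(fence + '"(' + body + ')"' + fence)


source = '#"a"# #"b"#'

# Greedy body: runs to the LAST closing sequence on the line.
greedy_body = raw_string_pattern(1, greedy=True).match(source).group(1)

# Non-greedy body: stops at the FIRST closing sequence.
lazy_body = raw_string_pattern(1, greedy=False).match(source).group(1)

print(repr(greedy_body))
print(repr(lazy_body))
```

Under these assumptions the greedy variant swallows both raw strings into one body, while the non-greedy variant ends at the first `"#`, which is the behaviour the commit message describes.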
@@ -927,9 +927,20 @@ Specifically:
 characters using hex values (`\u{FEFF}`), and for escaping `\` itself
 (`\\`).
 * `*` is used for "zero or more", `+` is used for "one or more", and `?` is
-used for "zero or one".
+used for "zero or one". Per standard regex semantics, `*` and `+` are *greedy*;
+they match as many instances as possible without failing the match.
+* `*?` (used only in raw strings) indicates a *non-greedy* match;
+it matches as *few* instances as possible without failing the match.
+* `¶` is a *cut point*. It always matches and consumes no characters,
+but once matched, the parser is not allowed to backtrack past that point in the source.
+If a parser would rewind past the cut point, it must instead fail the overall parse,
+as if it had run out of options.
+(This is only used with the `raw-string` production,
+to ensure the first instance of the appropriate closing quote sequence
+is guaranteed to be the end of the raw string,
+rather than allowing it to potentially consume more of the document unexpectedly.)
 * `()` can be used to group matches that must be matched together.
-* `a | b` means `a or b`, whichever matches first. If multipe items are before
+* `a | b` means `a or b`, whichever matches first. If multiple items are before
 a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
 * `[]` are used for regex-style character matches, where any character between
 the brackets will be a single match. `\` is used to escape `\`, `[`, and
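Review note: the cut point matters because a non-greedy match on its own can still be extended when a backtracking parser fails later in the document. The following sketch contrasts the two behaviours. It is an illustration under simplifying assumptions (a one-`#` single-line raw string that must span the whole input), not the spec's algorithm, and the function names `parse_with_backtracking` and `parse_with_cut` are hypothetical.

```python
import re

# Simplified one-'#' single-line raw string with a non-greedy body.
RAW = re.compile(r'#"([^\n]*?)"#')


def parse_with_backtracking(src: str):
    """No cut point: the engine may lengthen the body until the rest fits."""
    m = re.fullmatch(r'#"([^\n]*?)"#', src)
    return m.group(1) if m else None


def parse_with_cut(src: str):
    """Cut point after the raw string: the first closing sequence is final."""
    m = RAW.match(src)
    if m is None:
        return None  # not a raw string at all; other alternatives may apply
    # Past the cut: we are committed to m.end() and may not rewind.
    if m.end() != len(src):
        raise ValueError(f"unexpected input after raw string at offset {m.end()}")
    return m.group(1)


ambiguous = '#"a"#b"#'

print(repr(parse_with_backtracking(ambiguous)))
try:
    parse_with_cut(ambiguous)
except ValueError as err:
    print("parse error:", err)
```

Without the cut, the backtracking version quietly accepts the input by stretching the body past the first `"#`. With the cut, the same input is rejected, matching the stated requirement that the first closing sequence always ends the raw string.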