[v2] more predictable slashdash (#407)

Fixes: https://github.com/kdl-org/kdl/issues/401
Kat Marchán 2024-11-28 22:53:42 -08:00 committed by GitHub
parent 1588b1f5fd
commit 90e22bc789
19 changed files with 110 additions and 45 deletions

@ -36,7 +36,7 @@
* Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values. * Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
* The spec prose now more explicitly states that strings and raw strings can * The spec prose now more explicitly states that strings and raw strings can
be used as type annotations. be used as type annotations.
* A statement in the spec prose that said "It is reasonable for an * Removed a statement in the spec prose that said "It is reasonable for an
implementation to ignore null values altogether when deserializing". This is implementation to ignore null values altogether when deserializing". This is
no longer encouraged or desired. no longer encouraged or desired.
* Code points have been constrained to [Unicode Scalar * Code points have been constrained to [Unicode Scalar
@ -69,6 +69,7 @@
* Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax * Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax
errors. errors.
* `u128` and `i128` have been added as well-known number type annotations. * `u128` and `i128` have been added as well-known number type annotations.
* The locations where a slashdash (`/-`) may be used have been adjusted to be clearer and more intuitive.
### KQL ### KQL

108
SPEC.md
@ -272,8 +272,17 @@ node prop=(regex).*
### String ### String
Strings in KDL represent textual UTF-8 [Values](#value). A String is either an Strings in KDL represent textual UTF-8 [Values](#value). A String is either an
[Identifier String](#identifier-string) (like `foo`), a [Quoted String](#quoted-string) (like `"foo"`) or [Identifier String](#identifier-string) (like `foo`), a [Quoted
a [Raw String](#raw-string) (like `#"foo"#`). Identifier Strings let you write short, "single-word" strings with a minimum of syntax; Quoted Strings let you write strings with whitespace (including newlines!) or escapes; Raw Strings let you write strings with whitespace *but without escapes*, allowing you to not worry about the string's content containing anything that might look like an escape. String](#quoted-string) (like `"foo"`) or a [Raw String](#raw-string) (like
`#"foo"#`):
* Identifier Strings let you write short, "single-word" strings with a
minimum of syntax
* Quoted Strings let you write strings with whitespace
(including newlines!) or escapes
* Raw Strings let you write strings with whitespace *but without escapes*,
allowing you to not worry about the string's content containing anything that
might look like an escape.
Strings _MUST_ be represented as UTF-8 values. Strings _MUST_ be represented as UTF-8 values.
@ -299,9 +308,9 @@ A handful of patterns are disallowed, to avoid confusion with other values:
* idents that are the language keywords (`inf`, `-inf`, `nan`, `true`, * idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
`false`, and `null`) without their leading `#`. `false`, and `null`) without their leading `#`.
Identifiers that match these patterns _MUST_ be treated as a syntax error; Identifiers that match these patterns _MUST_ be treated as a syntax error; such
such values can only be written as quoted or raw strings. values can only be written as quoted or raw strings. The precise details of the
The precise details of the identifier syntax is specified in the [Full Grammar](#full-grammar) below. identifier syntax is specified in the [Full Grammar](#full-grammar) below.
Identifier Strings are terminated by [Whitespace](#whitespace) or Identifier Strings are terminated by [Whitespace](#whitespace) or
[Newlines](#newline). [Newlines](#newline).
@ -695,22 +704,26 @@ can be nested.
Finally, a special kind of comment called a "slashdash", denoted by `/-`, can Finally, a special kind of comment called a "slashdash", denoted by `/-`, can
be used to comment out entire _components_ of a KDL document logically, and be used to comment out entire _components_ of a KDL document logically, and
have those elements be treated as whitespace. have those elements not be included as part of the parsed document data.
Slashdash comments can be used before: Slashdash comments can be used before the following, including before their type
annotations, if present:
* A [Node](#node) name (or its type annotation): the entire Node is * A [Node](#node): the entire Node is treated as Whitespace, including all
treated as Whitespace, including all props, args, and children. props, args, and children.
* A node [Argument](#argument) (or its type annotation), in which case * An [Argument](#argument): the Argument value is treated as Whitespace.
the Argument value is treated as Whitespace. * A [Property](#property) key: the entire property, including both key and value,
* A [Property](#property) key, in which case the entire property, both is treated as Whitespace. A slashdash of just the property value is not allowed.
key and value, is treated as Whitespace. * A [Children Block](#children-block): the entire block, including all
* A [Children Block](#children-block), in which case the entire block, children within, is treated as Whitespace. Only other children blocks, whether
including all children within, is treated as Whitespace. slashdashed or not, may follow a slashdashed children block.
A slashdash may be followed by any amount of whitespace, including newlines and
comments, before the element that it comments out.
### Newline ### Newline
The following characters [should be treated as new The following character sequences [should be treated as new
lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf): lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf):
| Acronym | Name | Code Pt | | Acronym | Name | Code Pt |
@ -750,35 +763,36 @@ language syntax](#grammar-language) is defined below.
``` ```
document := bom? nodes document := bom? nodes
// Nodes
nodes := (line-space* node)* line-space* nodes := (line-space* node)* line-space*
plain-line-space := newline | ws | single-line-comment base-node := slashdash? type? node-space* string
plain-node-space := ws* escline ws* | ws+ (node-space+ slashdash? node-prop-or-arg)*
// slashdashed node-children must always be after props and args.
(node-space+ slashdash node-children)*
(node-space+ node-children)?
(node-space+ slashdash node-children)*
node := base-node node-space* node-terminator
final-node := base-node node-space* node-terminator?
line-space := plain-line-space+ | '/-' plain-node-space* node // Entries
node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node-children))?
required-node-space := node-space* plain-node-space+
optional-node-space := node-space*
base-node := type? optional-node-space string (required-node-space node-prop-or-arg)* (required-node-space node-children)?
node := base-node optional-node-space node-terminator
final-node := base-node optional-node-space node-terminator?
node-prop-or-arg := prop | value node-prop-or-arg := prop | value
node-children := '{' nodes final-node? '}' node-children := '{' nodes final-node? '}'
node-terminator := single-line-comment | newline | ';' | eof node-terminator := single-line-comment | newline | ';' | eof
prop := string optional-node-space '=' optional-node-space value prop := string node-space* '=' node-space* value
value := type? optional-node-space (string | number | keyword) value := type? node-space* (string | number | keyword)
type := '(' optional-node-space string optional-node-space ')' type := '(' node-space* string node-space* ')'
// Strings
string := identifier-string | quoted-string | raw-string string := identifier-string | quoted-string | raw-string
identifier-string := unambiguous-ident | signed-ident | dotted-ident identifier-string := unambiguous-ident | signed-ident | dotted-ident
unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings
signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? signed-ident := sign ((identifier-char - digit - '.') identifier-char*)?
dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)?
identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points - equals-sign
disallowed-keyword-strings := 'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'
quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"' quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"'
single-line-string-body := (string-character - newline)* single-line-string-body := (string-character - newline)*
@ -792,6 +806,7 @@ raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-s
single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)*
multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*
// Numbers
number := keyword-number | hex | octal | binary | decimal number := keyword-number | hex | octal | binary | decimal
decimal := sign? integer ('.' integer)? exponent? decimal := sign? integer ('.' integer)? exponent?
@ -804,29 +819,31 @@ hex := sign? '0x' hex-digit (hex-digit | '_')*
octal := sign? '0o' [0-7] [0-7_]* octal := sign? '0o' [0-7] [0-7_]*
binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
// Keywords and booleans.
keyword := boolean | '#null' keyword := boolean | '#null'
keyword-number := '#inf' | '#-inf' | '#nan' keyword-number := '#inf' | '#-inf' | '#nan'
boolean := '#true' | '#false' boolean := '#true' | '#false'
escline := '\\' ws* (single-line-comment | newline | eof) // Specific code points
newline := See Table (All line-break white_space)
ws := unicode-space | multi-line-comment
bom := '\u{FEFF}' bom := '\u{FEFF}'
disallowed-literal-code-points := See Table (Disallowed Literal Code Points) disallowed-literal-code-points := See Table (Disallowed Literal Code Points)
unicode := Any Unicode Scalar Value unicode := Any Unicode Scalar Value
unicode-space := See Table (All White_Space unicode characters which are not `newline`)
unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`) // Comments
single-line-comment := '//' ^newline* (newline | eof) single-line-comment := '//' ^newline* (newline | eof)
multi-line-comment := '/*' commented-block multi-line-comment := '/*' commented-block
commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
slashdash := '/-' line-space*
// Whitespace
ws := unicode-space | multi-line-comment
escline := '\\' ws* (single-line-comment | newline | eof)
newline := See Table (All Newline White_Space)
// Whitespace where newlines are allowed.
line-space := newline | ws | single-line-comment
// Whitespace within nodes, where newline-ish things must be esclined.
node-space := ws* escline ws* | ws+
``` ```
### Grammar language ### Grammar language
@ -850,3 +867,6 @@ Specifically:
`a - 'x'` means "any `a`, except something that matches the literal `'x'`". `a - 'x'` means "any `a`, except something that matches the literal `'x'`".
* The prefix `^` means "something that does not match" whatever follows it. * The prefix `^` means "something that does not match" whatever follows it.
For example, `^foo` means "must not match `foo`". For example, `^foo` means "must not match `foo`".
* A single definition may be split over multiple lines. Newlines are treated as
spaces.
* `//` at the beginning of a line is used for comments.
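
The ordering that the new `base-node` rule places on a node's entries can be sketched as a small check. This is only an illustration in Python, not part of the spec or of any KDL implementation: the `Entry` tuple and the `entries_are_well_ordered` function are hypothetical names, and the input is assumed to be an already-tokenized list of entries rather than raw KDL text. It checks entry order only, following the shape `(slashdash? prop-or-arg)* (slashdash children)* children? (slashdash children)*`.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (slashdashed, kind),
# where kind is "arg", "prop", or "children".
Entry = Tuple[bool, str]


def entries_are_well_ordered(entries: Iterable[Entry]) -> bool:
    """Return True if the entries follow the order allowed by base-node."""
    seen_children = False       # any children block, slashdashed or not
    seen_live_children = False  # the single children block that is kept
    for slashdashed, kind in entries:
        if kind in ("arg", "prop"):
            # Args and props, slashdashed or not, must precede every
            # children block, including slashdashed ones.
            if seen_children:
                return False
        elif kind == "children":
            if not slashdashed:
                # At most one children block may be left uncommented.
                if seen_live_children:
                    return False
                seen_live_children = True
            seen_children = True
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return True


# Shape of: node foo /-{ one } /-{ two } { three } /-{ four }
ok = [(False, "arg"), (True, "children"), (True, "children"),
      (False, "children"), (True, "children")]
# Shape of: node /-{ child } foo { bar }
bad = [(True, "children"), (False, "arg"), (False, "children")]

print(entries_are_well_ordered(ok))
print(entries_are_well_ordered(bad))
```

Under these assumptions the first shape is accepted, while the second is rejected because an argument follows a slashdashed children block, which the prose rule "only other children blocks, whether slashdashed or not, may follow a slashdashed children block" forbids.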

@ -0,0 +1 @@
node 1 3

@ -0,0 +1 @@
node 1 3

@ -0,0 +1,3 @@
node foo {
three
}

@ -0,0 +1 @@
node 1 2

@ -0,0 +1 @@
node 1 3

@ -0,0 +1 @@
node 1 3

@ -0,0 +1 @@
node2

@ -0,0 +1,5 @@
node /-{
child
} foo {
bar
}

@ -0,0 +1,6 @@
node 1 /- /*
multi
line
comment
here
*/ 2 3

@ -0,0 +1 @@
node 1 /-/*two*/2 3

@ -0,0 +1,10 @@
node foo /-{
one
} \
/-{
two
} {
three
} /-{
four
}

@ -0,0 +1,4 @@
node 1 2 /-
{
child
}

@ -0,0 +1,2 @@
node 1 /-
2 3

@ -0,0 +1,2 @@
/-
node 1 2 3

@ -0,0 +1,2 @@
node 1 /- // stuff
2 3

@ -0,0 +1,3 @@
/- // this is a comment
node1
node2