[v2] more predictable slashdash (#407)

Fixes: https://github.com/kdl-org/kdl/issues/401
2024-11-28 22:53:42 -08:00 · 90e22bc789
parent 1588b1f5fd
commit 90e22bc789
19 changed files with 110 additions and 45 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -36,7 +36,7 @@
 * Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
 * The spec prose now more explicitly states that strings and raw strings can
  be used as type annotations.
-* A statement in the spec prose that said "It is reasonable for an
+* Removed a statement in the spec prose that said "It is reasonable for an
  implementation to ignore null values altogether when deserializing". This is
  no longer encouraged or desired.
 * Code points have been constrained to [Unicode Scalar
@ -69,6 +69,7 @@
 * Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax
  errors.
 * `u128` and `i128` have been added as well-known number type annotations.
 * Slashdash (`/-`)-compatible locations have been adjusted to be clearer and more intuitive.
 ### KQL
--- a/SPEC.md
+++ b/SPEC.md
@ -272,8 +272,17 @@ node prop=(regex).*
 ### String
 Strings in KDL represent textual UTF-8 [Values](#value). A String is either an
-[Identifier String](#identifier-string) (like `foo`), a [Quoted String](#quoted-string) (like `"foo"`) or
+[Identifier String](#identifier-string) (like `foo`), a [Quoted
-a [Raw String](#raw-string) (like `#"foo"#`). Identifier Strings let you write short, "single-word" strings with a minimum of syntax; Quoted Strings let you write strings with whitespace (including newlines!) or escapes; Raw Strings let you write strings with whitespace *but without escapes*, allowing you to not worry about the string's content containing anything that might look like an escape.
+String](#quoted-string) (like `"foo"`) or a [Raw String](#raw-string) (like
 `#"foo"#`):
 * Identifier Strings let you write short, "single-word" strings with a
  minimum of syntax
 * Quoted Strings let you write strings with whitespace
  (including newlines!) or escapes
 * Raw Strings let you write strings with whitespace *but without escapes*,
  allowing you to not worry about the string's content containing anything that
  might look like an escape.
 Strings _MUST_ be represented as UTF-8 values.
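
As an illustrative sketch of the three string forms listed above (the node name `node` and the values are placeholders, not taken from the spec):

```kdl
node foo            // Identifier String: short, single-word, no quotes
node "foo bar\n"    // Quoted String: whitespace and escapes are allowed
node #"C:\temp"#    // Raw String: the backslash is kept literally, not as an escape
```
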
@ -299,9 +308,9 @@ A handful of patterns are disallowed, to avoid confusion with other values:
 * idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
  `false`, and `null`) without their leading `#`.
-Identifiers that match these patterns _MUST_ be treated as a syntax error;
+Identifiers that match these patterns _MUST_ be treated as a syntax error; such
-such values can only be written as quoted or raw strings.
+values can only be written as quoted or raw strings. The precise details of the
-The precise details of the identifier syntax is specified in the [Full Grammar](#full-grammar) below.
+identifier syntax are specified in the [Full Grammar](#full-grammar) below.
 Identifier Strings are terminated by [Whitespace](#whitespace) or
 [Newlines](#newline).
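
A small sketch of the keyword rule above, assuming a placeholder node named `node`: the keyword itself needs its leading `#`, and the corresponding text has to be written as a quoted or raw string.

```kdl
node #true      // the boolean keyword
node "true"     // the string "true", written as a Quoted String
node #"true"#   // the same string, written as a Raw String
// node true    // syntax error: keyword identifier without its leading `#`
```
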
@ -695,22 +704,26 @@ can be nested.
 Finally, a special kind of comment called a "slashdash", denoted by `/-`, can
 be used to comment out entire _components_ of a KDL document logically, and
-have those elements be treated as whitespace.
+have those elements not be included as part of the parsed document data.
-Slashdash comments can be used before:
+Slashdash comments can be used before the following, including before their type
 annotations, if present:
-* A [Node](#node) name (or its type annotation): the entire Node is
+* A [Node](#node): the entire Node is treated as Whitespace, including all
-  treated as Whitespace, including all props, args, and children.
+  props, args, and children.
-* A node [Argument](#argument) (or its type annotation), in which case
+* An [Argument](#argument): the Argument value is treated as Whitespace.
-  the Argument value is treated as Whitespace.
+* A [Property](#property) key: the entire property, including both key and value,
-* A [Property](#property) key, in which case the entire property, both
+  is treated as Whitespace. A slashdash of just the property value is not allowed.
-  key and value, is treated as Whitespace.
+* A [Children Block](#children-block): the entire block, including all
-* A [Children Block](#children-block), in which case the entire block,
+  children within, is treated as Whitespace. Only other children blocks, whether
-  including all children within, is treated as Whitespace.
+  slashdashed or not, may follow a slashdashed children block.
 A slashdash may be followed by any amount of whitespace, including newlines and
 comments, before the element that it comments out.
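
The rules above can be sketched with a few hypothetical inputs (node names, keys, and values are placeholders; the comments describe the result expected under the prose in this section):

```kdl
// Node: the whole node, including its children, is dropped.
/-disabled 1 2 {
    child
}

// Argument: parses as `node 1 3`.
node 1 /-2 3

// Argument with a type annotation: parses as `node 2`.
node /-(u8)1 2

// Property: key and value are dropped together; parses as `node 1`.
node 1 /-key=value

// Children Block: parses as `node 1` with the single child `kept`.
node 1 /-{
    dropped
} {
    kept
}

// Not allowed: an entry after a slashdashed Children Block.
// node /-{ child } 1
```

The last case mirrors the `slashdash_child_block_before_entry_err` test input added in this commit.
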
 ### Newline
-The following characters [should be treated as new
+The following character sequences [should be treated as new
 lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf):
 | Acronym | Name            | Code Pt |
@ -750,35 +763,36 @@ language syntax](#grammar-language) is defined below.
 ```
 document := bom? nodes
 // Nodes
 nodes := (line-space* node)* line-space*
-plain-line-space := newline | ws | single-line-comment
+base-node := slashdash? type? node-space* string
-plain-node-space := ws* escline ws* | ws+
+      (node-space+ slashdash? node-prop-or-arg)*
      // slashdashed node-children must always be after props and args.
      (node-space+ slashdash node-children)*
      (node-space+ node-children)?
      (node-space+ slashdash node-children)*
 node := base-node node-space* node-terminator
 final-node := base-node node-space* node-terminator?
-line-space := plain-line-space+ | '/-' plain-node-space* node
+// Entries
 node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node-children))?
 required-node-space := node-space* plain-node-space+
 optional-node-space := node-space*
 base-node := type? optional-node-space string (required-node-space node-prop-or-arg)* (required-node-space node-children)?
 node := base-node optional-node-space node-terminator
 final-node := base-node optional-node-space node-terminator?
 node-prop-or-arg := prop | value
 node-children := '{' nodes final-node? '}'
 node-terminator := single-line-comment | newline | ';' | eof
-prop := string optional-node-space '=' optional-node-space value
+prop := string node-space* '=' node-space* value
-value := type? optional-node-space (string | number | keyword)
+value := type? node-space* (string | number | keyword)
-type := '(' optional-node-space string optional-node-space ')'
+type := '(' node-space* string node-space* ')'
 // Strings
 string := identifier-string | quoted-string | raw-string
 identifier-string := unambiguous-ident | signed-ident | dotted-ident
-unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan'
+unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-identifiers
 signed-ident := sign ((identifier-char - digit - '.') identifier-char*)?
 dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)?
-identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points
+identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points - equals-sign
 disallowed-keyword-identifiers := 'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'
 quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"'
 single-line-string-body := (string-character - newline)*
@ -792,6 +806,7 @@ raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-s
 single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)*
 multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*
 // Numbers
 number := keyword-number | hex | octal | binary | decimal
 decimal := sign? integer ('.' integer)? exponent?
@ -804,29 +819,31 @@ hex := sign? '0x' hex-digit (hex-digit | '_')*
 octal := sign? '0o' [0-7] [0-7_]*
 binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
 // Keywords and booleans.
 keyword := boolean | '#null'
 keyword-number := '#inf' | '#-inf' | '#nan'
 boolean := '#true' | '#false'
-escline := '\\' ws* (single-line-comment | newline | eof)
+// Specific code points
 newline := See Table (All line-break white_space)
 ws := unicode-space | multi-line-comment
 bom := '\u{FEFF}'
 disallowed-literal-code-points := See Table (Disallowed Literal Code Points)
 unicode := Any Unicode Scalar Value
 unicode-space := See Table (All White_Space unicode characters which are not `newline`)
-unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`)
+// Comments
 single-line-comment := '//' ^newline* (newline | eof)
 multi-line-comment := '/*' commented-block
 commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
 slashdash := '/-' line-space*
 // Whitespace
 ws := unicode-space | multi-line-comment
 escline := '\\' ws* (single-line-comment | newline | eof)
 newline := See Table (All Newline White_Space)
 // Whitespace where newlines are allowed.
 line-space := newline | ws | single-line-comment
 // Whitespace within nodes, where newline-ish things must be esclined.
 node-space := ws* escline ws* | ws+
 ```
 ### Grammar language
@ -850,3 +867,6 @@ Specifically:
  `a - 'x'` means "any `a`, except something that matches the literal `'x'`".
 * The prefix `^` means "something that does not match" whatever follows it.
  For example, `^foo` means "must not match `foo`".
 * A single definition may be split over multiple lines. Newlines are treated as
  spaces.
 * `//` at the beginning of a line is used for comments.
--- a/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl
@ -0,0 +1 @@
 node 1 3
--- a/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl
@ -0,0 +1 @@
 node 1 3
--- a/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl
@ -0,0 +1,3 @@
 node foo {
    three
 }
--- a/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl
@ -0,0 +1 @@
 node 1 2
--- a/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl
@ -0,0 +1 @@
 node 1 3
--- a/tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl
--- a/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl
@ -0,0 +1 @@
 node 1 3
--- a/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl
+++ b/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl
@ -0,0 +1 @@
 node2
--- a/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl
+++ b/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl
@ -0,0 +1,5 @@
 node /-{
    child
 } foo {
    bar
 }
--- a/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl
+++ b/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl
@ -0,0 +1,6 @@
 node 1 /- /*
 multi
 line
 comment
 here
 */ 2 3
--- a/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl
+++ b/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl
@ -0,0 +1 @@
 node 1 /-/*two*/2 3
--- a/tests/test_cases/input/slashdash_multiple_child_blocks.kdl
+++ b/tests/test_cases/input/slashdash_multiple_child_blocks.kdl
@ -0,0 +1,10 @@
 node foo /-{
    one
 } \
 /-{
    two
 } {
    three
 } /-{
    four
 }
--- a/tests/test_cases/input/slashdash_newline_before_children.kdl
+++ b/tests/test_cases/input/slashdash_newline_before_children.kdl
@ -0,0 +1,4 @@
 node 1 2 /-
 {
    child
 }
--- a/tests/test_cases/input/slashdash_newline_before_entry.kdl
+++ b/tests/test_cases/input/slashdash_newline_before_entry.kdl
@ -0,0 +1,2 @@
 node 1 /-
 2 3
--- a/tests/test_cases/input/slashdash_newline_before_node.kdl
+++ b/tests/test_cases/input/slashdash_newline_before_node.kdl
@ -0,0 +1,2 @@
 /-
 node 1 2 3
--- a/tests/test_cases/input/slashdash_single_line_comment_entry.kdl
+++ b/tests/test_cases/input/slashdash_single_line_comment_entry.kdl
@ -0,0 +1,2 @@
 node 1 /- // stuff
 2 3
--- a/tests/test_cases/input/slashdash_single_line_comment_node.kdl
+++ b/tests/test_cases/input/slashdash_single_line_comment_node.kdl
@ -0,0 +1,3 @@
 /- // this is a comment
 node1
 node2