diff --git a/CHANGELOG.md b/CHANGELOG.md index abd18b9..4927376 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -36,7 +36,7 @@ * Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values. * The spec prose now more explicitly states that strings and raw strings can be used as type annotations. -* A statement in the spec prose that said "It is reasonable for an +* Removed a statement in the spec prose that said "It is reasonable for an implementation to ignore null values altogether when deserializing". This is no longer encouraged or desired. * Code points have been constrained to [Unicode Scalar @@ -69,6 +69,7 @@ * Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax errors. * `u128` and `i128` have been added as well-known number type annotations. +* Slashdash (`/-`) -compatible locations adjusted to be more clear and intuitive. ### KQL diff --git a/SPEC.md b/SPEC.md index c812c4a..c3f749a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -272,8 +272,17 @@ node prop=(regex).* ### String Strings in KDL represent textual UTF-8 [Values](#value). A String is either an -[Identifier String](#identifier-string) (like `foo`), a [Quoted String](#quoted-string) (like `"foo"`) or -a [Raw String](#raw-string) (like `#"foo"#`). Identifier Strings let you write short, "single-word" strings with a minimum of syntax; Quoted Strings let you write strings with whitespace (including newlines!) or escapes; Raw Strings let you write strings with whitespace *but without escapes*, allowing you to not worry about the string's content containing anything that might look like an escape. +[Identifier String](#identifier-string) (like `foo`), a [Quoted +String](#quoted-string) (like `"foo"`) or a [Raw String](#raw-string) (like +`#"foo"#`): + +* Identifier Strings let you write short, "single-word" strings with a + minimum of syntax +* Quoted Strings let you write strings with whitespace + (including newlines!) or escapes +* Raw Strings let you write strings with whitespace *but without escapes*, + allowing you to not worry about the string's content containing anything that + might look like an escape. Strings _MUST_ be represented as UTF-8 values. @@ -299,9 +308,9 @@ A handful of patterns are disallowed, to avoid confusion with other values: * idents that are the language keywords (`inf`, `-inf`, `nan`, `true`, `false`, and `null`) without their leading `#`. -Identifiers that match these patterns _MUST_ be treated as a syntax error; -such values can only be written as quoted or raw strings. -The precise details of the identifier syntax is specified in the [Full Grammar](#full-grammar) below. +Identifiers that match these patterns _MUST_ be treated as a syntax error; such +values can only be written as quoted or raw strings. The precise details of the +identifier syntax is specified in the [Full Grammar](#full-grammar) below. Identifier Strings are terminated by [Whitespace](#whitespace) or [Newlines](#newline). @@ -695,22 +704,26 @@ can be nested. Finally, a special kind of comment called a "slashdash", denoted by `/-`, can be used to comment out entire _components_ of a KDL document logically, and -have those elements be treated as whitespace. +have those elements not be included as part of the parsed document data. -Slashdash comments can be used before: +Slashdash comments can be used before the following, including before their type +annotations, if present: -* A [Node](#node) name (or its type annotation): the entire Node is - treated as Whitespace, including all props, args, and children. -* A node [Argument](#argument) (or its type annotation), in which case - the Argument value is treated as Whitespace. -* A [Property](#property) key, in which case the entire property, both - key and value, is treated as Whitespace. -* A [Children Block](#children-block), in which case the entire block, - including all children within, is treated as Whitespace. +* A [Node](#node): the entire Node is treated as Whitespace, including all + props, args, and children. +* An [Argument](#argument): the Argument value is treated as Whitespace. +* A [Property](#property) key: the entire property, including both key and value, + is treated as Whitespace. A slashdash of just the property value is not allowed. +* A [Children Block](#children-block): the entire block, including all + children within, is treated as Whitespace. Only other children blocks, whether + slashdashed or not, may follow a slashdashed children block. + +A slashdash may be be followed by any amount of whitespace, including newlines and +comments, before the element that it comments out. ### Newline -The following characters [should be treated as new +The following character sequences [should be treated as new lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf): | Acronym | Name | Code Pt | @@ -750,35 +763,36 @@ language syntax](#grammar-language) is defined below. ``` document := bom? nodes +// Nodes nodes := (line-space* node)* line-space* -plain-line-space := newline | ws | single-line-comment -plain-node-space := ws* escline ws* | ws+ +base-node := slashdash? type? node-space* string + (node-space+ slashdash? node-prop-or-arg)* + // slashdashed node-children must always be after props and args. + (node-space+ slashdash node-children)* + (node-space+ node-children)? + (node-space+ slashdash node-children)* +node := base-node node-space* node-terminator +final-node := base-node node-space* node-terminator? -line-space := plain-line-space+ | '/-' plain-node-space* node -node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node-children))? - -required-node-space := node-space* plain-node-space+ -optional-node-space := node-space* - -base-node := type? optional-node-space string (required-node-space node-prop-or-arg)* (required-node-space node-children)? -node := base-node optional-node-space node-terminator -final-node := base-node optional-node-space node-terminator? +// Entries node-prop-or-arg := prop | value node-children := '{' nodes final-node? '}' node-terminator := single-line-comment | newline | ';' | eof -prop := string optional-node-space '=' optional-node-space value -value := type? optional-node-space (string | number | keyword) -type := '(' optional-node-space string optional-node-space ')' +prop := string node-space* '=' node-space* value +value := type? node-space* (string | number | keyword) +type := '(' node-space* string node-space* ')' +// Strings string := identifier-string | quoted-string | raw-string identifier-string := unambiguous-ident | signed-ident | dotted-ident -unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' +unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? -identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points +identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points - equals-sign +disallowed-keyword-identifiers := 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"' single-line-string-body := (string-character - newline)* @@ -792,6 +806,7 @@ raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-s single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* +// Numbers number := keyword-number | hex | octal | binary | decimal decimal := sign? integer ('.' integer)? exponent? @@ -804,29 +819,31 @@ hex := sign? '0x' hex-digit (hex-digit | '_')* octal := sign? '0o' [0-7] [0-7_]* binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* +// Keywords and booleans. keyword := boolean | '#null' - keyword-number := '#inf' | '#-inf' | '#nan' - boolean := '#true' | '#false' -escline := '\\' ws* (single-line-comment | newline | eof) - -newline := See Table (All line-break white_space) - -ws := unicode-space | multi-line-comment - +// Specific code points bom := '\u{FEFF}' - disallowed-literal-code-points := See Table (Disallowed Literal Code Points) - unicode := Any Unicode Scalar Value +unicode-space := See Table (All White_Space unicode characters which are not `newline`) -unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`) - +// Comments single-line-comment := '//' ^newline* (newline | eof) multi-line-comment := '/*' commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block +slashdash := '/-' line-space* + +// Whitespace +ws := unicode-space | multi-line-comment +escline := '\\' ws* (single-line-comment | newline | eof) +newline := See Table (All Newline White_Space) +// Whitespace where newlines are allowed. +line-space := newline | ws | single-line-comment +// Whitespace within nodes, where newline-ish things must be esclined. +node-space := ws* escline ws* | ws+ ``` ### Grammar language @@ -850,3 +867,6 @@ Specifically: `a - 'x'` means "any `a`, except something that matches the literal `'x'`". * The prefix `^` means "something that does not match" whatever follows it. For example, `^foo` means "must not match `foo`". +* A single definition may be split over multiple lines. Newlines are treated as + spaces. +* `//` at the beginning of a line is used for comments. \ No newline at end of file diff --git a/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl b/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl new file mode 100644 index 0000000..6ff16cc --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl @@ -0,0 +1,3 @@ +node foo { + three +} diff --git a/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl b/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl new file mode 100644 index 0000000..3b77f56 --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl @@ -0,0 +1 @@ +node 1 2 diff --git a/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl b/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl b/tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl new file mode 100644 index 0000000..e69de29 diff --git a/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl b/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl b/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl new file mode 100644 index 0000000..6810417 --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl @@ -0,0 +1 @@ +node2 diff --git a/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl b/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl new file mode 100644 index 0000000..b9edfc3 --- /dev/null +++ b/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl @@ -0,0 +1,5 @@ +node /-{ + child +} foo { + bar +} diff --git a/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl b/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl new file mode 100644 index 0000000..97a41e7 --- /dev/null +++ b/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl @@ -0,0 +1,6 @@ +node 1 /- /* +multi +line +comment +here +*/ 2 3 diff --git a/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl b/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl new file mode 100644 index 0000000..1fd93ce --- /dev/null +++ b/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl @@ -0,0 +1 @@ +node 1 /-/*two*/2 3 diff --git a/tests/test_cases/input/slashdash_multiple_child_blocks.kdl b/tests/test_cases/input/slashdash_multiple_child_blocks.kdl new file mode 100644 index 0000000..2f85ce1 --- /dev/null +++ b/tests/test_cases/input/slashdash_multiple_child_blocks.kdl @@ -0,0 +1,10 @@ +node foo /-{ + one +} \ +/-{ + two +} { + three +} /-{ + four +} diff --git a/tests/test_cases/input/slashdash_newline_before_children.kdl b/tests/test_cases/input/slashdash_newline_before_children.kdl new file mode 100644 index 0000000..deefb7f --- /dev/null +++ b/tests/test_cases/input/slashdash_newline_before_children.kdl @@ -0,0 +1,4 @@ +node 1 2 /- +{ + child +} diff --git a/tests/test_cases/input/slashdash_newline_before_entry.kdl b/tests/test_cases/input/slashdash_newline_before_entry.kdl new file mode 100644 index 0000000..f6de9f9 --- /dev/null +++ b/tests/test_cases/input/slashdash_newline_before_entry.kdl @@ -0,0 +1,2 @@ +node 1 /- +2 3 diff --git a/tests/test_cases/input/slashdash_newline_before_node.kdl b/tests/test_cases/input/slashdash_newline_before_node.kdl new file mode 100644 index 0000000..545464f --- /dev/null +++ b/tests/test_cases/input/slashdash_newline_before_node.kdl @@ -0,0 +1,2 @@ +/- +node 1 2 3 diff --git a/tests/test_cases/input/slashdash_single_line_comment_entry.kdl b/tests/test_cases/input/slashdash_single_line_comment_entry.kdl new file mode 100644 index 0000000..2f807fc --- /dev/null +++ b/tests/test_cases/input/slashdash_single_line_comment_entry.kdl @@ -0,0 +1,2 @@ +node 1 /- // stuff +2 3 diff --git a/tests/test_cases/input/slashdash_single_line_comment_node.kdl b/tests/test_cases/input/slashdash_single_line_comment_node.kdl new file mode 100644 index 0000000..a378a18 --- /dev/null +++ b/tests/test_cases/input/slashdash_single_line_comment_node.kdl @@ -0,0 +1,3 @@ +/- // this is a comment +node1 +node2