From 910f6e90a7559113f1a96abb497965700c7fb8f1 Mon Sep 17 00:00:00 2001 From: Danielle Smith Date: Sun, 28 Aug 2022 21:59:26 +0200 Subject: [PATCH 01/15] Do not escape / (Solidus, Forwardslash) (#197) --- SPEC.md | 3 +-- tests/test_cases/input/all_escapes.kdl | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/SPEC.md b/SPEC.md index e2fd106..f625ba1 100644 --- a/SPEC.md +++ b/SPEC.md @@ -319,7 +319,6 @@ interpreted as described in the following table: | Carriage Return | `\r` | `U+000D` | | Character Tabulation (Tab) | `\t` | `U+0009` | | Reverse Solidus (Backslash) | `\\` | `U+005C` | -| Solidus (Forwardslash) | `\/` | `U+002F` | | Quotation Mark (Double Quote) | `\"` | `U+0022` | | Backspace | `\b` | `U+0008` | | Form Feed | `\f` | `U+000C` | @@ -461,7 +460,7 @@ type := '(' identifier ')' string := raw-string | escaped-string escaped-string := '"' character* '"' character := '\' escape | [^\"] -escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}' +escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' hex-digit := [0-9a-fA-F] raw-string := 'r' raw-string-hash diff --git a/tests/test_cases/input/all_escapes.kdl b/tests/test_cases/input/all_escapes.kdl index 5bb1dc3..024cda2 100644 --- a/tests/test_cases/input/all_escapes.kdl +++ b/tests/test_cases/input/all_escapes.kdl @@ -1 +1 @@ -node "\"\\\/\b\f\n\r\t" +node "\"\\\b\f\n\r\t" From 69ac280bf058ad0003a807577585570e8e646723 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:01:07 -0700 Subject: [PATCH 02/15] KQL: require operator and change operator grammar a bit (#221) --- QUERY-SPEC.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 766794f..829f978 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS selectors for familiarity and ease of use. Think of it as CSS Selectors or XPath, but for KDL! 
-This document describes KQL `1.0.0`. It was released on September 11, 2021. +This document describes KQL `next`. It is unreleased. ## Selectors Selectors use selection operators to filter nodes that will be returned by an API using KQL. The main differences between this and CSS selectors are the -lack of `*` (use `[]` instead), and the specific syntax for +lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for [matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS. * `a > b`: Selects any `b` element that is a direct child of an `a` element. -* `a b`: Selects any `b` element that is a _descendant_ of an `a` element. -* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported. +* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element. +* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported. * `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element. -* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later. +* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later. * `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor) * `a[accessor()]`: Selects any `a` element, filtered by an accessor. * `[]`: Selects any element. 
@@ -108,16 +108,16 @@ package { winapi "1.0.0" path="./crates/my-winapi-fork" } dependencies { - miette "2.0.0" dev=true + miette "2.0.0" dev=true integrity=(sri)"sha512-deadbeef" } } ``` Then the following queries are valid: -* `package name` +* `package >> name` * -> fetches the `name` node itself -* `top() > package name` +* `top() > package >> name` * -> fetches the `name` node, guaranteeing that `package` is in the document root. * `dependencies` * -> deep-fetches both `dependencies` nodes From 2d5e543bbe5ad68e54ea0394368c30d4be4313a6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:01:53 -0700 Subject: [PATCH 03/15] KQL: remove map operator and accessors (#222) Honestly, they're just too implementation-specific --- QUERY-SPEC.md | 39 --------------------------------------- 1 file changed, 39 deletions(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 829f978..7ab1b46 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -69,33 +69,6 @@ is not one of those, the matcher will always fail: * `[val() = (foo)]`: Selects any element whose tag is "foo". -## Map Operator - -KQL implementations MAY support a "map operator", `=>`, that allows selection -of specific parts of the selected notes, essentially "mapping" over a -selector's result set. - -Only a single map operator may be used, and it must be the last element in a -selector string. - -The map operator's right hand side is either an [`accessor`](#accessors) on -its own, or a tuple of accessors, denoted by a comma-separated list wrapped in -`()` (for example, `(a, b, c)`). - -## Accessors - -Accessors access/extract specific parts of a node. They are used with the [map -operator](#map-operator), and have syntactic overlap with some -[matchers](#matchers). - -* `name()`: Returns the name of the node itself. -* `val(2)`: Returns the third value in a node. -* `val()`: Equivalent to `val(0)`. -* `prop(foo)`: Returns the value of the property `foo` in the node. 
-* `foo`: Equivalent to `prop(foo)`. -* `props()`: Returns all properties of the node as an object. -* `values()`: Returns all values of the node as an array. - ## Examples Given this document: @@ -128,15 +101,3 @@ Then the following queries are valid: * `dependencies > []` * -> fetches all direct-child nodes of any `dependencies` nodes in the document. In this case, it will fetch both `miette` and `winapi` nodes. - -If using an API that supports the [map operator](#map-operator), the following -are valid queries: - -* `package name => val()` - * -> `["foo"]`. -* `dependencies[platform] => platform` - * -> `["windows"]` -* `dependencies > [] => (name(), val(), path)` - * -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]` -* `dependencies > [] => (name(), values(), props())` - * -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]` From 1bf4d740faad46299f55bd4711f96b850663156e Mon Sep 17 00:00:00 2001 From: Basile Henry Date: Sun, 28 Aug 2022 22:07:17 +0200 Subject: [PATCH 04/15] Allow "empty" single line comments in the spec (#234) As I read the grammar in the spec, `"//"` wouldn't parse as a single-line-comment as it requires at least one non-newline character after the slashes. 
--- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index f625ba1..01d7570 100644 --- a/SPEC.md +++ b/SPEC.md @@ -493,7 +493,7 @@ bom := '\u{FEFF}' unicode-space := See Table (All White_Space unicode characters which are not `newline`) -single-line-comment := '//' ^newline+ (newline | eof) +single-line-comment := '//' ^newline* (newline | eof) multi-line-comment := '/*' commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block ``` From 78a2d5f5ed821f82acc097d1f7ccba914d08dbef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:14:09 -0700 Subject: [PATCH 05/15] Draft changelog --- CHANGELOG.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 CHANGELOG.md diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..6f9ded7 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,16 @@ +# KDL Changelog + +## 2.0.0 (2022-08-28) + +### Grammar + +* Solidus/Forward slash (`/`) is no longer an escaped character. +* Single line comments (`//`) can now be immediately followed by a newline. + +### KQL + +* There's now a _required_ descendant selector (`>>`), instead of using plain + spaces for that purpose. +* The "any sibling" selector is now `++` instead of `~`, for consistency with + the new descendant selector. +* Map operators have been removed entirely. 
From f38edc765d4d35238cbef0991153765ae84f337e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:28:47 -0700 Subject: [PATCH 06/15] add failing test for removed solidus escape --- tests/test_cases/input/no_solidus_escape.kdl | 1 + 1 file changed, 1 insertion(+) create mode 100644 tests/test_cases/input/no_solidus_escape.kdl diff --git a/tests/test_cases/input/no_solidus_escape.kdl b/tests/test_cases/input/no_solidus_escape.kdl new file mode 100644 index 0000000..5702080 --- /dev/null +++ b/tests/test_cases/input/no_solidus_escape.kdl @@ -0,0 +1 @@ +node "\\" From ffeea8e5aa86edba2ac3fba40ce96755a152d8c6 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Tue, 30 Aug 2022 17:11:51 +0200 Subject: [PATCH 07/15] Use forward slash in solidus-escape test (#288) --- tests/test_cases/input/no_solidus_escape.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/test_cases/input/no_solidus_escape.kdl b/tests/test_cases/input/no_solidus_escape.kdl index 5702080..2dbc2d1 100644 --- a/tests/test_cases/input/no_solidus_escape.kdl +++ b/tests/test_cases/input/no_solidus_escape.kdl @@ -1 +1 @@ -node "\\" +node "\/" From 337bd1bccf2e0fb141e5d4e41ff58eb97d05f892 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Tue, 30 Aug 2022 19:44:44 +0200 Subject: [PATCH 08/15] Update expected output of test with changed input (#289) --- tests/test_cases/expected_kdl/all_escapes.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/test_cases/expected_kdl/all_escapes.kdl b/tests/test_cases/expected_kdl/all_escapes.kdl index c25f434..024cda2 100644 --- a/tests/test_cases/expected_kdl/all_escapes.kdl +++ b/tests/test_cases/expected_kdl/all_escapes.kdl @@ -1 +1 @@ -node "\"\\/\b\f\n\r\t" +node "\"\\\b\f\n\r\t" From 825ff2c17d201688331afc020751b3c9de6de3e4 Mon Sep 17 00:00:00 2001 From: Nathan West Date: Thu, 1 Sep 2022 00:49:01 -0400 Subject: [PATCH 09/15] Add escaped whitespace to KDL strings (#290) * Add escaped 
whitespace to KDL spec * Add test cases for escaped whitespace * Spelling error --- SPEC.md | 33 ++++++++++++++++++- .../expected_kdl/escaped_whitespace.kdl | 1 + tests/test_cases/input/escaped_whitespace.kdl | 15 +++++++++ 3 files changed, 48 insertions(+), 1 deletion(-) create mode 100644 tests/test_cases/expected_kdl/escaped_whitespace.kdl create mode 100644 tests/test_cases/input/escaped_whitespace.kdl diff --git a/SPEC.md b/SPEC.md index 01d7570..cfeac86 100644 --- a/SPEC.md +++ b/SPEC.md @@ -309,6 +309,8 @@ String Value can encompass multiple lines without behaving like a Newline for Strings _MUST_ be represented as UTF-8 values. +#### Escapes + In addition to literal code points, a number of "escapes" are supported. "Escapes" are the character `\` followed by another character, and are interpreted as described in the following table: @@ -323,6 +325,35 @@ interpreted as described in the following table: | Backspace | `\b` | `U+0008` | | Form Feed | `\f` | `U+000C` | | Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` | +| Whitespace Escape | See below | N/A | + +##### Escaped Whitespace + +In addition to escaping individual characters, `\` can also escape whitespace. +When a `\` is followed by one or more literal whitespace characters, the `\` +and all of that whitespace are discarded. For example, `"Hello World"` and +`"Hello \ World"` are semantically identical. See [whitespace](#whitespace) +and [newlines](#newlines) for how whitespace is defined. + +Note that only literal whitespace is escaped; *escaped* whitespace is retained. +For example, these strings are all semantically identical: + +```kdl +"Hello\ \nWorld" + + "Hello\n\ + World" + +"Hello\nWorld" + +"Hello +World" +``` + +##### Invalid escapes + +Except as described in the escapes table, above, `\` *MUST NOT* precede any +other characters in a string. 
### Raw String @@ -460,7 +491,7 @@ type := '(' identifier ')' string := raw-string | escaped-string escaped-string := '"' character* '"' character := '\' escape | [^\"] -escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' +escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] raw-string := 'r' raw-string-hash diff --git a/tests/test_cases/expected_kdl/escaped_whitespace.kdl b/tests/test_cases/expected_kdl/escaped_whitespace.kdl new file mode 100644 index 0000000..a97d10a --- /dev/null +++ b/tests/test_cases/expected_kdl/escaped_whitespace.kdl @@ -0,0 +1 @@ +node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" diff --git a/tests/test_cases/input/escaped_whitespace.kdl b/tests/test_cases/input/escaped_whitespace.kdl new file mode 100644 index 0000000..1f2e67c --- /dev/null +++ b/tests/test_cases/input/escaped_whitespace.kdl @@ -0,0 +1,15 @@ +// All of these strings are the same +node \ + "Hello\n\tWorld" \ + "Hello + World" \ + "Hello\n\ \tWorld" \ + "Hello\n\ + \tWorld" \ + "Hello +\ \tWorld" \ + "Hello\n\t\ + World" + +// Note that this file deliberately mixes space and newline indentation for +// test purposes From 0a4a14d87a4f87fb3fb424d23e021aa6df17d346 Mon Sep 17 00:00:00 2001 From: Hannah Kolbeck Date: Thu, 1 Sep 2022 13:05:53 -0700 Subject: [PATCH 10/15] Add escaped whitespace note to v2 changelog (#291) * Add escaped whitespace note to v2 changelog * Make changelog note on escaping whitespace more detailed --- CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f9ded7..aa97ac5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,7 @@ * Solidus/Forward slash (`/`) is no longer an escaped character. * Single line comments (`//`) can now be immediately followed by a newline. +* All literal whitespace following a `\` in a string is now discarded. 
### KQL From d437cf228b62cf91263b81af590f823ed46ef5c3 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Fri, 2 Sep 2022 16:37:10 +0200 Subject: [PATCH 11/15] Add test for empty single-line comment (#292) --- tests/test_cases/expected_kdl/empty_line_comment.kdl | 1 + tests/test_cases/input/empty_line_comment.kdl | 2 ++ 2 files changed, 3 insertions(+) create mode 100644 tests/test_cases/expected_kdl/empty_line_comment.kdl create mode 100644 tests/test_cases/input/empty_line_comment.kdl diff --git a/tests/test_cases/expected_kdl/empty_line_comment.kdl b/tests/test_cases/expected_kdl/empty_line_comment.kdl new file mode 100644 index 0000000..64f5a0a --- /dev/null +++ b/tests/test_cases/expected_kdl/empty_line_comment.kdl @@ -0,0 +1 @@ +node diff --git a/tests/test_cases/input/empty_line_comment.kdl b/tests/test_cases/input/empty_line_comment.kdl new file mode 100644 index 0000000..e62ef84 --- /dev/null +++ b/tests/test_cases/input/empty_line_comment.kdl @@ -0,0 +1,2 @@ +// +node \ No newline at end of file From 06d1d67359e1be070050922cd202f053039e6171 Mon Sep 17 00:00:00 2001 From: Lars Willighagen Date: Sun, 9 Oct 2022 21:04:10 +0200 Subject: [PATCH 12/15] Add draft grammar for KQL 1.0.0 (#303) * Add draft grammar for KQL 1.0.0 * Change whitespace in KQL grammar * Update KQL grammar to use new operators --- QUERY-SPEC.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 7ab1b46..bf918e7 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -101,3 +101,21 @@ Then the following queries are valid: * `dependencies > []` * -> fetches all direct-child nodes of any `dependencies` nodes in the document. In this case, it will fetch both `miette` and `winapi` nodes. + +## Full Grammar + +For rules that are not defined in this grammar, see [the KDL grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar). 
+ +``` +query := selector q-ws* "||" q-ws* query | selector +selector := filter q-ws* selector-operator q-ws* selector | filter +selector-operator := ">>" | ">" | "++" | "+" +filter := matcher+ +matcher := "top()"| "()" | identifier | type | accessor-matcher +accessor-matcher := "[" (comparison | accessor)? "]" +comparison := accessor q-ws* matcher-operator q-ws* (type | string | number | keyword) +accessor := "val(" number ")" | "prop(" identifier ")" | "name()" | "tag()" | "values()" | "props()" | identifier +matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*=" + +q-ws := bom | unicode-space +``` From 3b39e29feecabe80af70a765e596673ce761cb29 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Fri, 6 Oct 2023 14:13:43 -0700 Subject: [PATCH 13/15] Add vertical tab to whitespace. Closes #331 --- SPEC.md | 1 + 1 file changed, 1 insertion(+) diff --git a/SPEC.md b/SPEC.md index cfeac86..55a4d1a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -427,6 +427,7 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt): | Name | Code Pt | |----------------------|---------| | Character Tabulation | `U+0009` | +| Line Tabulation | `U+000B` | | Space | `U+0020` | | No-Break Space | `U+00A0` | | Ogham Space Mark | `U+1680` | From 568c096465693d3a4bd58e853bdb9c335b135f63 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Fri, 6 Oct 2023 14:30:18 -0700 Subject: [PATCH 14/15] Document the vertical tab addition. --- CHANGELOG.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index aa97ac5..a3bc032 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,7 +6,8 @@ * Solidus/Forward slash (`/`) is no longer an escaped character. * Single line comments (`//`) can now be immediately followed by a newline. -* All literal whitespace following a `\` in a string is now discarded. +* All literal whitespace following a `\` in a string is now discarded. 
+* Vertical tabs (`U+000B`) are now considered to be whitespace. ### KQL From 0836df1c192e9586bb6b54795ebd69cbeb127715 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Fri, 6 Oct 2023 14:32:01 -0700 Subject: [PATCH 15/15] Restrict idents from looking like raw strings. Closes #200, closes #204, closes #241 --- CHANGELOG.md | 1 + SPEC.md | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a3bc032..cd30307 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ * Single line comments (`//`) can now be immediately followed by a newline. * All literal whitespace following a `\` in a string is now discarded. * Vertical tabs (`U+000B`) are now considered to be whitespace. +* Identifiers can't start with `r#`, so they're easy to distinguish from raw strings. (They already similarly can't start with a digit, or a sign+digit, so they're easy to distinguish from numbers.) ### KQL diff --git a/SPEC.md b/SPEC.md index 55a4d1a..cbd90c7 100644 --- a/SPEC.md +++ b/SPEC.md @@ -482,7 +482,10 @@ node-space := ws* escline ws* | ws+ node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier -bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword +bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - keyword +unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char* +numberish-ident := sign ((identifier-char - digit) identifier-char*)? +stringish-ident := "r" ((identifier-char - "#") identifier-char*)? identifier-char := unicode - linespace - [\/(){}<>;[]=,"] keyword := boolean | 'null' prop := identifier '=' value