Merge branch 'kdl-v2' into patch-2

Kat Marchán 2023-12-10 17:29:20 -08:00 committed by GitHub
commit 8ccbc92fed
GPG Key ID: 4AEE18F83AFDEB23
10 changed files with 104 additions and 70 deletions

19
CHANGELOG.md Normal file

@@ -0,0 +1,19 @@
# KDL Changelog
## 2.0.0 (2022-08-28)
### Grammar
* Solidus/Forward slash (`/`) is no longer an escaped character.
* Single line comments (`//`) can now be immediately followed by a newline.
* All literal whitespace following a `\` in a string is now discarded.
* Vertical tabs (`U+000B`) are now considered to be whitespace.
* Identifiers can't start with `r#`, so they're easy to distinguish from raw strings. (They already similarly can't start with a digit, or a sign+digit, so they're easy to distinguish from numbers.)
### KQL
* There's now a _required_ descendant selector (`>>`), instead of using plain
spaces for that purpose.
* The "any sibling" selector is now `++` instead of `~`, for consistency with
the new descendant selector.
* Map operators have been removed entirely.


@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS
selectors for familiarity and ease of use. Think of it as CSS Selectors or selectors for familiarity and ease of use. Think of it as CSS Selectors or
XPath, but for KDL! XPath, but for KDL!
This document describes KQL `1.0.0`. It was released on September 11, 2021. This document describes KQL `next`. It is unreleased.
## Selectors ## Selectors
Selectors use selection operators to filter nodes that will be returned by an Selectors use selection operators to filter nodes that will be returned by an
API using KQL. The main differences between this and CSS selectors are the API using KQL. The main differences between this and CSS selectors are the
lack of `*` (use `[]` instead), and the specific syntax for lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for
[matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS. [matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS.
* `a > b`: Selects any `b` element that is a direct child of an `a` element. * `a > b`: Selects any `b` element that is a direct child of an `a` element.
* `a b`: Selects any `b` element that is a _descendant_ of an `a` element. * `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported. * `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element. * `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element.
* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later. * `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor) * `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor)
* `a[accessor()]`: Selects any `a` element, filtered by an accessor. * `a[accessor()]`: Selects any `a` element, filtered by an accessor.
* `[]`: Selects any element. * `[]`: Selects any element.
@ -69,33 +69,6 @@ is not one of those, the matcher will always fail:
* `[val() = (foo)]`: Selects any element whose tag is "foo". * `[val() = (foo)]`: Selects any element whose tag is "foo".
## Map Operator
KQL implementations MAY support a "map operator", `=>`, that allows selection
of specific parts of the selected notes, essentially "mapping" over a
selector's result set.
Only a single map operator may be used, and it must be the last element in a
selector string.
The map operator's right hand side is either an [`accessor`](#accessors) on
its own, or a tuple of accessors, denoted by a comma-separated list wrapped in
`()` (for example, `(a, b, c)`).
## Accessors
Accessors access/extract specific parts of a node. They are used with the [map
operator](#map-operator), and have syntactic overlap with some
[matchers](#matchers).
* `name()`: Returns the name of the node itself.
* `val(2)`: Returns the third value in a node.
* `val()`: Equivalent to `val(0)`.
* `prop(foo)`: Returns the value of the property `foo` in the node.
* `foo`: Equivalent to `prop(foo)`.
* `props()`: Returns all properties of the node as an object.
* `values()`: Returns all values of the node as an array.
## Examples ## Examples
Given this document: Given this document:
@ -108,16 +81,16 @@ package {
winapi "1.0.0" path="./crates/my-winapi-fork" winapi "1.0.0" path="./crates/my-winapi-fork"
} }
dependencies { dependencies {
miette "2.0.0" dev=true miette "2.0.0" dev=true integrity=(sri)"sha512-deadbeef"
} }
} }
``` ```
Then the following queries are valid: Then the following queries are valid:
* `package name` * `package >> name`
* -> fetches the `name` node itself * -> fetches the `name` node itself
* `top() > package name` * `top() > package >> name`
* -> fetches the `name` node, guaranteeing that `package` is in the document root. * -> fetches the `name` node, guaranteeing that `package` is in the document root.
* `dependencies` * `dependencies`
* -> deep-fetches both `dependencies` nodes * -> deep-fetches both `dependencies` nodes
@ -129,14 +102,20 @@ Then the following queries are valid:
* -> fetches all direct-child nodes of any `dependencies` nodes in the * -> fetches all direct-child nodes of any `dependencies` nodes in the
document. In this case, it will fetch both `miette` and `winapi` nodes. document. In this case, it will fetch both `miette` and `winapi` nodes.
If using an API that supports the [map operator](#map-operator), the following ## Full Grammar
are valid queries:
* `package name => val()` For rules that are not defined in this grammar, see [the KDL grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar).
* -> `["foo"]`.
* `dependencies[platform] => platform` ```
* -> `["windows"]` query := selector q-ws* "||" q-ws* query | selector
* `dependencies > [] => (name(), val(), path)` selector := filter q-ws* selector-operator q-ws* selector | filter
* -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]` selector-operator := ">>" | ">" | "++" | "+"
* `dependencies > [] => (name(), values(), props())` filter := matcher+
* -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]` matcher := "top()"| "()" | identifier | type | accessor-matcher
accessor-matcher := "[" (comparison | accessor)? "]"
comparison := accessor q-ws* matcher-operator q-ws* (type | string | number | keyword)
accessor := "val(" number ")" | "prop(" identifier ")" | "name()" | "tag()" | "values()" | "props()" | identifier
matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*="
q-ws := bom | unicode-space
```
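The `selector-operator` rule above lists `>>` before `>` and `++` before `+`, which suggests longest-match tokenizing. The following is a minimal Python sketch of that idea, not part of the spec or of any KQL implementation. The function name `split_selector` is hypothetical, and the sketch assumes that operators only count outside `[...]` matchers, that matchers do not contain bracket characters inside strings, and that identifiers do not contain `+` or `>`.

```python
# Longest operators first, so ">>" is never read as two ">" tokens.
SELECTOR_OPERATORS = (">>", ">", "++", "+")


def split_selector(selector: str) -> list[str]:
    """Split one selector (no "||") into filters and selector operators.

    Hypothetical helper: text inside [...] is kept intact, because ">" and
    friends are matcher operators there, not selector operators.
    """
    parts: list[str] = []
    current: list[str] = []
    depth = 0  # nesting depth of [...] matchers
    i = 0
    while i < len(selector):
        ch = selector[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            op = next((o for o in SELECTOR_OPERATORS if selector.startswith(o, i)), None)
            if op is not None:
                parts.append("".join(current).strip())
                parts.append(op)
                current = []
                i += len(op)
                continue
        current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts


print(split_selector("top() > package >> name"))
```

Under these assumptions, `a ++ b` splits into `a`, `++`, `b`, and a comparison such as `[val() > 1]` stays inside its filter.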

62
SPEC.md

@@ -3,9 +3,7 @@
This is the semi-formal specification for KDL, including the intended data This is the semi-formal specification for KDL, including the intended data
model and the grammar. model and the grammar.
This document describes KDL version `2.0.0-preview`. This document describes KDL version `1.0.0`. It was released on September 11, 2021.
KDL version `1.0.0` was released on September 11, 2021.
## Introduction ## Introduction
@ -26,22 +24,6 @@ the directions if the data stream were only ASCII text. They do not refer
to the writing direction of text, which can flow in either direction, to the writing direction of text, which can flow in either direction,
depending on the characters used. depending on the characters used.
## Changes from version `1.0.0`
### Relaxed
- The way that `/-` comments are handled has changed. Now, `/-` comments are
consistently treated like whitespace. Notably, this means that `/-` children
blocks do not prevent the presence of later arguments, properties, or children
blocks on the attached node.
### Constrained
- Previously, whitespace was not required before a children block, i.e. `node{}`
was valid. Now, whitespace is required before a children block, the same as
before arguments and properties.
- `/-` comments on nodes must also be separated by plain (non-`/-`) whitespace.
## Components ## Components
### Document ### Document
@ -327,6 +309,8 @@ String Value can encompass multiple lines without behaving like a Newline for
Strings _MUST_ be represented as UTF-8 values. Strings _MUST_ be represented as UTF-8 values.
#### Escapes
In addition to literal code points, a number of "escapes" are supported. In addition to literal code points, a number of "escapes" are supported.
"Escapes" are the character `\` followed by another character, and are "Escapes" are the character `\` followed by another character, and are
interpreted as described in the following table: interpreted as described in the following table:
@ -337,11 +321,39 @@ interpreted as described in the following table:
| Carriage Return | `\r` | `U+000D` | | Carriage Return | `\r` | `U+000D` |
| Character Tabulation (Tab) | `\t` | `U+0009` | | Character Tabulation (Tab) | `\t` | `U+0009` |
| Reverse Solidus (Backslash) | `\\` | `U+005C` | | Reverse Solidus (Backslash) | `\\` | `U+005C` |
| Solidus (Forwardslash) | `\/` | `U+002F` |
| Quotation Mark (Double Quote) | `\"` | `U+0022` | | Quotation Mark (Double Quote) | `\"` | `U+0022` |
| Backspace | `\b` | `U+0008` | | Backspace | `\b` | `U+0008` |
| Form Feed | `\f` | `U+000C` | | Form Feed | `\f` | `U+000C` |
| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` | | Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` |
| Whitespace Escape | See below | N/A |
##### Escaped Whitespace
In addition to escaping individual characters, `\` can also escape whitespace.
When a `\` is followed by one or more literal whitespace characters, the `\`
and all of that whitespace are discarded. For example, `"Hello World"` and
`"Hello \ World"` are semantically identical. See [whitespace](#whitespace)
and [newlines](#newlines) for how whitespace is defined.
Note that only literal whitespace is escaped; *escaped* whitespace is retained.
For example, these strings are all semantically identical:
```kdl
"Hello\ \nWorld"
"Hello\n\
World"
"Hello\nWorld"
"Hello
World"
```
##### Invalid escapes
Except as described in the escapes table, above, `\` *MUST NOT* precede any
other characters in a string.
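The escape rules above can be sketched in Python as follows. This is an illustrative sketch only, not the reference implementation: the function name `unescape_kdl` is hypothetical, and the whitespace and newline sets are deliberately abbreviated assumptions (the spec tables list more code points).

```python
import string

# Assumption: abbreviated sets; the spec's whitespace and newline tables are longer.
KDL_SPACE = {"\t", "\x0b", " ", "\u00a0", "\u1680"}
KDL_NEWLINE = {"\r", "\n"}
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "\\": "\\",
    '"': '"', "b": "\b", "f": "\f",
}


def unescape_kdl(body: str) -> str:
    """Resolve escapes in the body of an escaped string (quotes already removed)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # move past the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif nxt == "u":
            # Unicode escape: \u{1-6 hex digits}, at most 10FFFF.
            end = body.find("}", i + 2)
            if body[i + 1:i + 2] != "{" or end == -1:
                raise ValueError("malformed unicode escape")
            digits = body[i + 2:end]
            if not 1 <= len(digits) <= 6 or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed unicode escape")
            code_point = int(digits, 16)
            if code_point > 0x10FFFF:
                raise ValueError("code point out of range")
            out.append(chr(code_point))
            i = end + 1
        elif nxt in KDL_SPACE or nxt in KDL_NEWLINE:
            # Whitespace escape: drop the backslash and the whole literal run.
            while i < len(body) and (body[i] in KDL_SPACE or body[i] in KDL_NEWLINE):
                i += 1
        else:
            # Anything else, including "\/", is an invalid escape.
            raise ValueError(f"invalid escape: \\{nxt}")
    return "".join(out)


print(repr(unescape_kdl("Hello\\ \\nWorld")))
```

In this sketch an escaped `\n` is kept even when it follows a whitespace escape, because only literal whitespace is consumed by the run.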
### Raw String ### Raw String
@ -415,6 +427,7 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
| Name | Code Pt | | Name | Code Pt |
|----------------------|---------| |----------------------|---------|
| Character Tabulation | `U+0009` | | Character Tabulation | `U+0009` |
| Line Tabulation | `U+000B` |
| Space | `U+0020` | | Space | `U+0020` |
| No-Break Space | `U+00A0` | | No-Break Space | `U+00A0` |
| Ogham Space Mark | `U+1680` | | Ogham Space Mark | `U+1680` |
@ -477,7 +490,10 @@ node-children := '{' nodes '}'
node-terminator := single-line-comment | newline | ';' | eof node-terminator := single-line-comment | newline | ';' | eof
identifier := string | bare-identifier identifier := string | bare-identifier
bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - keyword
unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char*
numberish-ident := sign ((identifier-char - digit) identifier-char*)?
stringish-ident := "r" ((identifier-char - "#") identifier-char*)?
identifier-char := unicode - line-space - [\/(){}<>;[]=,"] identifier-char := unicode - line-space - [\/(){}<>;[]=,"]
keyword := boolean | 'null' keyword := boolean | 'null'
prop := identifier '=' value prop := identifier '=' value
@ -487,7 +503,7 @@ type := '(' identifier ')'
string := raw-string | escaped-string string := raw-string | escaped-string
escaped-string := '"' character* '"' escaped-string := '"' character* '"'
character := '\' escape | [^\"] character := '\' escape | [^\"]
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}' escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
hex-digit := [0-9a-fA-F] hex-digit := [0-9a-fA-F]
raw-string := 'r' raw-string-hash raw-string := 'r' raw-string-hash
@ -518,7 +534,7 @@ bom := '\u{FEFF}'
unicode-space := See Table (All White_Space unicode characters which are not `newline`) unicode-space := See Table (All White_Space unicode characters which are not `newline`)
single-line-comment := '//' ^newline+ (newline | eof) single-line-comment := '//' ^newline* (newline | eof)
multi-line-comment := '/*' commented-block multi-line-comment := '/*' commented-block
commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
``` ```
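As a rough illustration of the new `bare-identifier` rule, the three alternatives (`unambiguous-ident`, `numberish-ident`, `stringish-ident`) can be checked by looking at the first one or two characters. The Python sketch below is an assumption-laden reading of the grammar, not the reference parser: `is_bare_identifier` is a hypothetical name, and the whitespace set is abbreviated rather than the full `line-space` definition.

```python
# Characters excluded by identifier-char: [\/(){}<>;[]=,"]
FORBIDDEN = set('\\/(){}<>;[]=,"')
# Assumption: abbreviated stand-in for line-space; the spec tables list more.
LINE_SPACE = {"\t", "\x0b", " ", "\u00a0", "\u1680", "\r", "\n", "\ufeff"}
KEYWORDS = {"true", "false", "null"}
DIGITS = set("0123456789")
SIGNS = {"+", "-"}


def is_identifier_char(ch: str) -> bool:
    return ch not in FORBIDDEN and ch not in LINE_SPACE


def is_bare_identifier(text: str) -> bool:
    """Check text against the bare-identifier rule as read from the grammar."""
    if not text or text in KEYWORDS:
        return False
    if not all(is_identifier_char(ch) for ch in text):
        return False
    first, rest = text[0], text[1:]
    if first in SIGNS:
        # numberish-ident: a lone sign, or a sign not followed by a digit.
        return rest == "" or rest[0] not in DIGITS
    if first == "r":
        # stringish-ident: a lone "r", or "r" not followed by "#".
        return rest == "" or rest[0] != "#"
    # unambiguous-ident: must not start with a digit.
    return first not in DIGITS


print(is_bare_identifier("raw"), is_bare_identifier("r#raw"))
```

Under this reading, `r` and `raw` remain valid identifiers, while anything starting with `r#` is rejected so that it cannot be confused with a raw string.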


@@ -1 +1 @@
node "\"\\/\b\f\n\r\t" node "\"\\\b\f\n\r\t"


@@ -0,0 +1 @@
node


@@ -0,0 +1 @@
node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld"


@@ -1 +1 @@
node "\"\\\/\b\f\n\r\t" node "\"\\\b\f\n\r\t"

View File

@@ -0,0 +1,2 @@
//
node


@@ -0,0 +1,15 @@
// All of these strings are the same
node \
"Hello\n\tWorld" \
"Hello
World" \
"Hello\n\ \tWorld" \
"Hello\n\
\tWorld" \
"Hello
\ \tWorld" \
"Hello\n\t\
World"
// Note that this file deliberately mixes space and newline indentation for
// test purposes


@@ -0,0 +1 @@
node "\/"