mirror of https://github.com/kdl-org/kdl.git
Merge branch 'kdl-v2' into patch-2
commit 8ccbc92fed
@@ -0,0 +1,19 @@
+# KDL Changelog
+
+## 2.0.0 (2022-08-28)
+
+### Grammar
+
+* Solidus/Forward slash (`/`) is no longer an escaped character.
+* Single line comments (`//`) can now be immediately followed by a newline.
+* All literal whitespace following a `\` in a string is now discarded.
+* Vertical tabs (`U+000B`) are now considered to be whitespace.
+* Identifiers can't start with `r#`, so they're easy to distinguish from raw strings. (They already similarly can't start with a digit, or a sign+digit, so they're easy to distinguish from numbers.)
+
+### KQL
+
+* There's now a _required_ descendant selector (`>>`), instead of using plain
+spaces for that purpose.
+* The "any sibling" selector is now `++` instead of `~`, for consistency with
+the new descendant selector.
+* Map operators have been removed entirely.
@@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS
 selectors for familiarity and ease of use. Think of it as CSS Selectors or
 XPath, but for KDL!

-This document describes KQL `1.0.0`. It was released on September 11, 2021.
+This document describes KQL `next`. It is unreleased.

 ## Selectors

 Selectors use selection operators to filter nodes that will be returned by an
 API using KQL. The main differences between this and CSS selectors are the
-lack of `*` (use `[]` instead), and the specific syntax for
+lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for
 [matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS.

 * `a > b`: Selects any `b` element that is a direct child of an `a` element.
-* `a b`: Selects any `b` element that is a _descendant_ of an `a` element.
-* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
+* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element.
+* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
 * `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element.
-* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
+* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
 * `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor)
 * `a[accessor()]`: Selects any `a` element, filtered by an accessor.
 * `[]`: Selects any element.
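
To make the four selector operators in this hunk concrete, here is a minimal Python sketch of their semantics over a toy node tree. The `Node` class, the helper names, and the sample tree are all hypothetical and exist only for illustration. The sketch matches on node names alone and assumes that top-level nodes count as siblings of one another. It is not taken from any KQL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a parsed KDL node; only the name matters here.
    name: str
    children: list["Node"] = field(default_factory=list)


def walk(nodes):
    """Yield every node in document order, depth first."""
    for node in nodes:
        yield node
        yield from walk(node.children)


def sibling_lists(doc):
    """Yield each list of sibling nodes: the top level, then every children block."""
    yield doc
    for node in walk(doc):
        yield node.children


def child(doc, a, b):
    """`a > b`: b nodes that are direct children of an a node."""
    return [c for n in walk(doc) if n.name == a
            for c in n.children if c.name == b]


def descendant(doc, a, b):
    """`a >> b`: b nodes anywhere below an a node (each node reported once)."""
    found = []
    for n in walk(doc):
        if n.name == a:
            for d in walk(n.children):
                if d.name == b and not any(d is f for f in found):
                    found.append(d)
    return found


def next_sibling(doc, a, b):
    """`a + b`: b nodes placed immediately after a sibling a node."""
    return [sibs[i + 1] for sibs in sibling_lists(doc)
            for i in range(len(sibs) - 1)
            if sibs[i].name == a and sibs[i + 1].name == b]


def any_sibling(doc, a, b):
    """`a ++ b`: b nodes that follow a sibling a node, immediately or later."""
    found = []
    for sibs in sibling_lists(doc):
        seen_a = False
        for n in sibs:
            if seen_a and n.name == b:
                found.append(n)
            if n.name == a:
                seen_a = True
    return found


# Toy tree, loosely shaped like the `package` example used later in the document.
doc = [Node("package", [
    Node("name"),
    Node("version"),
    Node("dependencies", [Node("winapi")]),
    Node("dependencies", [Node("miette")]),
])]

print([n.name for n in child(doc, "package", "winapi")])       # []
print([n.name for n in descendant(doc, "package", "winapi")])  # ['winapi']
```

The last two lines show why the explicit `>>` matters: `winapi` is a descendant of `package` but not a direct child, so only the descendant query finds it.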
@@ -69,33 +69,6 @@ is not one of those, the matcher will always fail:

 * `[val() = (foo)]`: Selects any element whose tag is "foo".

-## Map Operator
-
-KQL implementations MAY support a "map operator", `=>`, that allows selection
-of specific parts of the selected notes, essentially "mapping" over a
-selector's result set.
-
-Only a single map operator may be used, and it must be the last element in a
-selector string.
-
-The map operator's right hand side is either an [`accessor`](#accessors) on
-its own, or a tuple of accessors, denoted by a comma-separated list wrapped in
-`()` (for example, `(a, b, c)`).
-
-## Accessors
-
-Accessors access/extract specific parts of a node. They are used with the [map
-operator](#map-operator), and have syntactic overlap with some
-[matchers](#matchers).
-
-* `name()`: Returns the name of the node itself.
-* `val(2)`: Returns the third value in a node.
-* `val()`: Equivalent to `val(0)`.
-* `prop(foo)`: Returns the value of the property `foo` in the node.
-* `foo`: Equivalent to `prop(foo)`.
-* `props()`: Returns all properties of the node as an object.
-* `values()`: Returns all values of the node as an array.
-
 ## Examples

 Given this document:
@@ -108,16 +81,16 @@ package {
 winapi "1.0.0" path="./crates/my-winapi-fork"
 }
 dependencies {
-miette "2.0.0" dev=true
+miette "2.0.0" dev=true integrity=(sri)"sha512-deadbeef"
 }
 }
 ```

 Then the following queries are valid:

-* `package name`
+* `package >> name`
 * -> fetches the `name` node itself
-* `top() > package name`
+* `top() > package >> name`
 * -> fetches the `name` node, guaranteeing that `package` is in the document root.
 * `dependencies`
 * -> deep-fetches both `dependencies` nodes
@@ -129,14 +102,20 @@ Then the following queries are valid:
 * -> fetches all direct-child nodes of any `dependencies` nodes in the
 document. In this case, it will fetch both `miette` and `winapi` nodes.

-If using an API that supports the [map operator](#map-operator), the following
-are valid queries:
+## Full Grammar

-* `package name => val()`
-* -> `["foo"]`.
-* `dependencies[platform] => platform`
-* -> `["windows"]`
-* `dependencies > [] => (name(), val(), path)`
-* -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]`
-* `dependencies > [] => (name(), values(), props())`
-* -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]`
+For rules that are not defined in this grammar, see [the KDL grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar).
+
+```
+query := selector q-ws* "||" q-ws* query | selector
+selector := filter q-ws* selector-operator q-ws* selector | filter
+selector-operator := ">>" | ">" | "++" | "+"
+filter := matcher+
+matcher := "top()"| "()" | identifier | type | accessor-matcher
+accessor-matcher := "[" (comparison | accessor)? "]"
+comparison := accessor q-ws* matcher-operator q-ws* (type | string | number | keyword)
+accessor := "val(" number ")" | "prop(" identifier ")" | "name()" | "tag()" | "values()" | "props()" | identifier
+matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*="
+
+q-ws := bom | unicode-space
+```
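
The `selector-operator` rule in the grammar hunk above lists `">>"` before `">"` and `"++"` before `"+"`, which matters for any scanner that tries alternatives in order. The following Python sketch shows one way to split a single selector (no `||`) into filters and operators under that ordering. The function name `split_selector` is hypothetical, and the sketch makes several simplifying assumptions: operator characters only count outside `[...]`, quoted strings inside matchers are not handled, identifiers are assumed not to contain `+`, and Python's `str.strip()` stands in for `q-ws`.

```python
# Longer operators come first so that ">>" is never read as two ">" tokens.
OPERATORS = (">>", ">", "++", "+")


def split_selector(selector: str) -> list[str]:
    """Split one selector into alternating filters and selector operators."""
    parts = []
    current = ""
    depth = 0  # nesting level inside "[...]" accessor matchers
    i = 0
    while i < len(selector):
        ch = selector[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        op = None
        if depth == 0:
            # Try the operators in grammar order at the current position.
            op = next((o for o in OPERATORS if selector.startswith(o, i)), None)
        if op is not None:
            parts.append(current.strip())
            parts.append(op)
            current = ""
            i += len(op)
        else:
            current += ch
            i += 1
    parts.append(current.strip())
    return parts


print(split_selector("top() > package >> name"))
# ['top()', '>', 'package', '>>', 'name']
```

Tracking bracket depth keeps a comparison such as `[val() > 1]` intact, since `>` is also a `matcher-operator` inside an accessor matcher.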
62 SPEC.md
@@ -3,9 +3,7 @@
 This is the semi-formal specification for KDL, including the intended data
 model and the grammar.

-This document describes KDL version `2.0.0-preview`.
+This document describes KDL version `1.0.0`. It was released on September 11, 2021.

-KDL version `1.0.0` was released on September 11, 2021.
-
 ## Introduction

@@ -26,22 +24,6 @@ the directions if the data stream were only ASCII text. They do not refer
 to the writing direction of text, which can flow in either direction,
 depending on the characters used.

-## Changes from version `1.0.0`
-
-### Relaxed
-
-- The way that `/-` comments are handled has changed. Now, `/-` comments are
-consistently treated like whitespace. Notably, this means that `/-` children
-blocks do not prevent the presence of later arguments, properties, or children
-blocks on the attached node.
-
-### Constrained
-
-- Previously, whitespace was not required before a children block, i.e. `node{}`
-was valid. Now, whitespace is required before a children block, the same as
-before arguments and properties.
-- `/-` comments on nodes must also be separated by plain (non-`/-`) whitespace.
-
 ## Components

 ### Document
@@ -327,6 +309,8 @@ String Value can encompass multiple lines without behaving like a Newline for

 Strings _MUST_ be represented as UTF-8 values.

+#### Escapes
+
 In addition to literal code points, a number of "escapes" are supported.
 "Escapes" are the character `\` followed by another character, and are
 interpreted as described in the following table:
@@ -337,11 +321,39 @@ interpreted as described in the following table:
 | Carriage Return | `\r` | `U+000D` |
 | Character Tabulation (Tab) | `\t` | `U+0009` |
 | Reverse Solidus (Backslash) | `\\` | `U+005C` |
-| Solidus (Forwardslash) | `\/` | `U+002F` |
 | Quotation Mark (Double Quote) | `\"` | `U+0022` |
 | Backspace | `\b` | `U+0008` |
 | Form Feed | `\f` | `U+000C` |
 | Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` |
+| Whitespace Escape | See below | N/A |

+##### Escaped Whitespace
+
+In addition to escaping individual characters, `\` can also escape whitespace.
+When a `\` is followed by one or more literal whitespace characters, the `\`
+and all of that whitespace are discarded. For example, `"Hello World"` and
+`"Hello \ World"` are semantically identical. See [whitespace](#whitespace)
+and [newlines](#newlines) for how whitespace is defined.
+
+Note that only literal whitespace is escaped; *escaped* whitespace is retained.
+For example, these strings are all semantically identical:
+
+```kdl
+"Hello\ \nWorld"
+
+"Hello\n\
+World"
+
+"Hello\nWorld"
+
+"Hello
+World"
+```
+
+##### Invalid escapes
+
+Except as described in the escapes table, above, `\` *MUST NOT* precede any
+other characters in a string.
+
 ### Raw String

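
The escape rules added in this hunk (the table, the whitespace escape, and the ban on any other `\` sequence) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation. The function name `unescape` is hypothetical, it receives the string body without its surrounding quotes, and Python's `\s` is used as an approximation of KDL's whitespace and newline sets, which it does not match exactly.

```python
import re

# Single-character escapes from the table, with `\/` no longer included.
SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"', "b": "\b", "f": "\f"}

# A backslash followed by: a \u{...} escape, a run of literal whitespace,
# or any single character (validated against SIMPLE below).
_ESCAPE = re.compile(r"\\(?:u\{([0-9a-fA-F]{1,6})\}|(\s+)|(.)?)", re.DOTALL)


def unescape(body: str) -> str:
    """Resolve escapes in the body of an escaped string (quotes already removed)."""
    def replace(match):
        hex_digits, whitespace, char = match.groups()
        if hex_digits is not None:
            # chr() raises ValueError above 0x10FFFF, matching the table's limit.
            return chr(int(hex_digits, 16))
        if whitespace is not None:
            # Whitespace escape: drop the backslash and all literal whitespace.
            return ""
        if char in SIMPLE:
            return SIMPLE[char]
        raise ValueError(f"invalid escape: \\{char or ''}")

    return _ESCAPE.sub(replace, body)


# The four strings from the spec example all resolve to the same value.
samples = [
    r"Hello\ \nWorld",
    "Hello\\n\\\n    World",
    r"Hello\nWorld",
    "Hello\nWorld",
]
print(all(unescape(s) == "Hello\nWorld" for s in samples))  # True
```

Because escaped whitespace such as `\n` is produced by the `SIMPLE` branch rather than consumed by the whitespace branch, it is retained, which is the distinction the "Escaped Whitespace" section draws. Under this sketch, `unescape(r"\/")` raises `ValueError`, in line with the removal of the solidus escape.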
@@ -415,6 +427,7 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
 | Name | Code Pt |
 |----------------------|---------|
 | Character Tabulation | `U+0009` |
+| Line Tabulation | `U+000B` |
 | Space | `U+0020` |
 | No-Break Space | `U+00A0` |
 | Ogham Space Mark | `U+1680` |
@@ -477,7 +490,10 @@ node-children := '{' nodes '}'
 node-terminator := single-line-comment | newline | ';' | eof

 identifier := string | bare-identifier
-bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword
+bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - keyword
+unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char*
+numberish-ident := sign ((identifier-char - digit) identifier-char*)?
+stringish-ident := "r" ((identifier-char - "#") identifier-char*)?
 identifier-char := unicode - line-space - [\/(){}<>;[]=,"]
 keyword := boolean | 'null'
 prop := identifier '=' value
@@ -487,7 +503,7 @@ type := '(' identifier ')'
 string := raw-string | escaped-string
 escaped-string := '"' character* '"'
 character := '\' escape | [^\"]
-escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
+escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
 hex-digit := [0-9a-fA-F]

 raw-string := 'r' raw-string-hash
@@ -518,7 +534,7 @@ bom := '\u{FEFF}'

 unicode-space := See Table (All White_Space unicode characters which are not `newline`)

-single-line-comment := '//' ^newline+ (newline | eof)
+single-line-comment := '//' ^newline* (newline | eof)
 multi-line-comment := '/*' commented-block
 commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
 ```
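
The three-way split of `bare-identifier` in the SPEC.md grammar hunks above (`unambiguous-ident`, `numberish-ident`, `stringish-ident`) amounts to a decision on the first one or two characters. Here is a hedged Python sketch of that decision. The function names are hypothetical, `str.isspace()` only approximates `line-space`, and the keyword set assumes `boolean` means `true` or `false`, which is not shown in this diff.

```python
FORBIDDEN = set('\\/(){}<>;[]=,"')     # characters excluded from identifier-char
DIGITS = set("0123456789")
KEYWORDS = {"true", "false", "null"}   # assumption: boolean := 'true' | 'false'


def is_identifier_char(ch: str) -> bool:
    """identifier-char: anything except whitespace and the forbidden punctuation."""
    return not ch.isspace() and ch not in FORBIDDEN


def is_bare_identifier(text: str) -> bool:
    """Check text against the bare-identifier rule as split up in the grammar."""
    if not text or text in KEYWORDS:
        return False
    if not all(is_identifier_char(ch) for ch in text):
        return False
    first, rest = text[0], text[1:]
    if first in "+-":
        # numberish-ident: a sign must not be followed by a digit.
        return rest[:1] not in DIGITS
    if first == "r":
        # stringish-ident: "r" must not be followed by "#" (raw string start).
        return not rest.startswith("#")
    # unambiguous-ident: any other first character except a digit.
    return first not in DIGITS


for candidate in ["node", "raw", "r#foo", "-1", "--flag", "true"]:
    print(candidate, is_bare_identifier(candidate))
```

The three branches mirror the grammar one to one, which is the point of the refactoring: a parser can tell a number, a raw string, and an identifier apart after looking at no more than two characters.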
@@ -1 +1 @@
-node "\"\\/\b\f\n\r\t"
+node "\"\\\b\f\n\r\t"
@@ -0,0 +1 @@
+node
@@ -0,0 +1 @@
+node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld"
@@ -1 +1 @@
-node "\"\\\/\b\f\n\r\t"
+node "\"\\\b\f\n\r\t"
@@ -0,0 +1,2 @@
+//
+node
@@ -0,0 +1,15 @@
+// All of these strings are the same
+node \
+"Hello\n\tWorld" \
+"Hello
+World" \
+"Hello\n\ \tWorld" \
+"Hello\n\
+\tWorld" \
+"Hello
+\ \tWorld" \
+"Hello\n\t\
+World"
+
+// Note that this file deliberately mixes space and newline indentation for
+// test purposes
@@ -0,0 +1 @@
+node "\/"