merge kdl-v2 branch into main

This commit is contained in:
Kat Marchán 2024-11-29 00:11:01 -08:00
commit 8b2a019998
GPG Key ID: AEB529C08A3C7E9E
268 changed files with 1366 additions and 691 deletions

95
CHANGELOG.md Normal file

@ -0,0 +1,95 @@
# KDL Changelog
## 2.0.0-draft.5 (2024-11-28)
* Equals signs other than `=` are no longer supported in properties.
* 128-bit integer type annotations have been added to the list of "well-known"
type annotations.
* Multiline string escape rules have been tweaked significantly.
* `\s` is now a valid escape within a string, representing a space character.
* Slashdash (`/-`)-compatible locations and related grammar adjusted to be more
clear and intuitive. This includes some changes relating to whitespace,
including comments and newlines, which are breaking changes.
* Various updates to test suite to reflect changes.
## 2.0.0 (Unreleased)
### Grammar
* Solidus/Forward slash (`/`) is no longer an escaped character.
* Space (`U+0020`) can now be written into quoted strings with the `\s`
escape.
* Single line comments (`//`) can now be immediately followed by a newline.
* All literal whitespace following a `\` in a string is now discarded.
* Vertical tabs (`U+000B`) are now considered to be whitespace.
* The grammar syntax itself has been described, and some confusing definitions
in the grammar have been fixed accordingly (mostly related to escaped
characters).
* `,`, `<`, and `>` are now legal identifier characters. They were previously
reserved for KQL but this is no longer necessary.
* Code points under `0x20` (except newline and whitespace code points), code
points above `0x10FFFF`, Delete control character (`0x7F`), and the [unicode
"direction control"
characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
are now completely banned from appearing literally in KDL documents. They
can now only be represented in regular strings, and there are no facilities to
represent them in raw strings. This should be considered a security
improvement.
* Raw strings no longer require an `r` prefix: they are now specified by using
`#""#`.
* Line continuations can be followed by an EOF now, instead of requiring a
newline (or comment). `node \<EOF>` is now a legal KDL document.
* `#` is no longer a legal identifier character.
* `null`, `true`, and `false` are now `#null`, `#true`, and `#false`. Using
the unprefixed versions of these values is a syntax error.
* The spec prose has more explicitly stated that whitespace and newlines are
not valid identifier characters, even though the grammar already expressed
this.
* Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
* The spec prose now more explicitly states that strings and raw strings can
be used as type annotations.
* Removed a statement in the spec prose that said "It is reasonable for an
implementation to ignore null values altogether when deserializing". This is
no longer encouraged or desired.
* Code points have been constrained to [Unicode Scalar
Values](https://unicode.org/glossary/#unicode_scalar_value) only, including
values used in string escapes (`\u{}`). All KDL documents and string values
should be valid UTF-8 now, as was intended.
* The last node in a child block no longer needs to be terminated with `;`,
even if the closing `}` is on the same line, so this is now a legal node:
`node {foo;bar;baz}`
* More places allow whitespace (node-spaces, specifically) now. With great
power comes great responsibility:
* Inside `(foo)` annotations (so, `( foo )` would be legal (`( f oo )` would
not be, since it has two identifiers))
* Between annotations and the thing they're annotating (`(blah) node (thing)
1 y= (who) 2`)
* Around `=` for props (`x = 1`)
* The BOM is now only allowed as the first character in a document. It was
previously treated as generic whitespace.
* Multi-line strings are now automatically dedented, according to the common
whitespace matching the whitespace prefix of the closing line. Multiline
strings and raw strings now must have a newline immediately following their
opening `"`, and a final newline plus whitespace preceding the closing `"`.
* `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and
conflicts with numbers.
* Multi-line strings' literal Newline sequences are now normalized to single
`LF`s.
* `#inf`, `#-inf`, and `#nan` have been added in order to properly support
IEEE floats for implementations that choose to represent their decimals that
way.
* Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax
errors.
* `u128` and `i128` have been added as well-known number type annotations.
* Slashdash (`/-`) -compatible locations adjusted to be more clear and intuitive.
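
As an illustrative sketch (not taken from the test suite), several of the grammar changes listed above would look like this in a v2 document. The node and property names are invented for the example, and the multi-line string uses the single-quote form that this draft describes:

```kdl
// Keywords now carry a `#` prefix; bare `true` or `null` are syntax errors.
flags enabled=#true fallback=#null ratio=#inf

// Bare identifiers are string values, and whitespace is allowed around `=`.
author name = alex role=maintainer

// Raw strings use `#"..."#` instead of the old `r"..."` form.
path #"C:\Users\zkat"#

// Multi-line strings are dedented to the closing line's whitespace prefix,
// so this value should come out as "hello\n  world".
message "
    hello
      world
    "

// The last node in a child block no longer needs a trailing `;`.
node {foo;bar;baz}
```

A v1 parser would be expected to reject most of these lines, which is why the prefixed keywords and the new raw string form are called out as breaking changes.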
### KQL
* There's now a _required_ descendant selector (`>>`), instead of using plain
spaces for that purpose.
* The "any sibling" selector is now `++` instead of `~`, for consistency with
the new descendant selector.
* Some parsing logic around the grammar has changed.
* Multi- and single-line comments are now supported, as well as line
continuations with `\`.
* Map operators have been removed entirely.


@ -3,15 +3,15 @@ JSON-in-KDL (JiK)
This specification describes a canonical way to losslessly encode [JSON](https://json.org) in [KDL](https://kdl.dev). While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with a JSON-consuming or -emitting service.
This is version 3.0.1 of JiK.
This is version 4.0.0 of JiK.
JSON-in-KDL (JiK from now on) is a kdl microsyntax consisting of named nodes that represent objects, arrays, or literal values.
----
JSON literals are, luckily, a subset of KDL's literals. There are two ways to write a JSON literal into JiK:
There are two ways to write a JSON literal into JiK:
* As a node with any nodename and a single argument, like `- true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
* As a node with any nodename and a single argument, like `- #true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
* When nested in arrays or objects, literals can usually be written as arguments (for array nodes) or properties (for object nodes). See below for details.
----
@ -25,7 +25,7 @@ Children can encode literals and/or nested arrays and objects. For example, the
```kdl
- {
- 1
- true false
- #true #false
- 3
}
```
@ -36,7 +36,7 @@ Arguments and children can be mixed, if desired. The preceding example could als
```kdl
- 1 {
- true false
- #true #false
- 3
}
```
@ -54,10 +54,11 @@ The `(array)` type annotation can be used on any other valid array node if desir
JSON objects are represented in JiK as a node with any nodename, with zero or more properties and/or zero or more children with any nodenames.
Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=true`.
Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=#true`.
Children can encode literals and/or nested arrays and objects,
using the nodename for the item's key.
For example, the JSON `{"foo": 1, "bar": [2, {"baz": 3}], "qux":4}` can be written in JiK as:
```kdl


@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS
selectors for familiarity and ease of use. Think of it as CSS Selectors or
XPath, but for KDL!
This document describes KQL `1.0.0`. It was released on September 11, 2021.
This document describes KQL `next`. It is unreleased.
## Selectors
Selectors use selection operators to filter nodes that will be returned by an
API using KQL. The main differences between this and CSS selectors are the
lack of `*` (use `[]` instead), and the specific syntax for
lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for
[matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS.
* `a > b`: Selects any `b` element that is a direct child of an `a` element.
* `a b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element.
* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor)
* `a[accessor()]`: Selects any `a` element, filtered by an accessor.
* `[]`: Selects any element.
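
A rough illustration of the `>>` and `++` operators, using a made-up document. The expected matches are written as comments and follow from the descriptions above, not from any particular implementation:

```kdl
// Hypothetical document; the node names are placeholders.
a {
    b {
        c
    }
    d
    e
}

// a > b    selects `b`, a direct child of `a`
// a > c    selects nothing, since `c` is not a direct child of `a`
// a >> c   selects `c`, a descendant of `a` at any depth
// b + d    selects `d`, which immediately follows its sibling `b`
// b ++ e   selects `e`, which follows its sibling `b` later on
```

Under the old syntax the third query would have been written `a c`, and the last one `b ~ e`.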
@ -30,6 +30,11 @@ properties, node names, etc). With the exception of `top()` and `()`, they are a
used inside a `[]` selector. Some matchers are unary, but most of them involve
binary operators.
The `top()` matcher can only be used as the first matcher of a selector. This means
that it cannot be the right operand of the `>`, `>>`, `+`, or `++` operators. As `||`
combines selectors, the `top()` can appear just after it. For instance,
`a > b || top() > b` is valid, but `a > top()` is not.
* `top()`: Returns all toplevel children of the current document.
* `top() > []`: Equivalent to `top()` on its own.
* `(foo)`: Selects any element whose type annotation is `foo`.
@ -44,8 +49,8 @@ Attribute matchers support certain binary operators:
* `[val() = 1]`: Selects any element whose first value is 1.
* `[prop(name) = 1]`: Selects any element with a property `name` whose value is 1.
* `[name = 1]`: Equivalent to the above.
* `[name() = "hi"]`: Selects any element whose _node name_ is `"hi"`. Equivalent to just `hi`, but more useful when using string operators.
* `[tag() = "hi"]`: Selects any element whose type annotation is `"hi"`. Equivalent to just `(hi)`, but more useful when using string operators.
* `[name() = hi]`: Selects any element whose _node name_ is "hi". Equivalent to just `hi`, but more useful when using string operators.
* `[tag() = hi]`: Selects any element whose tag is "hi". Equivalent to just `(hi)`, but more useful when using string operators.
* `[val() != 1]`: Selects any element whose first value exists, and is not 1.
The following operators work with any `val()` or `prop()` values.
@ -60,64 +65,37 @@ never coerced to 1, and there is no "universal" ordering across all types.):
The following operators work only with string `val()`, `prop()`, `tag()`, or `name()` values.
If the value is not a string, the matcher will always fail:
* `[val() ^= "foo"]`: Selects any element whose first value starts with "foo".
* `[val() $= "foo"]`: Selects any element whose first value ends with "foo".
* `[val() *= "foo"]`: Selects any element whose first value contains "foo".
* `[val() ^= foo]`: Selects any element whose first value starts with "foo".
* `[val() $= foo]`: Selects any element whose first value ends with "foo".
* `[val() *= foo]`: Selects any element whose first value contains "foo".
The following operators work only with `val()` or `prop()` values. If the value
is not one of those, the matcher will always fail:
* `[val() = (foo)]`: Selects any element whose type annotation is `foo`.
## Map Operator
KQL implementations MAY support a "map operator", `=>`, that allows selection
of specific parts of the selected nodes, essentially "mapping" over a
selector's result set.
Only a single map operator may be used, and it must be the last element in a
selector string.
The map operator's right hand side is either an [`accessor`](#accessors) on
its own, or a tuple of accessors, denoted by a comma-separated list wrapped in
`()` (for example, `(a, b, c)`).
## Accessors
Accessors access/extract specific parts of a node. They are used with the [map
operator](#map-operator), and have syntactic overlap with some
[matchers](#matchers).
* `name()`: Returns the name of the node itself.
* `val(2)`: Returns the third value in a node.
* `val()`: Equivalent to `val(0)`.
* `prop(foo)`: Returns the value of the property `foo` in the node.
* `foo`: Equivalent to `prop(foo)`.
* `props()`: Returns all properties of the node as an object.
* `values()`: Returns all values of the node as an array.
## Examples
Given this document:
```kdl
package {
name "foo"
name foo
version "1.0.0"
dependencies platform="windows" {
dependencies platform=windows {
winapi "1.0.0" path="./crates/my-winapi-fork"
}
dependencies {
miette "2.0.0" dev=true
miette "2.0.0" dev=#true integrity=(sri)sha512-deadbeef
}
}
```
Then the following queries are valid:
* `package name`
* `package >> name`
* -> fetches the `name` node itself
* `top() > package name`
* `top() > package >> name`
* -> fetches the `name` node, guaranteeing that `package` is in the document root.
* `dependencies`
* -> deep-fetches both `dependencies` nodes
@ -129,14 +107,25 @@ Then the following queries are valid:
* -> fetches all direct-child nodes of any `dependencies` nodes in the
document. In this case, it will fetch both `miette` and `winapi` nodes.
If using an API that supports the [map operator](#map-operator), the following
are valid queries:
## Full Grammar
* `package name => val()`
* -> `["foo"]`.
* `dependencies[platform] => platform`
* -> `["windows"]`
* `dependencies > [] => (name(), val(), path)`
* -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]`
* `dependencies > [] => (name(), values(), props())`
* -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]`
Rules that are not defined in this grammar are prefixed with `$`, see [the KDL
grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar) for
what they expand to.
```
query-str := $bom? query
query := selector q-ws* "||" q-ws* query | selector
selector := filter q-ws* selector-operator q-ws* selector-subsequent | filter
selector-subsequent := matchers q-ws* selector-operator q-ws* selector-subsequent | matchers
selector-operator := ">>" | ">" | "++" | "+"
filter := "top(" q-ws* ")" | matchers
matchers := type-matcher $string? accessor-matcher* | $string accessor-matcher* | accessor-matcher+
type-matcher := "(" q-ws* ")" | $type
accessor-matcher := "[" q-ws* (comparison | accessor)? q-ws* "]"
comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword)
accessor := "val(" q-ws* $integer q-ws* ")" | "prop(" q-ws* $string q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | $string
matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*="
q-ws := $plain-node-space
```

176
README.md

@ -1,28 +1,35 @@
# The KDL Document Language
KDL is a small, pleasing document language with xml-like semantics that looks
like you're invoking a bunch of CLI commands! It's meant to be used both as a
serialization format and a configuration language, much like JSON, YAML, or
XML. It looks like this:
> [!WARNING]
> The main branch of this repository shows the latest v2.0.0 draft, which is a
> work in progress and not considered the "mainline" KDL yet. Most KDL
> implementations in the wild are based on the [v1.0.0
> spec](https://github.com/kdl-org/kdl/tree/1.0.0) instead, so you may want to
> refer to that if you're using KDL today.
KDL is a small, pleasant document language with XML-like node semantics that
looks like you're invoking a bunch of CLI commands! It's meant to be used both
as a serialization format and a configuration language, much like JSON, YAML,
or XML. It looks like this:
```kdl
package {
name "my-pkg"
name my-pkg
version "1.2.3"
dependencies {
// Nodes can have standalone values as well as
// key/value pairs.
lodash "^3.2.1" optional=true alias="underscore"
lodash "^3.2.1" optional=#true alias=underscore
}
scripts {
// "Raw" and multi-line strings are supported.
build r#"
// "Raw" and dedented multi-line strings are supported.
build #"
echo "foo"
node -c "console.log('hello, world!');"
echo "foo" > some-file.txt
"#
"#
}
// `\` breaks up a single node across multiple lines.
@ -33,8 +40,8 @@ package {
// "Slashdash" comments operate at the node level,
// with just `/-`.
/-this-is-commented {
this "entire" "node" {
"is" "gone"
this entire node {
is gone
}
}
}
@ -44,22 +51,23 @@ There's a living [specification](SPEC.md), as well as various
[implementations](#implementations). You can also check out the [FAQ](#faq) to
answer all your burning questions!
The current version of the KDL spec is `2.0.0-draft.5`.
In addition to a spec for KDL itself, there are also standard specs for [a KDL
Query Language](QUERY-SPEC.md) based on CSS selectors, and [a KDL Schema
Language](SCHEMA-SPEC.md) loosely based on JSON Schema.
The language is based on [SDLang](https://sdlang.org), with a number of
modifications and clarifications on its syntax and behavior.
The current version of the KDL spec is `1.0.0`.
The language is based on [SDLang](https://sdlang.org), with a [number of
modifications and clarifications on its syntax and behavior](#why-not-sdlang).
[Play with it in your browser!](https://kdl-play.danini.dev/)
## Design and Discussion
KDL is still extremely new, and discussion about the format should happen over
on the [discussions page](https://github.com/kdl-org/kdl/discussions). Feel
free to jump in and give us your 2 cents!
KDL 2.0 design is still in progress. Discussions and questions about the format
should happen over on the [discussions
page](https://github.com/kdl-org/kdl/discussions). Feel free to jump in and give
us your 2 cents!
## Implementations
@ -104,7 +112,7 @@ entirety, but in the future, may be required to in order to be included here.
### Basics
A KDL node is a node name, followed by zero or more "arguments", and
A KDL node is a node name string, followed by zero or more "arguments", and
children.
```kdl
@ -117,10 +125,10 @@ You can also have multiple values in a single node!
bookmarks 12 15 188 1234
```
Nodes can have properties.
Nodes can have properties, with string keys.
```kdl
author "Alex Monad" email="alex@example.com" active=true
author "Alex Monad" email=alex@example.com active=#true
```
And they can have nested child nodes, too!
@ -145,36 +153,66 @@ node1; node2; node3;
KDL supports 4 data types:
* Strings: `"hello world"`
* Strings: `unquoted`, `"hello world"`, or `#"hello world"#`
* Numbers: `123.45`
* Booleans: `true` and `false`
* Null: `null`
* Booleans: `#true` and `#false`
* Null: `#null`
#### Strings
It supports two different formats for string input: escaped and raw.
It supports three different formats for string input: identifiers, quoted, and raw.
```kdl
node "this\nhas\tescapes"
other r"C:\Users\zkat\"
```
Both types of string can be multiline as-is, without a different syntax:
```kdl
string "my
multiline
value"
node1 this-is-a-string
node2 "this\nhas\tescapes"
node3 #"C:\Users\zkat\raw\string"#
```
And for raw strings, you can add any number of # after the r and the last " to
disambiguate literal " characters:
You don't have to quote strings unless any of the following apply:
* The string contains whitespace.
* The string contains any of `[]{}()\/#";=`.
* The string is one of `true`, `false`, `null`, `inf`, `-inf`, or `nan`.
* The string starts with a digit, or with `+`/`-`/`.`/`-.`/`+.` followed by a digit
(aka "looks like a number").
In essence, if it can get confused for other KDL or KQL syntax, it needs
quotes.
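
A short sketch of how these rules play out; the node and property names are invented for the example:

```kdl
// These values are fine as bare identifiers:
server host=localhost mode=read-only tag=v1.2.3

// These need quotes under the rules above:
server "my host"        // contains whitespace
server "a=b"            // contains `=`
server "true"           // collides with a keyword name
server "1st"            // starts with a digit
server "-5px" "+.5em"   // a sign (and optional dot) followed by a digit
```

When in doubt, quoting is always safe: a quoted string and the equivalent bare identifier represent the same string value.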
Both types of quoted string can be multiline as-is, without a different
syntax. Additionally, common indentation shared with the line containing the
closing quote will be stripped/dedented:
```kdl
other-raw r#"hello"world"#
string "
my
multiline
value
"
```
Raw strings do not support `\` escapes, and can be used when you want
certain kinds of strings to look nicer without heavy escaping:
```kdl
exec #"
echo "foo"
echo "bar"
cd C:\path\to\dir
"#
regex #"\d{3} "[^/"]+""#
```
You can add any number of `#`s before and after the opening and
closing `#` to disambiguate literal closing `#"` sequences:
```kdl
other-raw ##"hello#"world"##
```
#### Numbers
There's 4 ways to represent numbers in KDL. KDL does not prescribe any
There are 4 ways to represent numbers in KDL. KDL does not prescribe any
representation for these numbers, and it's entirely up to individual
implementations whether to represent all numbers with a single type, or to
have different representations for different forms.
@ -213,7 +251,7 @@ comments can be nested.
C style multiline
*/
tag /*foo=true*/ bar=false
tag /*foo=#true*/ bar=#false
/*/*
hello
@ -221,20 +259,22 @@ hello
```
On top of that, KDL supports `/-` "slashdash" comments, which can be used to
comment out individual nodes, arguments, or children:
comment out individual nodes, arguments, or child blocks:
```kdl
// This entire node and its children are all commented out.
/-mynode "foo" key=1 {
/-mynode foo key=1 {
a
b
c
}
mynode /-"commented" "not commented" /-key="value" /-{
mynode /-commented "not commented" /-key=value /-{
a
b
}
// The above is equivalent to:
mynode "not commented"
```
### Type Annotations
@ -246,8 +286,8 @@ specific meanings.
```kdl
numbers (u8)10 (i32)20 myfloat=(f32)1.5 {
strings (uuid)"123e4567-e89b-12d3-a456-426614174000" (date)"2021-02-03" filter=(regex)r"$\d+"
(author)person name="Alex"
strings (uuid)"123e4567-e89b-12d3-a456-426614174000" (date)"2021-02-03" filter=(regex)#"$\d+"#
(author)person name=Alex
}
```
@ -260,21 +300,21 @@ title \
// Files must be utf8 encoded!
smile "😁"
smile 😁
// Instead of anonymous nodes, nodes and properties can be wrapped
// in "" for arbitrary node names.
"!@#$@$%Q#$%~@!40" "1.2.3" "!!!!!"=true
// Node names and property keys are just strings, so you can write them like
// quoted or raw strings, too!
"illegal{}[]/\\=#;identifier" #"1.2.3"# "#false"=#true
// The following is a legal bare identifier:
foo123~!@#$%^&*.:'|?+ "weeee"
// Identifiers are very flexible. The following is a legal bare identifier:
<@foo123~!$%^&*.:'|?+>
// And you can also use unicode!
ノード お名前="☜(゚ヮ゚☜)"
ノード お名前=ฅ^•ﻌ•^ฅ
// kdl specifically allows properties and values to be
// interspersed with each other, much like CLI commands.
foo bar=true "baz" quux=false 1 2 3
foo bar=#true baz quux=#false 1 2 3
```
## Design Principles
@ -306,25 +346,31 @@ Same as "cuddle".
Because nothing out there felt quite right. The closest one I found was
SDLang, but that had some design choices I disagreed with.
<a name="why-not-sdlang"></a>
#### Ok, then, why not SDLang?
SDLang is designed for use cases that are not interesting to me, but are very
relevant to the D-lang community. KDL is very similar in many ways, but is
different in the following ways:
SDLang is an excellent base, but I wanted some details ironed out, and some
things removed that only really made sense for SDLang's current use-cases, including
some restrictions about data representation. KDL is very similar in many ways, except:
* The grammar and expected semantics are [well-defined and specified](SPEC.md).
* There is only one "number" type. KDL does not prescribe representations.
* There is only one "number" type. KDL does not prescribe representations, but
does have keywords for NaN, infinity, and negative infinity if decimal numbers
are intended to be represented as IEEE 754 floats.
* Slashdash (`/-`) comments are great and useful!
* I am not interested in having first-class date types, and SDLang's are very
non-standard.
* Quoteless "identifier" strings are supported. (e.g. `node foo=bar`, vs `node foo="bar"`)
* KDL does not have first-class date or binary data types. Instead, it
supports arbitrary type annotations for any custom data type you might need:
`(date)"2021-02-03"`, `(binary)"deadbeefbadc0ffee"`.
* Values and properties can be interspersed with each other, rather than one
having to follow the other.
* KDL does not have a first-class binary data type. Just use strings with base64.
* All strings in KDL are multi-line, and raw strings are written with
Rust-style syntax (`r"foo"`), instead of backticks.
* KDL identifiers can use UTF-8 and are much more lax about symbols than SDLang.
* All strings in KDL are multi-line, and multi-line strings are automatically dedented to match their closing quote's indentation level.
* Raw strings are written with `#` (`#"foo\bar"#`), instead of backticks.
* KDL identifiers can use UTF-8 and are more lax about symbols than SDLang.
* KDL does not support "anonymous" nodes.
* Instead, KDL supports arbitrary identifiers for node names and attribute
* Namespaces are not supported, but `:` is a legal identifier character, and applications
can choose to implement namespaces as they see fit.
* KDL supports arbitrary identifiers for node names and attribute
names, meaning you can use arbitrary strings for those: `"123" "value"=1` is
a valid node, for example. This makes it easier to use KDL for
representing arbitrary key/value pairs.
@ -401,3 +447,7 @@ microsyntax for losslessly encoding XML](XML-IN-KDL.md).
This license applies to the text and assets _in this repository_.
Implementations of this specification are not "derivative works", and thus are
not bound by the restrictions of CC-BY-SA.
The KDL logo design and files were generously contributed by Timothy Merritt
([@timmybytes](https://github.com/timmybytes)), and are also available under
the same license.


@ -34,10 +34,10 @@ None.
* [`node`](#node-node) - zero or more toplevel nodes for the KDL document this schema describes.
* [`definitions`](#definitions-node) (optional): Definitions of nodes, values, props, and children block to reference in the toplevel nodes.
* `node-names` (optional): [Validations](#validation-nodes) to apply to the _names_ of child nodes.
* `other-nodes-allowed` (optional): Whether to allow nodes other than the ones explicitly listed here. Defaults to `false`.
* `other-nodes-allowed` (optional): Whether to allow nodes other than the ones explicitly listed here. Defaults to `#false`.
* [`tag`](#tag-node) - zero or more toplevel tags for nodes in the KDL document that this schema describes.
* `tag-names` (optional): [Validations](#validation-nodes) to apply to the _names_ of tags of child nodes.
* `other-tags-allowed` (optional): Whether to allow node tags other than the ones explicitly listed here. Defaults to `false`.
* `other-tags-allowed` (optional): Whether to allow node tags other than the ones explicitly listed here. Defaults to `#false`.
### `info` node
@ -113,7 +113,7 @@ Links to the schema itself, and to sources about the schema.
#### Properties
* `rel`: what the link is for (`"self"` or `"documentation"`)
* `rel`: what the link is for (`self` or `documentation`)
* `lang` (optional): An IETF BCP 47 language tag
### `license` node

602
SPEC.md

@ -3,7 +3,8 @@
This is the semi-formal specification for KDL, including the intended data
model and the grammar.
This document describes KDL version `1.0.0`. It was released on September 11, 2021.
This document describes KDL version `2.0.0-draft.5`. It was released on
2024-11-28.
## Introduction
@ -49,8 +50,8 @@ baz
### Node
Being a node-oriented language means that the real core component of any KDL
document is the "node". Every node must have a name, which is an
[Identifier](#identifier).
document is the "node". Every node must have a name, which must be a
[String](#string).
The name may be preceded by a [Type Annotation](#type-annotation) to further
clarify its type, particularly in relation to its parent node. (For example,
@ -74,9 +75,9 @@ By contrast, Property order _SHOULD NOT_ matter to implementations.
[Children](#children-block) should be used if an order-sensitive key/value
data structure must be represented in KDL.
Nodes _MAY_ be prefixed with `/-` to "comment out" the entire node, including
its properties, arguments, and children, and make it act as plain whitespace,
even if it spreads across multiple lines.
Nodes _MAY_ be prefixed with [Slashdash](#slashdash-comments) to "comment out"
the entire node, including its properties, arguments, and children, and make
it act as plain whitespace, even if it spreads across multiple lines.
Finally, a node is terminated by either a [Newline](#newline), a semicolon (`;`)
or the end of the file/stream (an `EOF`).
@ -84,62 +85,20 @@ or the end of the file/stream (an `EOF`).
#### Example
```kdl
foo 1 key="val" 3 {
foo 1 key=val 3 {
bar
(role)baz 1 2
}
```
### Identifier
An Identifier is either a [Bare Identifier](#bare-identifier), which is an
unquoted string like `node` or `item`, or a [String](#string), which is quoted,
like `"node"` or `"two words"`. There's no semantic difference between the
kinds of identifier; this simply allows for the use of quotes to have unusual
identifiers that are inexpressible as bare identifiers.
### Bare Identifier
A Bare Identifier is composed of any Unicode codepoint other than [non-initial
characters](#non-initial-characters), followed by any number of Unicode
codepoints other than [non-identifier characters](#non-identifier-characters),
so long as this doesn't produce something confusable for a [Number](#number),
[Boolean](#boolean), or [Null](#null). For example, both a [Number](#number)
and an Identifier can start with `-`, but when an Identifier starts with `-`
the second character cannot be a digit. This is precisely specified in the
[Full Grammar](#full-grammar) below.
Identifiers are terminated by [Whitespace](#whitespace) or
[Newlines](#newline).
### Non-initial characters
The following characters cannot be the first character in a
[Bare Identifier](#identifier):
* Any decimal digit (0-9)
* Any [non-identifier characters](#non-identifier-characters)
Be aware that the `-` character can only be used as an initial
character if the second character is not a digit. This allows
identifiers to look like `--this`, and removes the ambiguity
of having an identifier look like a negative number.
### Non-identifier characters
The following characters cannot be used anywhere in a [Bare Identifier](#identifier):
* Any codepoint with hexadecimal value `0x20` or below.
* Any codepoint with hexadecimal value higher than `0x10FFFF`.
* Any of `\/(){}<>;[]=,"`
### Line Continuation
Line continuations allow [Nodes](#node) to be spread across multiple lines.
A line continuation is a `\` character followed by zero or more whitespace
characters and an optional single-line comment. It must be terminated by a
[Newline](#newline) (including the Newline that is part of single-line comments).
items (including multiline comments) and an optional single-line comment. It
must be terminated by a [Newline](#newline) (including the Newline that is
part of single-line comments).
Following a line continuation, processing of a Node can continue as usual.
@ -153,7 +112,8 @@ my-node 1 2 \ // comments are ok after \
### Property
A Property is a key/value pair attached to a [Node](#node). A Property is
composed of an [Identifier](#identifier), followed immediately by a `=`, and then a [Value](#value).
composed of a [String](#string), followed immediately by an equals sign (`=`, `U+003D`),
and then a [Value](#value).
Properties should be interpreted left-to-right, with rightmost properties with
identical names overriding earlier properties. That is:
@ -186,7 +146,7 @@ make it act as plain whitespace, even if it spreads across multiple lines.
#### Example
```kdl
my-node 1 2 3 "a" "b" "c"
my-node 1 2 3 a b c
```
### Children Block
@ -215,7 +175,8 @@ A value is either: a [String](#string), a [Number](#number), a
[Boolean](#boolean), or [Null](#null).
Values _MUST_ be either [Arguments](#argument) or values of
[Properties](#property).
[Properties](#property). Only [String](#string) values may be used as
[Node](#node) names or [Property](#property) keys.
Values (both as arguments and as properties) _MAY_ be prefixed by a single
[Type Annotation](#type-annotation).
@ -227,10 +188,9 @@ includes a _suggestion_ of what type the value is _intended_ to be treated as,
or as a _context-specific elaboration_ of the more generic type the node name
indicates.
Type annotations are written as a set of `(` and `)` with an
[Identifier](#identifier) in it. Any valid identifier is considered a valid
type annotation. There must be no whitespace between a type annotation and its
associated Node Name or Value.
Type annotations are written as a set of `(` and `)` with a single
[String](#string) in it. It may contain Whitespace after the `(` and before
the `)`, and may be separated from its target by Whitespace.
KDL does not specify any restrictions on what implementations might do with
these annotations. They are free to ignore them, or use them to make decisions
@ -247,6 +207,7 @@ Signed integers of various sizes (the number is the bit size):
* `i16`
* `i32`
* `i64`
* `i128`
Unsigned integers of various sizes (the number is the bit size):
@ -254,6 +215,7 @@ Unsigned integers of various sizes (the number is the bit size):
* `u16`
* `u32`
* `u64`
* `u128`
Platform-dependent integer types, both signed and unsigned:
@ -302,29 +264,97 @@ IEEE 754-2008 decimal floating point numbers
```kdl
node (u8)123
node prop=(regex)".*"
node prop=(regex).*
(published)date "1970-01-01"
(contributor)person name="Foo McBar"
```
### String
Strings in KDL represent textual [Values](#value), or unusual identifiers. A
String is either a [Quoted String](#quoted-string) or a
[Raw String](#raw-string). Quoted Strings may include escaped characters, while
Raw Strings always contain only the literal characters that are present.
Strings in KDL represent textual UTF-8 [Values](#value). A String is either an
[Identifier String](#identifier-string) (like `foo`), a [Quoted
String](#quoted-string) (like `"foo"`) or a [Raw String](#raw-string) (like
`#"foo"#`):
* Identifier Strings let you write short, "single-word" strings with a
minimum of syntax
* Quoted Strings let you write strings with whitespace
(including newlines!) or escapes
* Raw Strings let you write strings with whitespace *but without escapes*,
allowing you to not worry about the string's content containing anything that
might look like an escape.
Strings _MUST_ be represented as UTF-8 values.
Strings _MUST NOT_ include the code points for [disallowed literal code
points](#disallowed-literal-code-points) directly. Quoted Strings may include
these code points as _values_ by representing them with their corresponding
`\u{...}` escape.
### Identifier String
An Identifier String (sometimes referred to as just an "identifier") is
composed of any [Unicode Scalar
Value](https://unicode.org/glossary/#unicode_scalar_value) other than
[non-initial characters](#non-initial-characters), followed by any number of
Unicode Scalar Values other than [non-identifier
characters](#non-identifier-characters).
A handful of patterns are disallowed, to avoid confusion with other values:
* idents that appear to start with a [Number](#number) (like `1.0v2` or
`-1em`) or the "almost a number" pattern of a decimal point without a
leading digit (like `.1`).
* idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
`false`, and `null`) without their leading `#`.
Identifiers that match these patterns _MUST_ be treated as a syntax error; such
values can only be written as quoted or raw strings. The precise details of the
identifier syntax are specified in the [Full Grammar](#full-grammar) below.
Identifier Strings are terminated by [Whitespace](#whitespace) or
[Newlines](#newline).
#### Non-initial characters
The following characters cannot be the first character in an
[Identifier String](#identifier-string):
* Any decimal digit (0-9)
* Any [non-identifier characters](#non-identifier-characters)
Additionally, the `-` character can only be used as an initial character if
the second character is *not* a digit. This allows identifiers to look like
`--this`, and removes the ambiguity of having an identifier look like a
negative number.
#### Non-identifier characters
The following characters cannot be used anywhere in an [Identifier String](#identifier-string):
* Any of `(){}[]/\"#;=`
* Any [Whitespace](#whitespace) or [Newline](#newline).
* Any [disallowed literal code points](#disallowed-literal-code-points) in KDL
documents.
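The identifier rules above can be sketched as a small validity check. This is an illustrative sketch, not a reference implementation: the function name `is_identifier_string` is hypothetical, the whitespace and newline sets are assumptions (the full tables are not reproduced in this section), and it assumes that disallowed literal code points have already been rejected for the document as a whole.

```python
# Assumed tables: illustrative only, not copied from the normative tables.
WHITESPACE = set(
    "\u0009\u000B\u0020\u00A0\u1680\u2000\u2001\u2002\u2003\u2004"
    "\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000"
)
NEWLINES = set("\u000D\u000A\u0085\u000C\u2028\u2029")
NON_IDENTIFIER = set('(){}[]/\\"#;=') | WHITESPACE | NEWLINES
# Keywords that must be written with a leading `#`, so the bare forms are errors.
KEYWORDS = {"inf", "-inf", "nan", "true", "false", "null"}


def is_identifier_string(s: str) -> bool:
    """Return True if `s` could be written as a bare Identifier String."""
    if not s or s in KEYWORDS:
        return False
    if any(ch in NON_IDENTIFIER for ch in s):
        return False
    # Skip an optional sign and an optional leading dot; whatever follows
    # must not be a digit, otherwise the token looks like a number.
    rest = s[1:] if s[0] in "+-" else s
    if rest.startswith("."):
        rest = rest[1:]
    return not (rest and rest[0] in "0123456789")
```

With this sketch, `--this` and `xml:space` are accepted, while `-1em`, `.1`, and `true` are rejected and would have to be written as quoted or raw strings.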
### Quoted String
A Quoted String is delimited by `"` on either side of any number of literal
string characters except unescaped `"` and `\`. This includes literal
[Newline](#newline) characters, which means a String Value can encompass
multiple lines without behaving like a Newline for [Node](#node) parsing
purposes.
[Newline](#newline) characters, which means a single String Value can span
multiple lines, following specific [Multi-line String](#multi-line-strings)
rules.
Strings _MUST_ be represented as UTF-8 values.
Like Identifier Strings, Quoted Strings _MUST NOT_ include any of the
[disallowed literal code-points](#disallowed-literal-code-points) as code
points in their body.
In addition to literal code points, a number of "escapes" are supported.
Quoted Strings also follow the Multi-line rules specified in [Multi-line
String](#multi-line-strings).
#### Escapes
In addition to literal code points, a number of "escapes" are supported in Quoted Strings.
"Escapes" are the character `\` followed by another character, and are
interpreted as described in the following table:
@ -334,32 +364,237 @@ interpreted as described in the following table:
| Carriage Return | `\r` | `U+000D` |
| Character Tabulation (Tab) | `\t` | `U+0009` |
| Reverse Solidus (Backslash) | `\\` | `U+005C` |
| Solidus (Forwardslash) | `\/` | `U+002F` |
| Quotation Mark (Double Quote) | `\"` | `U+0022` |
| Backspace | `\b` | `U+0008` |
| Form Feed | `\f` | `U+000C` |
| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` |
| Space | `\s` | `U+0020` |
| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, as long as it represents a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) |
| Whitespace Escape | See below | N/A |
##### Escaped Whitespace
In addition to escaping individual characters, `\` can also escape whitespace.
When a `\` is followed by one or more literal whitespace characters, the `\`
and all of that whitespace are discarded. For example, `"Hello World"` and
`"Hello \ World"` are semantically identical. See [whitespace](#whitespace)
and [newlines](#newline) for how whitespace is defined.
Note that only literal whitespace is escaped; whitespace escapes (`\n` and
such) are retained. For example, these strings are all semantically identical:
```kdl
"Hello\ \nWorld"
"Hello\n\
World"
"Hello\nWorld"
"
Hello
World
"
```
##### Invalid escapes
Except as described in the escapes table, above, `\` *MUST NOT* precede any
other characters in a string.
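As a rough sketch of how the escape table and the whitespace escape might be resolved for the body of a single-line Quoted String: the function name `unescape` and the whitespace/newline set are assumptions made for illustration, and the interaction with multi-line dedenting is deliberately left out.

```python
import re

# Escapes that map to exactly one code point, per the table above.
SIMPLE = {
    "n": "\n", "r": "\r", "t": "\t", "\\": "\\",
    '"': '"', "b": "\b", "f": "\f", "s": " ",
}
# Assumed set of literal whitespace and newline code points (illustrative).
WS_OR_NEWLINE = set(
    "\u0009\u000B\u0020\u00A0\u1680\u2000\u2001\u2002\u2003\u2004"
    "\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000"
    "\u000D\u000A\u0085\u000C\u2028\u2029"
)
UNICODE_ESCAPE = re.compile(r"u\{([0-9a-fA-F]{1,6})\}")


def unescape(body: str) -> str:
    """Resolve backslash escapes in the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        i += 1  # move past the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i]
        if esc in SIMPLE:
            out.append(SIMPLE[esc])
            i += 1
        elif esc == "u":
            m = UNICODE_ESCAPE.match(body, i)
            if m is None:
                raise ValueError("malformed unicode escape")
            cp = int(m.group(1), 16)
            # Only Unicode Scalar Values are allowed (no surrogates).
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError("not a Unicode Scalar Value")
            out.append(chr(cp))
            i = m.end()
        elif esc in WS_OR_NEWLINE:
            # Whitespace escape: discard the whole run of literal whitespace.
            while i < len(body) and body[i] in WS_OR_NEWLINE:
                i += 1
        else:
            raise ValueError(f"invalid escape: \\{esc}")
    return "".join(out)
```

Note that `\/` is rejected here, since the solidus escape is no longer part of the table.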
### Raw String
Raw Strings in KDL are much like [Quoted Strings](#quoted-string), except they
do not support `\`-escapes. They otherwise share the same properties as far as
literal [Newline](#newline) characters go, and the requirement of UTF-8
representation.
literal [Newline](#newline) characters go, multi-line rules, and the requirement
of UTF-8 representation.
Raw String literals are represented as `r`, followed by zero or more `#`
characters, followed by `"`, followed by any number of UTF-8 literals. The
string is then closed by a `"` followed by a _matching_ number of `#`
characters. This allows them to contain raw `"` or `#` characters; only the
precise terminator (resembling `"##`, for example) ends the raw string. This
means that the string sequence `"` or `"#` and such must not match the closing
`"` with the same or more `#` characters as the opening `r`.
Raw String literals are represented with one or more `#` characters, followed
by `"`, followed by any number of UTF-8 literals. The string is then closed by
a `"` followed by a _matching_ number of `#` characters. This means that the
string sequence `"` or `"#` and such must not match the closing `"` with the
same or more `#` characters as the opening `#`, in the body of the string.
Like other Strings, Raw Strings _MUST NOT_ include any of the [disallowed
literal code-points](#disallowed-literal-code-points) as code points in their
body. Unlike with Quoted Strings, these cannot simply be escaped, and are thus
unrepresentable when using Raw Strings.
#### Example
```kdl
just-escapes r"\n will be literal"
quotes-and-escapes r#"hello\n\r\asd"world"#
just-escapes #"\n will be literal"#
```
The string contains the literal characters `\n will be literal`.
```kdl
quotes-and-escapes ##"hello\n\r\asd"#world"##
```
The string contains the literal characters `hello\n\r\asd"#world`
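One way to locate the end of a Raw String is to count the opening `#` characters and then search for a `"` followed by the same number of `#`. The sketch below assumes a hypothetical helper name, `read_raw_string`, and returns the literal body without applying the multi-line rules.

```python
def read_raw_string(text: str, start: int = 0):
    """Return (value, end_index) for a raw string beginning at text[start]."""
    i = start
    while i < len(text) and text[i] == "#":
        i += 1
    hashes = i - start
    if hashes == 0 or i >= len(text) or text[i] != '"':
        raise ValueError("not a raw string")
    # The terminator is a quote followed by exactly as many hashes as opened.
    terminator = '"' + "#" * hashes
    body_start = i + 1
    end = text.find(terminator, body_start)
    if end == -1:
        raise ValueError("unterminated raw string")
    return text[body_start:end], end + len(terminator)
```

Because the first occurrence of the terminator closes the string, a body can contain `"` or `"#` only when more `#` characters are used in the delimiters.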
### Multi-line Strings
When a Quoted or Raw String spans multiple lines with literal, non-escaped
Newlines, it follows a special multi-line syntax that automatically "dedents"
the string, allowing its value to be indented to a visually matching level if
desired.
A Multi-line string _MUST_ start with a [Newline](#newline) immediately
following its opening `"`. Its final line _MUST_ contain only whitespace,
followed by a single closing `"`. All in-between lines that contain
non-newline characters _MUST_ start with _at least_ the exact same whitespace
as the final line (precisely matching codepoints, not merely counting characters).
They may contain additional whitespace following this prefix.
The value of the Multi-line String omits the first and last Newline, the
Whitespace of the last line, and the matching Whitespace prefix on all
intermediate lines. The first and last Newline can be the same character (that
is, empty multi-line strings are legal).
Strings with literal Newlines that do not immediately start with a Newline and
whose final `"` is not preceded by optional whitespace and a Newline are
illegal.
In other words, the final line specifies the whitespace prefix that will be
removed from all other lines.
It is a syntax error for any body lines of the multi-line string to not match
the whitespace prefix of the last line with the final quote.
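The dedenting rules above might be sketched as follows. This is a sketch under stated assumptions: `dedent_multiline` is a hypothetical name, the input is the text between the quotes with Newlines already normalized to `LF`, and the whitespace set is illustrative. Whitespace-only lines are treated as empty, following the rule that empty lines may contain any whitespace.

```python
# Assumed whitespace code points (illustrative, not the normative table).
WHITESPACE = set(
    "\u0009\u000B\u0020\u00A0\u1680\u2000\u2001\u2002\u2003\u2004"
    "\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000"
)


def dedent_multiline(body: str) -> str:
    """Dedent the body of a multi-line string using its final line as prefix."""
    if not body.startswith("\n"):
        raise ValueError("multi-line string must start with a newline")
    lines = body.split("\n")
    prefix = lines[-1]  # everything between the last newline and the quote
    if any(ch not in WHITESPACE for ch in prefix):
        raise ValueError("final line must contain only whitespace")
    out = []
    for line in lines[1:-1]:
        if all(ch in WHITESPACE for ch in line):
            out.append("")  # whitespace-only lines become empty lines
        elif line.startswith(prefix):
            out.append(line[len(prefix):])  # exact code point match required
        else:
            raise ValueError("line does not start with the final line's prefix")
    return "\n".join(out)
```

The prefix comparison is done on code points, so a line indented with spaces does not match a final line indented with a tab, even if they look aligned.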
#### Newline Normalization
Literal Newline sequences in Multi-line Strings must be normalized to a single
`U+000A` (`LF`) during deserialization. This means, for example, that `CR LF`
becomes a single `LF` during parsing.
This normalization does not apply to non-literal Newlines entered using escape
sequences.
For clarity: this normalization is for individual sequences. That is, the
literal sequence `CRLF CRLF` becomes `LF LF`, not `LF`.
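A minimal sketch of this normalization, assuming the Newline sequences are `CRLF`, `CR`, `LF`, `NEL`, `FF`, `LS`, and `PS` (the table is not reproduced in this section, so treat the set as an assumption). It is meant to run on the literal source text, before escapes are resolved, so that escaped newlines such as `\r` are left alone.

```python
import re

# CRLF is listed first so that it is replaced as one unit, not as CR then LF.
NEWLINE_RE = re.compile("\r\n|[\r\n\u0085\u000C\u2028\u2029]")


def normalize_newlines(text: str) -> str:
    """Replace each literal Newline sequence with a single LF."""
    return NEWLINE_RE.sub("\n", text)
```

Each sequence is replaced individually, so two consecutive `CRLF` sequences yield two `LF` characters.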
#### Example
```kdl
multi-line "
foo
This is the base indentation
bar
"
```
This example's string value will be:
```
foo
This is the base indentation
bar
```
which is equivalent to `" foo\nThis is the base indentation\n bar"`
when written as a single-line string.
---------
If the last line wasn't indented as far,
it won't dedent the rest of the lines as much:
```kdl
multi-line "
foo
This is no longer on the left edge
bar
"
```
This example's string value will be:
```
foo
This is no longer on the left edge
bar
```
Equivalent to `" foo\n This is no longer on the left edge\n bar"`.
-----------
Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value:
```kdl
multi-line "
Indented a bit.
A second indented paragraph.
"
```
This example's string value will be:
```
Indented a bit.
A second indented paragraph.
```
Equivalent to `"Indented a bit.\n\nA second indented paragraph."`
-----------
The following yield syntax errors:
```kdl
multi-line "
closing quote with non-whitespace prefix"
```
```kdl
multi-line "stuff
"
```
```kdl
// Every line must share the exact same prefix as the closing line.
multi-line "[\n]
[tab]a[\n]
[space][space]b[\n]
[space][tab][\n]
[tab]"
```
#### Interaction with Whitespace Escapes
Multi-line strings support the same mechanism for escaping whitespace. When
processing a Multi-line String, implementations MUST dedent the string _after_
resolving all whitespace escapes, but _before_ resolving other backslash escapes.
Furthermore, a whitespace escape that attempts to escape the final line's newline
and/or whitespace prefix is invalid since the multi-line string has to still be
valid with the escaped whitespace removed.
For example, the following example is illegal:
```kdl
// Equivalent to trying to write a string containing `foo\nbar\`.
"
foo
bar\
"
```
while the following example is allowed:
```kdl
"
foo \
bar
baz
\ "
// this is equivalent to
"
foo bar
baz
"
```
### Number
@ -368,9 +603,9 @@ Numbers in KDL represent numerical [Values](#value). There is no logical distinc
between real numbers, integers, and floating point numbers. It's up to
individual implementations to determine how to represent KDL numbers.
There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary.
There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary.
* All numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative.
* All non-[Keyword](#keyword-numbers) numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative.
* Binary numbers start with `0b` and only allow `0` and `1` as digits, which may be separated by `_`. They represent numbers in radix 2.
* Octal numbers start with `0o` and only allow digits between `0` and `7`, which may be separated by `_`. They represent numbers in radix 8.
* Hexadecimal numbers start with `0x` and allow digits between `0` and `9`, as well as letters `A` through `F`, in either lower or upper case, which may be separated by `_`. They represent numbers in radix 16.
@ -380,29 +615,50 @@ There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary.
* They may optionally include a decimal separator `.`, followed by more digits, which may again be separated by `_`.
* They may optionally be followed by `E` or `e`, an optional `-` or `+`, and more digits, to represent an exponent value.
Note that, similar to JSON and some other languages,
numbers without an integer digit (such as `.1`) are illegal.
They must be written with at least one integer digit, like `0.1`.
(These patterns are also disallowed from [Identifier Strings](#identifier-string), to avoid confusion.)
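The non-keyword number syntaxes can be sketched with a few regular expressions. The name `parse_number` and the choice of `int` and `Decimal` as result types are assumptions for illustration; the spec leaves number representation to implementations.

```python
import re
from decimal import Decimal

# sign? integer ('.' integer)? exponent?, where integer is digit (digit | '_')*
_DECIMAL = re.compile(
    r"[+-]?[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?"
)
_RADIX = [
    (re.compile(r"([+-]?)0x([0-9a-fA-F][0-9a-fA-F_]*)"), 16),
    (re.compile(r"([+-]?)0o([0-7][0-7_]*)"), 8),
    (re.compile(r"([+-]?)0b([01][01_]*)"), 2),
]


def parse_number(text: str):
    """Parse a non-keyword KDL number into an int or a Decimal."""
    for pattern, base in _RADIX:
        m = pattern.fullmatch(text)
        if m:
            value = int(m.group(2).replace("_", ""), base)
            return -value if m.group(1) == "-" else value
    if _DECIMAL.fullmatch(text):
        cleaned = text.replace("_", "")
        if re.fullmatch(r"[+-]?[0-9]+", cleaned):
            return int(cleaned)
        return Decimal(cleaned)
    raise ValueError(f"not a number: {text!r}")
```

Inputs such as `.1` or `1.` fail to match and raise `ValueError`, which mirrors the requirement for at least one digit on each side of the decimal separator.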
#### Keyword Numbers
There are three special "keyword" numbers included in KDL to accommodate the
widespread use of [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floats:
* `#inf` - floating point positive infinity.
* `#-inf` - floating point negative infinity.
* `#nan` - floating point NaN/Not a Number.
To go along with this and prevent foot guns, the bare [Identifier
Strings](#identifier-string) `inf`, `-inf`, and `nan` are considered illegal
identifiers and should yield a syntax error.
The existence of these keywords does not imply that any numbers be represented
as IEEE 754 floats. These are simply for clarity and convenience for any
implementation that chooses to represent their numbers in this way.
### Boolean
A boolean [Value](#value) is either the symbol `true` or `false`. These
A boolean [Value](#value) is either the symbol `#true` or `#false`. These
_SHOULD_ be represented by implementation as boolean logical values, or some
approximation thereof.
#### Example
```kdl
my-node true value=false
my-node #true value=#false
```
### Null
The symbol `null` represents a null [Value](#value). It's up to the
The symbol `#null` represents a null [Value](#value). It's up to the
implementation to decide how to represent this, but it generally signals the
"absence" of a value. It is reasonable for an implementation to ignore null
values altogether when deserializing.
"absence" of a value.
#### Example
```kdl
my-node null key=null
my-node #null key=#null
```
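Since every keyword now carries a leading `#`, a parser can recognize them with a simple lookup. The sketch below uses a hypothetical `parse_keyword` helper and maps the keyword numbers to Python floats, which is only one possible representation.

```python
import math

KEYWORDS = {
    "#true": True,
    "#false": False,
    "#null": None,
    "#inf": math.inf,
    "#-inf": -math.inf,
    "#nan": math.nan,
}


def parse_keyword(token: str):
    """Map a `#`-prefixed keyword token to a Python value."""
    if token not in KEYWORDS:
        # Bare `true`, `null`, `inf`, etc. are not keywords.
        raise ValueError(f"not a keyword: {token!r}")
    return KEYWORDS[token]
```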
### Whitespace
@ -413,6 +669,7 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
| Name | Code Pt |
|----------------------|---------|
| Character Tabulation | `U+0009` |
| Line Tabulation | `U+000B` |
| Space | `U+0020` |
| No-Break Space | `U+00A0` |
| Ogham Space Mark | `U+1680` |
@ -431,6 +688,11 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
| Medium Mathematical Space | `U+205F` |
| Ideographic Space | `U+3000` |
#### Single-line comments
Any text after `//`, until the next literal [Newline](#newline), is "commented
out" and is considered to be [Whitespace](#whitespace).
#### Multi-line comments
In addition to single-line comments using `//`, comments can also be started
@ -438,9 +700,30 @@ with `/*` and ended with `*/`. These comments can span multiple lines. They
are allowed in all positions where [Whitespace](#whitespace) is allowed and
can be nested.
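Because multi-line comments nest, a scanner needs a depth counter rather than a search for the first `*/`. A simple greedy sketch, with `skip_block_comment` as a hypothetical name, might look like this; it may differ from the grammar on unusual overlapping sequences such as `/*/*/`.

```python
def skip_block_comment(text: str, start: int) -> int:
    """Return the index just past the comment that opens at text[start]."""
    if not text.startswith("/*", start):
        raise ValueError("not at the start of a multi-line comment")
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1  # entering a (possibly nested) comment
            i += 2
        elif text.startswith("*/", i):
            depth -= 1  # leaving one level of nesting
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated multi-line comment")
```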
#### Slashdash comments
Finally, a special kind of comment called a "slashdash", denoted by `/-`, can
be used to comment out entire _components_ of a KDL document logically, and
have those elements not be included as part of the parsed document data.
Slashdash comments can be used before the following, including before their type
annotations, if present:
* A [Node](#node): the entire Node is treated as Whitespace, including all
props, args, and children.
* An [Argument](#argument): the Argument value is treated as Whitespace.
* A [Property](#property) key: the entire property, including both key and value,
is treated as Whitespace. A slashdash of just the property value is not allowed.
* A [Children Block](#children-block): the entire block, including all
children within, is treated as Whitespace. Only other children blocks, whether
slashdashed or not, may follow a slashdashed children block.
A slashdash may be followed by any amount of whitespace, including newlines and
comments, before the element that it comments out.
### Newline
The following characters [should be treated as new
The following character sequences [should be treated as new
lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf):
| Acronym | Name | Code Pt |
@ -455,36 +738,76 @@ lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf):
Note that for the purpose of new lines, CRLF is considered _a single newline_.
### Disallowed Literal Code Points
The following code points may not appear literally anywhere in the document.
They may be represented in Strings (but not Raw Strings) using `\u{}`.
* The codepoints `U+0000-0008` or the codepoints `U+000E-001F` (various
control characters).
* `U+007F` (the Delete control character).
* Any codepoint that is not a [Unicode Scalar
Value](https://unicode.org/glossary/#unicode_scalar_value) (`U+D800-DFFF`).
* `U+200E-200F`, `U+202A-202E`, and `U+2066-2069`, the [unicode
"direction control"
characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
* `U+FEFF`, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM),
except as the first code point in a document.
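The list above can be turned into a document-level check. In this sketch, `is_disallowed_literal` and `find_disallowed` are hypothetical names, and the result format (index plus `U+XXXX` label) is chosen only for illustration.

```python
def is_disallowed_literal(cp: int) -> bool:
    """True if the code point may not appear literally in a document."""
    return (
        cp <= 0x0008
        or 0x000E <= cp <= 0x001F
        or cp == 0x007F
        or 0xD800 <= cp <= 0xDFFF  # surrogates are not Unicode Scalar Values
        or 0x200E <= cp <= 0x200F
        or 0x202A <= cp <= 0x202E
        or 0x2066 <= cp <= 0x2069
        or cp == 0xFEFF
    )


def find_disallowed(document: str):
    """Return (index, label) pairs for each disallowed literal code point."""
    hits = []
    for index, ch in enumerate(document):
        cp = ord(ch)
        if cp == 0xFEFF and index == 0:
            continue  # a BOM is permitted as the first code point only
        if is_disallowed_literal(cp):
            hits.append((index, f"U+{cp:04X}"))
    return hits
```

Tab, the Newline characters, and Line Tabulation fall outside these ranges, so they pass the check.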
## Full Grammar
```
nodes := linespace* (node nodes?)? linespace*
This is the full official grammar for KDL and should be considered
authoritative if something seems to disagree with the text above. The [grammar
language syntax](#grammar-language) is defined below.
node := ('/-' node-space*)? type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children ws*)? node-space* node-terminator
node-prop-or-arg := ('/-' node-space*)? (prop | value)
node-children := ('/-' node-space*)? '{' nodes '}'
node-space := ws* escline ws* | ws+
```
document := bom? nodes
// Nodes
nodes := (line-space* node)* line-space*
base-node := slashdash? type? node-space* string
(node-space+ slashdash? node-prop-or-arg)*
// slashdashed node-children must always be after props and args.
(node-space+ slashdash node-children)*
(node-space+ node-children)?
(node-space+ slashdash node-children)*
node := base-node node-space* node-terminator
final-node := base-node node-space* node-terminator?
// Entries
node-prop-or-arg := prop | value
node-children := '{' nodes final-node? '}'
node-terminator := single-line-comment | newline | ';' | eof
identifier := string | bare-identifier
bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword
identifier-char := unicode - linespace - [\/(){}<>;[]=,"]
keyword := boolean | 'null'
prop := identifier '=' value
value := type? (string | number | keyword)
type := '(' identifier ')'
prop := string node-space* '=' node-space* value
value := type? node-space* (string | number | keyword)
type := '(' node-space* string node-space* ')'
string := raw-string | escaped-string
escaped-string := '"' character* '"'
character := '\' escape | [^\"]
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
// Strings
string := identifier-string | quoted-string | raw-string
identifier-string := unambiguous-ident | signed-ident | dotted-ident
unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-identifiers
signed-ident := sign ((identifier-char - digit - '.') identifier-char*)?
dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)?
identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points - equals-sign
disallowed-keyword-identifiers := 'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'
quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"'
single-line-string-body := (string-character - newline)*
multi-line-string-body := string-character*
string-character := '\' escape | [^\\"] - disallowed-literal-code-points
escape := ["\\bfnrts] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
hex-digit := [0-9a-fA-F]
raw-string := 'r' raw-string-hash
raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
raw-string-quotes := '"' .* '"'
raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-string-body newline unicode-space*) '"'
single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)*
multi-line-raw-string-body := (unicode - disallowed-literal-code-points)*
number := hex | octal | binary | decimal
// Numbers
number := keyword-number | hex | octal | binary | decimal
decimal := sign? integer ('.' integer)? exponent?
exponent := ('e' | 'E') sign? integer
@ -496,21 +819,54 @@ hex := sign? '0x' hex-digit (hex-digit | '_')*
octal := sign? '0o' [0-7] [0-7_]*
binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
boolean := 'true' | 'false'
escline := '\\' ws* (single-line-comment | newline)
linespace := newline | ws | single-line-comment
newline := See Table (All line-break white_space)
ws := bom | unicode-space | multi-line-comment
// Keywords and booleans.
keyword := boolean | '#null'
keyword-number := '#inf' | '#-inf' | '#nan'
boolean := '#true' | '#false'
// Specific code points
bom := '\u{FEFF}'
disallowed-literal-code-points := See Table (Disallowed Literal Code Points)
unicode := Any Unicode Scalar Value
unicode-space := See Table (All White_Space unicode characters which are not `newline`)
single-line-comment := '//' ^newline+ (newline | eof)
// Comments
single-line-comment := '//' ^newline* (newline | eof)
multi-line-comment := '/*' commented-block
commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
slashdash := '/-' line-space*
// Whitespace
ws := unicode-space | multi-line-comment
escline := '\\' ws* (single-line-comment | newline | eof)
newline := See Table (All Newline White_Space)
// Whitespace where newlines are allowed.
line-space := newline | ws | single-line-comment
// Whitespace within nodes, where newline-ish things must be esclined.
node-space := ws* escline ws* | ws+
```
### Grammar language
The grammar language syntax is a combination of ABNF with some regex spice thrown in.
Specifically:
* Single quotes (`'`) are used to denote literal text. `\` within a literal
string is used for escaping other single-quotes, for initiating unicode
characters using hex values (`\u{FEFF}`), and for escaping `\` itself
(`\\`).
* `*` is used for "zero or more", `+` is used for "one or more", and `?` is
used for "zero or one".
* `()` can be used to group matches that must be matched together.
* `a | b` means `a or b`, whichever matches first. If multiple items are before
a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
* `[]` are used for regex-style character matches, where any character between
the brackets will be a single match. `\` is used to escape `\`, `[`, and
`]`. They also support character ranges (`0-9`), and negation (`^`)
* `-` is used for "except for" or "minus" whatever follows it. For example,
`a - 'x'` means "any `a`, except something that matches the literal `'x'`".
* The prefix `^` means "something that does not match" whatever follows it.
For example, `^foo` means "must not match `foo`".
* A single definition may be split over multiple lines. Newlines are treated as
spaces.
* `//` at the beginning of a line is used for comments.


@ -25,7 +25,7 @@ XML elements and KDL nodes have a direct correspondence. In XiK, an XML element
* making the attributes into KDL properties
* making the child nodes as KDL child nodes
For example, the XML `<element foo="bar"><child baz="qux" /></element>` is encoded into XiK as `element foo="bar" { child baz="qux" }`.
For example, the XML `<element foo="bar"><child baz="qux" /></element>` is encoded into XiK as `element foo=bar { child baz=qux }`.
XML namespaces are encoded the same as XML: the node name simply contains a `:` character. Note that KDL identifier syntax allows `:` directly in an ident, so a name like `xml:space` or `xlink:href` is a valid node or property name.
@ -35,9 +35,9 @@ Raw text contents of an element can be encoded in two possible ways.
If the element contains *only* text, it should be encoded as a final string unnamed argument. For example, the XML `<a href="http://example.com">here's a link</a>` can be encoded as `a href="http://example.com" "here's a link"`.
If the element contains mixed text and element children, the text can be encoded as a KDL node with the name `-` with a single string unnamed argument. For example, the XML `<span>some <b>bold</b> text</span>` can be encoded as `span { - "some "; b "bold"; - " text" }`.
If the element contains mixed text and element children, the text can be encoded as a KDL node with the name `-` with a single string unnamed argument. For example, the XML `<span>some <b>bold</b> text</span>` can be encoded as `span { - "some "; b bold; - " text" }`.
An element that contains only text *is allowed to* encode it as `-` children. For example, `<span>foo</span>` *may* be encoded as `span { - "foo" }` instead of `span "foo"`. However, an element cannot mix the "final string attribute" with child nodes; `span "foo" { b "bar" }` is an **invalid** encoding of `<span>foo<b>bar</b></span>`. (It must be encoded as `span { - "foo"; b "bar" }`.)
An element that contains only text *is allowed to* encode it as `-` children. For example, `<span>foo</span>` *may* be encoded as `span { - foo }` instead of `span foo`. However, an element cannot mix the "final string attribute" with child nodes; `span foo { b bar }` is an **invalid** encoding of `<span>foo<b>bar</b></span>`. (It must be encoded as `span { - foo; b bar }`.)
CDATA sections are not preserved in this encoding, as they are merely a source convenience so you don't have to escape a bunch of characters. They are encoded as normal textual contents would be.
@ -53,13 +53,13 @@ Processing instructions and XML declarations (nodes that look like `<?foo ... ?>
The contents of a PI are technically completely unstructured. However, in practice most PIs' contents look like start-tag attributes. If this is the case, they should be encoded as properties on the node, with string values. For example, `<?xml version="1.0"?>` is encoded as `?xml version="1.0"`.
If the contents of a PI do *not* look like attributes, then instead the entire contents (from the end of the whitespace following the PI name, to the closing `?>` characters) are encoded as a single unnamed string value. For example, the preceding XML declaration *could* be alternately encoded as `?xml r#"version="1.0""#` (but shouldn't be).
If the contents of a PI do *not* look like attributes, then instead the entire contents (from the end of the whitespace following the PI name, to the closing `?>` characters) are encoded as a single unnamed string value. For example, the preceding XML declaration *could* be alternately encoded as `?xml #"version="1.0""#` (but shouldn't be).
(Note that XML declarations are not needed when writing XiK directly; the version is always 1.0, and the encoding is always UTF-8 since it's KDL.)
----
Doctypes (nodes that look like `<!DOCTYPE ...>`) are encoded similarly to unstructured Processing Instructions. They have a node name of `!doctype`, and the entire contents of the node, from the end of the whitespace following the "DOCTYPE" to the closing `>`, are encoded as a single unnamed string value. For example, the HTML doctype `<!DOCTYPE html>` is encoded as `!doctype "html"`, while the XHTML 1 Strict doctype would be encoded as `!doctype r#"html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd""#`
Doctypes (nodes that look like `<!DOCTYPE ...>`) are encoded similarly to unstructured Processing Instructions. They have a node name of `!doctype`, and the entire contents of the node, from the end of the whitespace following the "DOCTYPE" to the closing `>`, are encoded as a single unnamed string value. For example, the HTML doctype `<!DOCTYPE html>` is encoded as `!doctype html`, while the XHTML 1 Strict doctype would be encoded as `!doctype #"html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd""#`
----


@ -1,9 +1,9 @@
package {
name "kdl"
name kdl
version "0.0.0"
description "kat's document language"
description "The kdl document language"
authors "Kat Marchán <kzm@zkat.tech>"
license-file "LICENSE.md"
license-file LICENSE.md
edition "2018"
}


@ -1,47 +1,52 @@
// This example is a GitHub Action if it used KDL syntax.
// See .github/workflows/ci.yml for the file this was based on.
name "CI"
name CI
on "push" "pull_request"
on push pull_request
env {
RUSTFLAGS "-Dwarnings"
RUSTFLAGS -Dwarnings
}
jobs {
fmt_and_docs "Check fmt & build docs" {
runs-on "ubuntu-latest"
runs-on ubuntu-latest
steps {
step uses="actions/checkout@v1"
step "Install Rust" uses="actions-rs/toolchain@v1" {
profile "minimal"
toolchain "stable"
components "rustfmt"
override true
profile minimal
toolchain stable
components rustfmt
override #true
}
step "rustfmt" run="cargo fmt --all -- --check"
step "docs" run="cargo doc --no-deps"
step rustfmt { run cargo fmt --all -- --check }
step docs { run cargo doc --no-deps }
}
}
build_and_test "Build & Test" {
runs-on "${{ matrix.os }}"
strategy {
matrix {
rust "1.46.0" "stable"
os "ubuntu-latest" "macOS-latest" "windows-latest"
rust "1.46.0" stable
os ubuntu-latest macOS-latest windows-latest
}
}
steps {
step uses="actions/checkout@v1"
step "Install Rust" uses="actions-rs/toolchain@v1" {
profile "minimal"
profile minimal
toolchain "${{ matrix.rust }}"
components "clippy"
override true
components clippy
override #true
}
step "Clippy" run="cargo clippy --all -- -D warnings"
step "Run tests" run="cargo test --all --verbose"
step Clippy { run cargo clippy --all -- -D warnings }
step "Run tests" { run cargo test --all --verbose }
step "Other Stuff" run="
echo foo
echo bar
echo baz
"
}
}
}


@ -1,293 +1,293 @@
document {
info {
title "KDL Schema" lang="en"
description "KDL Schema KDL schema in KDL" lang="en"
title "KDL Schema" lang=en
description "KDL Schema KDL schema in KDL" lang=en
author "Kat Marchán" {
link "https://github.com/zkat" rel="self"
link "https://github.com/zkat" rel=self
}
contributor "Lars Willighagen" {
link "https://github.com/larsgw" rel="self"
link "https://github.com/larsgw" rel=self
}
link "https://github.com/zkat/kdl" rel="documentation"
license "Creative Commons Attribution-ShareAlike 4.0 International License" spdx="CC-BY-SA-4.0" {
link "https://creativecommons.org/licenses/by-sa/4.0/" lang="en"
link "https://github.com/zkat/kdl" rel=documentation
license "Creative Commons Attribution-ShareAlike 4.0 International License" spdx=CC-BY-SA-4.0 {
link "https://creativecommons.org/licenses/by-sa/4.0/" lang=en
}
published "2021-08-31"
modified "2021-09-01"
}
node "document" {
node document {
min 1
max 1
children id="node-children" {
node "node-names" id="node-names-node" description="Validations to apply specifically to arbitrary node names" {
children ref=r#"[id="validations"]"#
children id=node-children {
node node-names id=node-names-node description="Validations to apply specifically to arbitrary node names" {
children ref=#"[id="validations"]"#
}
node "other-nodes-allowed" id="other-nodes-allowed-node" description="Whether to allow child nodes other than the ones explicitly listed. Defaults to 'false'." {
node other-nodes-allowed id=other-nodes-allowed-node description="Whether to allow child nodes other than the ones explicitly listed. Defaults to '#false'." {
max 1
value {
min 1
max 1
type "boolean"
type boolean
}
}
node "tag-names" description="Validations to apply specifically to arbitrary type tag names" {
children ref=r#"[id="validations"]"#
node tag-names description="Validations to apply specifically to arbitrary type tag names" {
children ref=#"[id="validations"]"#
}
node "other-tags-allowed" description="Whether to allow child node tags other than the ones explicitly listed. Defaults to 'false'." {
node other-tags-allowed description="Whether to allow child node tags other than the ones explicitly listed. Defaults to '#false'." {
max 1
value {
min 1
max 1
type "boolean"
type boolean
}
}
node "info" description="A child node that describes the schema itself." {
node info description="A child node that describes the schema itself." {
children {
node "title" description="The title of the schema or the format it describes" {
node title description="The title of the schema or the format it describes" {
value description="The title text" {
type "string"
type string
min 1
max 1
}
prop "lang" id="info-lang" description="The language of the text" {
type "string"
prop lang id=info-lang description="The language of the text" {
type string
}
}
node "description" description="A description of the schema or the format it describes" {
node description description="A description of the schema or the format it describes" {
value description="The description text" {
type "string"
type string
min 1
max 1
}
prop ref=r#"[id="info-lang"]"#
prop ref=#"[id="info-lang"]"#
}
node "author" description="Author of the schema" {
value id="info-person-name" description="Person name" {
type "string"
node author description="Author of the schema" {
value id=info-person-name description="Person name" {
type string
min 1
max 1
}
prop "orcid" id="info-orcid" description="The ORCID of the person" {
type "string"
pattern r"\d{4}-\d{4}-\d{4}-\d{4}"
prop orcid id=info-orcid description="The ORCID of the person" {
type string
pattern #"\d{4}-\d{4}-\d{4}-\d{4}"#
}
children {
node ref=r#"[id="info-link"]"#
node ref=#"[id="info-link"]"#
}
}
node "contributor" description="Contributor to the schema" {
value ref=r#"[id="info-person-name"]"#
prop ref=r#"[id="info-orcid"]"#
node contributor description="Contributor to the schema" {
value ref=#"[id="info-person-name"]"#
prop ref=#"[id="info-orcid"]"#
children {
node ref=r#"[id="info-link"]"#
node ref=#"[id="info-link"]"#
}
}
node "link" id="info-link" description="Links to itself, and to sources describing it" {
node link id=info-link description="Links to itself, and to sources describing it" {
value description="A URL that the link points to" {
type "string"
format "url" "irl"
type string
format url irl
min 1
max 1
}
prop "rel" description="The relation between the current entity and the URL" {
type "string"
enum "self" "documentation"
prop rel description="The relation between the current entity and the URL" {
type string
enum self documentation
}
prop ref=r#"[id="info-lang"]"#
prop ref=#"[id="info-lang"]"#
}
node "license" description="The license(s) that the schema is licensed under" {
node license description="The license(s) that the schema is licensed under" {
value description="Name of the used license" {
type "string"
type string
min 1
max 1
}
prop "spdx" description="An SPDX license identifier" {
type "string"
prop spdx description="An SPDX license identifier" {
type string
}
children {
node ref=r#"[id="info-link"]"#
node ref=#"[id="info-link"]"#
}
}
node "published" description="When the schema was published" {
node published description="When the schema was published" {
value description="Publication date" {
type "string"
format "date"
type string
format date
min 1
max 1
}
prop "time" id="info-time" description="A time to accompany the date" {
type "string"
format "time"
prop time id=info-time description="A time to accompany the date" {
type string
format time
}
}
node "modified" description="When the schema was last modified" {
node modified description="When the schema was last modified" {
value description="Modification date" {
type "string"
format "date"
type string
format date
min 1
max 1
}
prop ref=r#"[id="info-time"]"#
prop ref=#"[id="info-time"]"#
}
node "version" description="The version number of this version of the schema" {
node version description="The version number of this version of the schema" {
value description="Semver version number" {
type "string"
pattern r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
type string
pattern #"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"#
min 1
max 1
}
}
}
}
node "tag" id="tag-node" description="A tag belonging to a child node of `document` or another node." {
node tag id=tag-node description="A tag belonging to a child node of `document` or another node." {
value description="The name of the tag. If a tag name is not supplied, the node rules apply to _all_ nodes belonging to the parent." {
type "string"
type string
max 1
}
prop "description" description="A description of this node's purpose." {
type "string"
prop description description="A description of this node's purpose." {
type string
}
prop "id" description="A globally-unique ID for this node." {
type "string"
prop id description="A globally-unique ID for this node." {
type string
}
prop "ref" description="A globally unique reference to another node." {
type "string"
format "kdl-query"
prop ref description="A globally unique reference to another node." {
type string
format kdl-query
}
children {
node ref=r#"[id="node-names-node"]"#
node ref=r#"[id="other-nodes-allowed-node"]"#
node ref=r#"[id="node-node"]"#
node ref=#"[id="node-names-node"]"#
node ref=#"[id="other-nodes-allowed-node"]"#
node ref=#"[id="node-node"]"#
}
}
node "node" id="node-node" description="A child node belonging either to `document` or to another `node`. Nodes may be anonymous." {
node node id=node-node description="A child node belonging either to `document` or to another `node`. Nodes may be anonymous." {
value description="The name of the node. If a node name is not supplied, the node rules apply to _all_ nodes belonging to the parent." {
type "string"
type string
max 1
}
prop "description" description="A description of this node's purpose." {
type "string"
prop description description="A description of this node's purpose." {
type string
}
prop "id" description="A globally-unique ID for this node." {
type "string"
prop id description="A globally-unique ID for this node." {
type string
}
prop "ref" description="A globally unique reference to another node." {
type "string"
format "kdl-query"
prop ref description="A globally unique reference to another node." {
type string
format kdl-query
}
children {
node "prop-names" description="Validations to apply specifically to arbitrary property names" {
children ref=r#"[id="validations"]"#
node prop-names description="Validations to apply specifically to arbitrary property names" {
children ref=#"[id="validations"]"#
}
node "other-props-allowed" description="Whether to allow properties other than the ones explicitly listed. Defaults to 'false'." {
node other-props-allowed description="Whether to allow properties other than the ones explicitly listed. Defaults to '#false'." {
max 1
value {
min 1
max 1
type "boolean"
type boolean
}
}
node "min" description="minimum number of instances of this node in its parent's children." {
node min description="minimum number of instances of this node in its parent's children." {
max 1
value {
min 1
max 1
type "number"
type number
}
}
node "max" description="maximum number of instances of this node in its parent's children." {
node max description="maximum number of instances of this node in its parent's children." {
max 1
value {
min 1
max 1
type "number"
type number
}
}
node ref=r#"[id="value-tag-node"]"#
node "prop" id="prop-node" description="A node property key/value pair." {
node ref=#"[id="value-tag-node"]"#
node prop id="prop-node" description="A node property key/value pair." {
value description="The property key." {
type "string"
type string
}
prop "id" description="A globally-unique ID of this property." {
type "string"
prop id description="A globally-unique ID of this property." {
type string
}
prop "ref" description="A globally unique reference to another property node." {
type "string"
format "kdl-query"
prop ref description="A globally unique reference to another property node." {
type string
format kdl-query
}
prop "description" description="A description of this property's purpose." {
type "string"
prop description description="A description of this property's purpose." {
type string
}
children description="Property-specific validations." {
node "required" description="Whether this property is required if its parent is present." {
node required description="Whether this property is required if its parent is present." {
max 1
value {
min 1
max 1
type "boolean"
type boolean
}
}
}
children id="validations" description="General value validations." {
node "tag" id="value-tag-node" description="The tags associated with this value" {
children id=validations description="General value validations." {
node tag id=value-tag-node description="The tags associated with this value" {
max 1
children ref=r#"[id="validations"]"#
children ref=#"[id="validations"]"#
}
node "type" description="The type for this prop's value." {
node type description="The type for this prop's value." {
max 1
value {
min 1
type "string"
type string
}
}
node "enum" description="An enumeration of possible values" {
node enum description="An enumeration of possible values" {
max 1
value description="Enumeration choices" {
min 1
}
}
node "pattern" description="PCRE (Regex) pattern or patterns to test prop values against." {
node pattern description="PCRE (Regex) pattern or patterns to test prop values against." {
value {
min 1
type "string"
type string
}
}
node "min-length" description="Minimum length of prop value, if it's a string." {
node min-length description="Minimum length of prop value, if it's a string." {
max 1
value {
min 1
type "number"
type number
}
}
node "max-length" description="Maximum length of prop value, if it's a string." {
node max-length description="Maximum length of prop value, if it's a string." {
max 1
value {
min 1
type "number"
type number
}
}
node "format" description="Intended data format." {
node format description="Intended data format." {
max 1
value {
min 1
type "string"
type string
// https://json-schema.org/understanding-json-schema/reference/string.html#format
enum "date-time" "date" "time" "duration" "decimal" "currency" "country-2" "country-3" "country-subdivision" "email" "idn-email" "hostname" "idn-hostname" "ipv4" "ipv6" "url" "url-reference" "irl" "irl-reference" "url-template" "regex" "uuid" "kdl-query" "i8" "i16" "i32" "i64" "u8" "u16" "u32" "u64" "isize" "usize" "f32" "f64" "decimal64" "decimal128"
enum date-time date time duration decimal currency country-2 country-3 country-subdivision email idn-email hostname idn-hostname ipv4 ipv6 url url-reference irl irl-reference url-template regex uuid kdl-query i8 i16 i32 i64 u8 u16 u32 u64 isize usize f32 f64 decimal64 decimal128
}
}
node "%" description="Only used for numeric values. Constrains them to be multiples of the given number(s)" {
node % description="Only used for numeric values. Constrains them to be multiples of the given number(s)" {
max 1
value {
min 1
type "number"
type number
}
}
node ">" description="Only used for numeric values. Constrains them to be greater than the given number(s)" {
node > description="Only used for numeric values. Constrains them to be greater than the given number(s)" {
max 1
value {
min 1
max 1
type "number"
type number
}
}
node ">=" description="Only used for numeric values. Constrains them to be greater than or equal to the given number(s)" {
@ -295,15 +295,15 @@ document {
value {
min 1
max 1
type "number"
type number
}
}
node "<" description="Only used for numeric values. Constrains them to be less than the given number(s)" {
node < description="Only used for numeric values. Constrains them to be less than the given number(s)" {
max 1
value {
min 1
max 1
type "number"
type number
}
}
node "<=" description="Only used for numeric values. Constrains them to be less than or equal to the given number(s)" {
@ -311,64 +311,64 @@ document {
value {
min 1
max 1
type "number"
type number
}
}
}
}
node "value" id="value-node" description="one or more direct node values" {
prop "id" description="A globally-unique ID of this value." {
type "string"
node value id=value-node description="one or more direct node values" {
prop id description="A globally-unique ID of this value." {
type string
}
prop "ref" description="A globally unique reference to another value node." {
type "string"
format "kdl-query"
prop ref description="A globally unique reference to another value node." {
type string
format kdl-query
}
prop "description" description="A description of this property's purpose." {
type "string"
prop description description="A description of this property's purpose." {
type string
}
children ref=r#"[id="validations"]"#
children ref=#"[id="validations"]"#
children description="Node value-specific validations" {
node "min" description="minimum number of values for this node." {
node min description="minimum number of values for this node." {
max 1
value {
min 1
max 1
type "number"
type number
}
}
node "max" description="maximum number of values for this node." {
node max description="maximum number of values for this node." {
max 1
value {
min 1
max 1
type "number"
type number
}
}
}
}
node "children" id="children-node" {
prop "id" description="A globally-unique ID of this children node." {
type "string"
node children id=children-node {
prop id description="A globally-unique ID of this children node." {
type string
}
prop "ref" description="A globally unique reference to another children node." {
type "string"
format "kdl-query"
prop ref description="A globally unique reference to another children node." {
type string
format kdl-query
}
prop "description" description="A description of this these children's purpose." {
type "string"
prop description description="A description of this these children's purpose." {
type string
}
children ref=r#"[id="node-children"]"#
children ref=#"[id="node-children"]"#
}
}
}
node "definitions" description="Definitions to reference in parts of the top-level nodes" {
node definitions description="Definitions to reference in parts of the top-level nodes" {
children {
node ref=r#"[id="node-node"]"#
node ref=r#"[id="value-node"]"#
node ref=r#"[id="prop-node"]"#
node ref=r#"[id="children-node"]"#
node ref=r#"[id="tag-node"]"#
node ref=#"[id="node-node"]"#
node ref=#"[id="value-node"]"#
node ref=#"[id="prop-node"]"#
node ref=#"[id="children-node"]"#
node ref=#"[id="tag-node"]"#
}
}
}

@ -1,48 +1,48 @@
// Based on https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Clients/NuGet.CommandLine/NuGet.CommandLine.csproj
Project {
PropertyGroup {
IsCommandLinePackage true
IsCommandLinePackage #true
}
Import Project=r"$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), 'README.md'))\build\common.props"
Import Project="Sdk.props" Sdk="Microsoft.NET.Sdk"
Import Project="ilmerge.props"
Import Project=#"$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), 'README.md'))\build\common.props"#
Import Project=Sdk.props Sdk=Microsoft.NET.Sdk
Import Project=ilmerge.props
PropertyGroup {
RootNamespace "NuGet.CommandLine"
AssemblyName "NuGet"
RootNamespace NuGet.CommandLine
AssemblyName NuGet
AssemblyTitle "NuGet Command Line"
PackageId "NuGet.CommandLine"
PackageId NuGet.CommandLine
TargetFramework "$(NETFXTargetFramework)"
GenerateDocumentationFile false
GenerateDocumentationFile #false
Description "NuGet Command Line Interface."
ApplicationManifest "app.manifest"
Shipping true
OutputType "Exe"
ComVisible false
ApplicationManifest app.manifest
Shipping #true
OutputType Exe
ComVisible #false
// Pack properties
PackProject true
IncludeBuildOutput false
PackProject #true
IncludeBuildOutput #false
TargetsForTfmSpecificContentInPackage "$(TargetsForTfmSpecificContentInPackage)" "CreateCommandlineNupkg"
SuppressDependenciesWhenPacking true
DevelopmentDependency true
PackageRequireLicenseAcceptance false
UsePublicApiAnalyzer false
SuppressDependenciesWhenPacking #true
DevelopmentDependency #true
PackageRequireLicenseAcceptance #false
UsePublicApiAnalyzer #false
}
Target Name="CreateCommandlineNupkg" {
Target Name=CreateCommandlineNupkg {
ItemGroup {
TfmSpecificPackageFile Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe" {
TfmSpecificPackageFile Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"# {
PackagePath "tools/"
}
TfmSpecificPackageFile Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb" {
TfmSpecificPackageFile Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb"# {
PackagePath "tools/"
}
}
}
ItemGroup Condition="$(DefineConstants.Contains(SIGNED_BUILD))" {
AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" {
AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo {
_Parameter1 "NuGet.CommandLine.FuncTest, PublicKey=002400000480000094000000060200000024000052534131000400000100010007d1fa57c4aed9f0a32e84aa0faefd0de9e8fd6aec8f87fb03766c834c99921eb23be79ad9d5dcc1dd9ad236132102900b723cf980957fc4e177108fc607774f29e8320e92ea05ece4e821c0a5efe8f1645c4c0c93c1ab99285d622caa652c1dfad63d745d6f2de5f17e5eaf0fc4963d261c8a12436518206dc093344d5ad293"
}
AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" {
@ -51,81 +51,81 @@ Project {
}
ItemGroup Condition="!$(DefineConstants.Contains(SIGNED_BUILD))" {
AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" {
_Parameter1 "NuGet.CommandLine.FuncTest"
AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo {
_Parameter1 NuGet.CommandLine.FuncTest
}
AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" {
_Parameter1 "NuGet.CommandLine.Test"
AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo {
_Parameter1 NuGet.CommandLine.Test
}
}
ItemGroup Condition="$(DefineConstants.Contains(SIGNED_BUILD))" {
AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" {
AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo {
_Parameter1 "NuGet.CommandLine.Test, PublicKey=002400000480000094000000060200000024000052534131000400000100010007d1fa57c4aed9f0a32e84aa0faefd0de9e8fd6aec8f87fb03766c834c99921eb23be79ad9d5dcc1dd9ad236132102900b723cf980957fc4e177108fc607774f29e8320e92ea05ece4e821c0a5efe8f1645c4c0c93c1ab99285d622caa652c1dfad63d745d6f2de5f17e5eaf0fc4963d261c8a12436518206dc093344d5ad293"
}
}
ItemGroup Condition="!$(DefineConstants.Contains(SIGNED_BUILD))" {
AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" {
_Parameter1 "NuGet.CommandLine.Test"
AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo {
_Parameter1 NuGet.CommandLine.Test
}
}
ItemGroup {
Reference Include="Microsoft.Build.Utilities.v4.0"
Reference Include="Microsoft.CSharp"
Reference Include="System"
Reference Include="System.ComponentModel.Composition"
Reference Include="System.ComponentModel.Composition.Registration"
Reference Include="System.ComponentModel.DataAnnotations"
Reference Include="System.IO.Compression"
Reference Include="System.Net.Http"
Reference Include="System.Xml"
Reference Include="System.Xml.Linq"
Reference Include="NuGet.Core" {
HintPath r"$(SolutionPackagesFolder)nuget.core\2.14.0-rtm-832\lib\net40-Client\NuGet.Core.dll"
Aliases "CoreV2"
Reference Include=Microsoft.Build.Utilities.v4.0
Reference Include=Microsoft.CSharp
Reference Include=System
Reference Include=System.ComponentModel.Composition
Reference Include=System.ComponentModel.Composition.Registration
Reference Include=System.ComponentModel.DataAnnotations
Reference Include=System.IO.Compression
Reference Include=System.Net.Http
Reference Include=System.Xml
Reference Include=System.Xml.Linq
Reference Include=NuGet.Core {
HintPath #"$(SolutionPackagesFolder)nuget.core\2.14.0-rtm-832\lib\net40-Client\NuGet.Core.dll"#
Aliases CoreV2
}
}
ItemGroup {
PackageReference Include="Microsoft.VisualStudio.Setup.Configuration.Interop"
ProjectReference Include=r"$(NuGetCoreSrcDirectory)NuGet.PackageManagement\NuGet.PackageManagement.csproj"
ProjectReference Include=r"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.Build.Tasks.csproj"
PackageReference Include=Microsoft.VisualStudio.Setup.Configuration.Interop
ProjectReference Include=#"$(NuGetCoreSrcDirectory)NuGet.PackageManagement\NuGet.PackageManagement.csproj"#
ProjectReference Include=#"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.Build.Tasks.csproj"#
}
ItemGroup {
EmbeddedResource Update="NuGetCommand.resx" {
Generator "ResXFileCodeGenerator"
LastGenOutput "NuGetCommand.Designer.cs"
EmbeddedResource Update=NuGetCommand.resx {
Generator ResXFileCodeGenerator
LastGenOutput NuGetCommand.Designer.cs
}
Compile Update="NuGetCommand.Designer.cs" {
DesignTime true
AutoGen true
DependentUpon "NuGetCommand.resx"
Compile Update=NuGetCommand.Designer.cs {
DesignTime #true
AutoGen #true
DependentUpon NuGetCommand.resx
}
EmbeddedResource Update="NuGetResources.resx" {
EmbeddedResource Update=NuGetResources.resx {
// Strings are shared by other projects, use public strings.
Generator "PublicResXFileCodeGenerator"
LastGenOutput "NuGetResources.Designer.cs"
Generator PublicResXFileCodeGenerator
LastGenOutput NuGetResources.Designer.cs
}
Compile Update="NuGetResources.Designer.cs" {
DesignTime true
AutoGen true
DependentUpon "NuGetResources.resx"
Compile Update=NuGetResources.Designer.cs {
DesignTime #true
AutoGen #true
DependentUpon NuGetResources.resx
}
}
ItemGroup {
EmbeddedResource Include=r"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.targets" {
Link "NuGet.targets"
SubType "Designer"
EmbeddedResource Include=#"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.targets"# {
Link NuGet.targets
SubType Designer
}
}
// Since we are moving some code and strings from NuGet.CommandLine to NuGet.Commands, we opted to go through normal localization process (build .resources.dll) and then add them to the ILMerged nuget.exe
// This will also be called from CI build, after assemblies are localized, since our test infra takes nuget.exe before Localization
Target Name="ILMergeNuGetExe" \
AfterTargets="Build" \
Target Name=ILMergeNuGetExe \
AfterTargets=Build \
Condition="'$(BuildingInsideVisualStudio)' != 'true' and '$(SkipILMergeOfNuGetExe)' != 'true'" \
{
PropertyGroup {
@ -133,9 +133,9 @@ Project {
ExpectedLocalizedArtifactCount 0 Condition="'$(ExpectedLocalizedArtifactCount)' == ''"
}
ItemGroup {
BuildArtifacts Include=r"$(OutputPath)\*.dll" Exclude="@(MergeExclude)"
BuildArtifacts Include=#"$(OutputPath)\*.dll"# Exclude="@(MergeExclude)"
// NuGet.exe needs all NuGet.Commands.resources.dll merged in
LocalizedArtifacts Include=r"$(ArtifactsDirectory)\NuGet.Commands\**\$(NETFXTargetFramework)\**\*.resources.dll"
LocalizedArtifacts Include=#"$(ArtifactsDirectory)\NuGet.Commands\**\$(NETFXTargetFramework)\**\*.resources.dll"#
}
Error Text="Build dependencies are inconsistent with mergeinclude specified in ilmerge.props" \
Condition="'@(BuildArtifacts-&gt;Count())' != '@(MergeInclude-&gt;Count())'"
@ -143,36 +143,36 @@ Project {
Condition="'@(LocalizedArtifacts-&gt;Count())' != '$(ExpectedLocalizedArtifactCount)'"
PropertyGroup {
PathToBuiltNuGetExe "$(OutputPath)NuGet.exe"
IlmergeCommand r"$(ILMergeExePath) /lib:$(OutputPath) /out:$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe @(MergeAllowDup -> '/allowdup:%(Identity)', ' ') /log:$(OutputPath)IlMergeLog.txt"
IlmergeCommand #"$(ILMergeExePath) /lib:$(OutputPath) /out:$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe @(MergeAllowDup -> '/allowdup:%(Identity)', ' ') /log:$(OutputPath)IlMergeLog.txt"#
IlmergeCommand Condition="Exists($(MS_PFX_PATH))" "$(IlmergeCommand) /delaysign /keyfile:$(MS_PFX_PATH)"
// LocalizedArtifacts need fullpath, since there will be duplicate file names
IlmergeCommand "$(IlmergeCommand) $(PathToBuiltNuGetExe) @(BuildArtifacts->'%(filename)%(extension)', ' ') @(LocalizedArtifacts->'%(fullpath)', ' ')"
}
MakeDir Directories="$(ArtifactsDirectory)$(VsixOutputDirName)"
Exec Command="$(IlmergeCommand)" ContinueOnError="false"
Exec Command="$(IlmergeCommand)" ContinueOnError=#false
}
Import Project="$(BuildCommonDirectory)common.targets"
Import Project="$(BuildCommonDirectory)embedinterop.targets"
// Do nothing. This basically strips away the framework assemblies from the resulting nuspec.
Target Name="_GetFrameworkAssemblyReferences" DependsOnTargets="ResolveReferences"
Target Name=_GetFrameworkAssemblyReferences DependsOnTargets=ResolveReferences
Target Name="GetSigningInputs" Returns="@(DllsToSign)" {
Target Name=GetSigningInputs Returns="@(DllsToSign)" {
ItemGroup {
DllsToSign Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe" {
StrongName "MsSharedLib72"
Authenticode "Microsoft400"
DllsToSign Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"# {
StrongName MsSharedLib72
Authenticode Microsoft400
}
}
}
Target Name="GetSymbolsToIndex" Returns="@(SymbolsToIndex)" {
Target Name=GetSymbolsToIndex Returns="@(SymbolsToIndex)" {
ItemGroup {
SymbolsToIndex Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"
SymbolsToIndex Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb"
SymbolsToIndex Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"#
SymbolsToIndex Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb"#
}
}
Import Project="Sdk.targets" Sdk="Microsoft.NET.Sdk"
Import Project=Sdk.targets Sdk=Microsoft.NET.Sdk
}

@ -1,20 +1,20 @@
!doctype "html"
html lang="en" {
!doctype html
html lang=en {
head {
meta charset="utf-8"
meta name="viewport" content="width=device-width, initial-scale=1.0"
meta charset=utf-8
meta name=viewport content="width=device-width, initial-scale=1.0"
meta \
name="description" \
name=description \
content="kdl is a document language, mostly based on SDLang, with xml-like semantics that looks like you're invoking a bunch of CLI commands!"
title "kdl - Kat's Document Language"
link rel="stylesheet" href="/styles/global.css"
title "kdl - The KDL Document Language"
link rel=stylesheet href="/styles/global.css"
}
body {
main {
header class="py-10 bg-gray-300" {
h1 class="text-4xl text-center" "kdl - Kat's Document Language"
h1 class="text-4xl text-center" "kdl - The KDL Document Language"
}
section class="kdl-section" id="description" {
section class=kdl-section id=description {
p {
- "kdl is a document language, mostly based on "
a href="https://sdlang.org" "SDLang"
@ -22,7 +22,7 @@ html lang="en" {
}
p "It's meant to be used both as a serialization format and a configuration language, and is relatively light on syntax compared to XML."
}
section class="kdl-section" id="design-and-discussion" {
section class=kdl-section id=design-and-discussion {
h2 "Design and Discussion"
p {
- "kdl is still extremely new, and discussion about the format should happen over on the "
@ -32,11 +32,11 @@ html lang="en" {
- " page in the Github repo. Feel free to jump in and give us your 2 cents!"
}
}
section class="kdl-section" id="design-principles" {
section class=kdl-section id=design-principles {
h2 "Design Principles"
ol {
li "Maintainability"
li "Flexibility"
li Maintainability
li Flexibility
li "Cognitive simplicity and Learnability"
li "Ease of de/serialization"
li "Ease of implementation"

@ -1 +1 @@
node "\"\\/\b\f\n\r\t"
node "\"\\\b\f\n\r\t "

@ -1,3 +1,3 @@
node "arg" prop="val" {
node arg prop=val {
inner_node
}

@ -1 +1 @@
node "arg" arg="val"
node arg arg=val

@ -0,0 +1 @@
node a

@ -1 +1 @@
node (type)false
node (type)#false

@ -1 +1 @@
node (type)null
node (type)#null

@ -1 +1 @@
node (type)"str"
node (type)str

@ -1 +1 @@
node (type)"str"
node (type)str

@ -1 +1 @@
node (type)true
node (type)#true

@ -1 +1 @@
node (type)"arg"
node (type)arg

@ -1 +1 @@
😁 "happy!"
😁 happy!

@ -0,0 +1 @@
node .

@ -0,0 +1 @@
node +

@ -0,0 +1 @@
node +.

@ -1 +1 @@
node key=("")true
node key=("")#true

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg"
node arg

@ -0,0 +1 @@
node arg

@ -1 +1 @@
node false true
node #false #true

@ -1 +1 @@
node prop1=true prop2=false
node prop1=#true prop2=#false

@ -0,0 +1 @@
foo123<bar>foo weeee

@ -0,0 +1 @@
foo123,bar weeee

@ -0,0 +1 @@
node (type)10

@ -0,0 +1 @@
(type)node

@ -0,0 +1 @@
node key=(type)10

@ -0,0 +1,2 @@
node1
node2

@ -0,0 +1 @@
node (type)10

@ -0,0 +1 @@
(type)node

@ -0,0 +1 @@
node key=(type)10

@ -1 +1 @@
node "arg2"
node arg2

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg"
node arg

@ -0,0 +1 @@
node --

@ -1 +1 @@
node "😀"
node 😀

@ -0,0 +1 @@
node

@ -1 +1 @@
"" "arg"
"" arg

@ -1 +1 @@
node ""="empty"
node ""=empty

@ -0,0 +1 @@
node

@ -0,0 +1 @@
node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld"

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg" "arg2\n"
node arg arg2

@ -0,0 +1 @@
floats #inf #-inf #nan

@ -0,0 +1 @@
another-node

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg1" "arg2"
node arg1 arg2

@ -0,0 +1 @@
node "hey\neveryone\nhow goes?"

@ -0,0 +1 @@
node " hey\n everyone\n how goes?"

@ -1 +1 @@
node " hey\neveryone\nhow goes?\n"
node "hey\neveryone\nhow goes?"

@ -0,0 +1 @@
node " hey\n everyone\n how goes?"

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node false
node #false

@ -1 +1 @@
node true
node #true

@ -1 +1 @@
node null
node #null

@ -1 +1 @@
node prop=null
node prop=#null

@ -0,0 +1,5 @@
node {
foo
bar
baz
}

@ -1 +1 @@
node 1 1.0 1.0E+10 1.0E-10 1 7 2 "arg" "arg\\\\" true false null
node 1 1.0 1.0E+10 1.0E-10 1 7 2 arg arg "arg\\" #true #false #null

@ -1 +1 @@
node key=(type)false
node key=(type)#false

@ -0,0 +1 @@
node key=(type)str

@ -1 +1 @@
node key=(type)null
node key=(type)#null

@ -1 +1 @@
node key=(type)"str"
node key=(type)str

@ -1 +1 @@
node key=(type)"str"
node key=(type)str

@ -1 +1 @@
node key=(type)true
node key=(type)#true

@ -1 +1 @@
node key=(type)true
node key=(type)#true

@ -0,0 +1 @@
node ?15

@ -1 +1 @@
node "0prop"="val"
node "0prop"=val

@ -1 +1 @@
node key=("type/")true
node key=("type/")#true

@ -1 +1 @@
r "arg"
r arg

@ -1 +1 @@
node (type)true
node (type)#true

@ -1 +1 @@
node key=(type)true
node key=(type)#true

@ -1,3 +1,2 @@
node_1 "arg\\n"
node_2 "\"arg\\n\"and stuff"
node_3 "#\"arg\\n\"#and stuff"
node_1 "\"arg\\n\"and #stuff"
node_2 "#\"arg\\n\"#and #stuff"

@ -1 +1 @@
node "\nhello\nworld\n"
node "hello\nworld"

@ -1,3 +1,2 @@
node_1 prop="arg\\n"
node_2 prop="\"arg\"\\n"
node_3 prop="#\"arg\"#\\n"
node_1 prop="\"arg#\"\\n"
node_2 prop="#\"arg#\"#\\n"

@ -1 +1 @@
node "arg" "arg"
node arg arg

@ -1 +0,0 @@
node "whee" "whee"

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node prop="val"
node prop=val

@ -1 +1 @@
node "arg2"
node arg2

View File

@ -0,0 +1 @@
node 1 3

@ -0,0 +1 @@
node 1 3

@ -0,0 +1,3 @@
node foo {
three
}

@ -0,0 +1 @@
node 1 2

@ -0,0 +1 @@
node 1 3

@ -1 +1 @@
node "arg"
node arg

@ -1 +1 @@
node arg="correct"
node arg=correct

Some files were not shown because too many files have changed in this diff.