diff --git a/JSON-IN-KDL.md b/JSON-IN-KDL.md index 7ccf76b..9768a57 100644 --- a/JSON-IN-KDL.md +++ b/JSON-IN-KDL.md @@ -3,79 +3,112 @@ JSON-in-KDL (JiK) This specification describes a canonical way to losslessly encode [JSON](https://json.org) in [KDL](https://kdl.dev). While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with a JSON-consuming or -emitting service. -This is version 2.0.0 of JiK. +This is version 4.0.0 of JiK. -JSON-in-KDL (JiK from now on) is a kdl microsyntax consisting of three types of nodes: - -* literal nodes, with `-` as the nodename -* array nodes, with `array` as the nodename -* object nodes, with `object` as the nodename +JSON-in-KDL (JiK from now on) is a kdl microsyntax consisting of named nodes that represent objects, arrays, or literal values. ---- -Literal nodes are used to represent a JSON literal, which luckily KDL's literal syntax is a superset of. They contain a single value, the literal they're representing. For example, to represent the JSON literal `true`, you'd write `- #true` in JiK. +There are two ways to write a JSON literal into JiK: -(In many cases this isn't necessary, and KDL literals can be directly used instead. Literal nodes are necessary only for a top-level literal, or to intersperse literals with arrays or objects inside an array or object node.) +* As a node with any nodename and a single argument, like `- #true` (for the JSON `true`) or `foo 5` (for the JSON `5`). +* When nested in arrays or objects, literals can usually be written as arguments (for array nodes) or properties (for object nodes). See below for details. ---- -Array nodes are used to represent a JSON array. They can contain zero or more unnamed arguments, followed by zero or more child nodes; these are taken as the items of the array, in order of appearance. 
+JSON arrays are represented in JiK as a node with any nodename, with zero or more arguments and/or zero or more children with `-` nodenames. -This means that simple arrays of literals can be written compactly and simply; a JSON array like `[1,2,3]` can be written in JiK as `array 1 2 3`. When an array contains nested arrays or objects, the child nodes are used; a JSON array like `[1, [true, false], 3]` can be written in JiK as: +Arguments can encode literals - for example, the JSON `[1, 2, 3]` can be written in JiK as `- 1 2 3`. + +Children can encode literals and/or nested arrays and objects. For example, the JSON `[1, [true, false], 3]` can be written in JiK as: ```kdl -array { +- { - 1 - array #true #false + - #true #false - 3 } ``` -The two methods of writing children can be mixed, pulling the prefix of the array that is just literals into the arguments of the node. The preceding example could thus also be written as: +The arguments and/or children, taken in order, represent the items of the array. + +Arguments and children can be mixed, if desired. The preceding example could also be written as: ```kdl -array 1 { - array #true #false +- 1 { + - #true #false - 3 } ``` +Two otherwise-ambiguous cases must be manually annotated with an `(array)` type annotation: + +* A single-element array (such as `[1]`) written using arguments (as `- 1`) would be ambiguous with a literal node. + To indicate this is an array, it must be written as `(array)- 1` + (Or rewritten to use child nodes, like `- { - 1 }`.) +* An empty array (JSON `[]`) must use the `(array)` type annotation, like `(array)-`. + +The `(array)` type annotation can be used on any other valid array node if desired, but has no effect in such cases. + ---- -Object nodes are used to represent a JSON object. They can contain zero or more named properties, followed by zero or more child nodes; these are taken as the key/value pairs of the object, in order of appearance. 
+JSON objects are represented in JiK as a node with any nodename, with zero or more properties and/or zero or more children with any nodenames. -If the value of a key/value pair is a literal, it can be encoded as a named property on the object. For example, the JSON object `{"foo": 1, "bar": true}` could be written in JiK as `object foo=1 bar=#true`. +Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=#true`. -Alternately, key/value pairs can be encoded as child nodes, using a type annotation on the node name to encode the key, and the node itself as the value. The preceding example could instead have been written as: +Children can encode literals and/or nested arrays and objects, +using the nodename for the item's key. + +For example, the JSON `{"foo": 1, "bar": [2, {"baz": 3}], "qux":4}` can be written in JiK as: ```kdl -object { - (foo)- 1 - (bar)- #true -} -``` - -Of course, using children for literals is overly-verbose. It's only necessary when nesting arrays or objects into objects; for example, the JSON object `{"foo": [1, 2, {"bar": 3}], "baz":4}` can be written in JiK as: - -```kdl -object { - (foo)array 1 2 { - object bar=3 +- { + foo 1 + bar 2 { + - baz=3 } - (baz)- 4 + qux 4 } ``` -As with arrays, child nodes and properties can be mixed. The precise order of a JSON object's keys isn't *meant* to be meaningful, so as long as that's true, *all* the keys with literal values can be pulled into the argument list. 
The preceding example could thus also be written as: +As with arrays, child nodes and properties can be mixed, so the preceding example could have been written as: ```kdl -object baz=4 { - (foo)array 1 2 { - object bar=3 +- foo=1 { + bar 2 { + - baz=3 + } + qux 4 +} +``` + +Or, so long as the exact order of properties isn't meaningful (it's not *meant* to be in JSON), +*all* the literal-valued keys can be pulled up into properties, +leaving children nodes solely for nested arrays and objects: + +```kdl +- foo=1 qux=4 { + bar 2 { + - baz=3 } } ``` +The properties and/or children of the node represent the items of the object, +with the property names and child nodenames as each item's key. +All "keys" in an object node must be unique. + +As with arrays, there are two ambiguous cases that must be manually annotated with the `(object)` type annotation: + +* An object containing a single item whose key is "-" (like `{"-": 1}`) written using children (like `- { - 1 }`) + would be ambiguous with an array node. + To indicate this is an object, it must be written as `(object)- { - 1 }`. + (Or, if the sole item's value is a literal, as in this example, + it can be rewritten to use properties, as `- -=1`.) +* An empty object (JSON `{}`) must use the `(object)` type annotation, like `(object)-`. + +As with array nodes, `(object)` can be used on any valid object node if desired. + ---- Converting JiK back to JSON is a trivial process: literal nodes are encoded as their literal value; array nodes are encoded as their items, comma-separated and surrounded with `[]`; object nodes are encoded as their key/value pairs, comma-separated and surrounded with `{}`. @@ -84,6 +117,45 @@ Only valid JiK nodes can be encoded to JSON; if a JiK document contains an inval * A literal node is valid if it contains a single unnamed argument. -* An array node is valid if it contains only unnamed arguments and/or child nodes without type annotations on their node names.
+* An array node is valid if it contains only unnamed arguments and/or child nodes named "-". If it contains no arguments and no child nodes, its nodename *must* have the `(array)` type annotation. -* An object node is valid if it contains only named properties and/or child nodes with type annotations on their node names. Additionally, all "keys" must be unique within the node, whether they're encoded as property names or type annotations on node names. +* An object node is valid if it contains only named properties and/or child nodes. Additionally, all "keys" must be unique within the node, whether they're encoded as property names or child node names. If it contains no properties and no child nodes, its nodename *must* have the `(object)` type annotation. + +---- + +Note that, outside of array/object items, the nodename is not meaningful in JiK. +For simplicity, this document uses `-` for all such nodenames +(and it is recommended that an automated JSON-to-KDL converter do the same), +but this means it is possible to write a JiK object as meaningful KDL +and embed it within a larger KDL document. + +Here's a fictitious example describing an HTTP request with a JSON body, +where the `body` node is an embedded JiK node +that nevertheless reads as fairly natural KDL. + +```kdl +request "/api/cart" method="PUT" { + body { + items { + - id=1234 amount=1 + - id=2341 amount=2 { + options { + color "red" + size "XXL" + } + } + } + } +} +``` + +The `body` node represents the JSON object + +```json +{ + "items": [ + {"id": 1234, "amount": 1}, + {"id": 2341, "amount": 2, "options": {"color": "red", "size": "XXL"}} + ] +} +``` diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 56a2449..5fcb4ee 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -32,8 +32,8 @@ binary operators. * `top()`: Returns all toplevel children of the current document. * `top() > []`: Equivalent to `top()` on its own. -* `(foo)`: Selects any element with a tag named `foo`. 
-* `()`: Selects any element with any tag. +* `(foo)`: Selects any element whose type annotation is `foo`. +* `()`: Selects any element with any type annotation. * `[val()]`: Selects any element with a value. * `[val(1)]`: Selects any element with a second value. * `[prop(foo)]`: Selects any element with a property named `foo`. @@ -67,7 +67,7 @@ If the value is not a string, the matcher will always fail: The following operators work only with `val()` or `prop()` values. If the value is not one of those, the matcher will always fail: -* `[val() = (foo)]`: Selects any element whose tag is "foo". +* `[val() = (foo)]`: Selects any element whose type annotation is `foo`. ## Examples diff --git a/README.md b/README.md index d5ca1a1..624313e 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,44 @@ # The KDL Document Language -KDL is a document language with xml-like semantics that looks like you're -invoking a bunch of CLI commands! It's meant to be used both as a +KDL is a small, pleasing document language with xml-like semantics that looks +like you're invoking a bunch of CLI commands! It's meant to be used both as a serialization format and a configuration language, much like JSON, YAML, or -XML. +XML. It looks like this: + +```kdl +package { + name "my-pkg" + version "1.2.3" + + dependencies { + // Nodes can have standalone values as well as + // key/value pairs. + lodash "^3.2.1" optional=true alias="underscore" + } + + scripts { + // "Raw" and multi-line strings are supported. + build r#" + echo "foo" + node -c "console.log('hello, world!');" + echo "foo" > some-file.txt + "# + } + + // `\` breaks up a single node across multiple lines. + the-matrix 1 2 3 \ + 4 5 6 \ + 7 8 9 + + // "Slashdash" comments operate at the node level, + // with just `/-`. + /-this-is-commented { + this "entire" "node" { + "is" "gone" + } + } +} +``` There's a living [specification](SPEC.md), as well as various [implementations](#implementations). 
You can also check out the [FAQ](#faq) to @@ -18,6 +53,8 @@ modifications and clarifications on its syntax and behavior. The current version of the KDL spec is `1.0.0`. +[Play with it in your browser!](https://kdl-play.danini.dev/) + ## Design and Discussion KDL is still extremely new, and discussion about the format should happen over @@ -32,11 +69,17 @@ free to jump in and give us your 2 cents! * Dart: [kdl-dart](https://github.com/danini-the-panini/kdl-dart) * Java: [kdl4j](https://github.com/hkolbeck/kdl4j) * PHP: [kdl-php](https://github.com/kdl-org/kdl-php) -* Python: [kdl-py](https://github.com/tabatkins/kdlpy), [cuddle](https://github.com/djmattyg007/python-cuddle) +* Python: [kdl-py](https://github.com/tabatkins/kdlpy), [cuddle](https://github.com/djmattyg007/python-cuddle), [ckdl](https://github.com/tjol/ckdl) * Elixir: [kuddle](https://github.com/IceDragon200/kuddle) * XSLT: [xml2kdl](https://github.com/Devasta/XML2KDL) * Haskell: [Hustle](https://github.com/fuzzypixelz/Hustle) * .NET: [Kadlet](https://github.com/oledfish/Kadlet) +* C: [ckdl](https://github.com/tjol/ckdl) +* C++: [kdlpp](https://github.com/tjol/ckdl) (part of ckdl, requires C++20) +* OCaml: [ocaml-kdl](https://github.com/Bannerets/ocaml-kdl) +* Nim: [kdl-nim](https://github.com/Patitotective/kdl-nim) +* Common Lisp: [kdlcl](https://github.com/chee/kdlcl) +* Go: [gokdl](https://github.com/lunjon/gokdl), [kdl-go](https://github.com/sblinch/kdl-go) ## Compatibility Test Suite @@ -49,6 +92,9 @@ entirety, but in the future, may be required to in order to be included here. 
## Editor Support * [VS Code](https://marketplace.visualstudio.com/items?itemName=kdl-org.kdl&ssr=false#review-details) +* [Sublime Text](https://packagecontrol.io/packages/KDL) +* [vim](https://github.com/imsnif/kdl.vim) +* [IntelliJ IDEA](https://plugins.jetbrains.com/plugin/20136-kdl-document-language) ## Overview diff --git a/SPEC.md b/SPEC.md index 518b236..12d7a2d 100644 --- a/SPEC.md +++ b/SPEC.md @@ -49,8 +49,8 @@ baz ### Node Being a node-oriented language means that the real core component of any KDL -document is the "node". Every node must have a name, which is either a legal -[Identifier](#identifier), or a quoted [String](#string). +document is the "node". Every node must have a name, which is an +[Identifier](#identifier). The name may be preceded by a [Type Annotation](#type-annotation) to further clarify its type, particularly in relation to its parent node. (For example, @@ -92,7 +92,14 @@ foo 1 key="val" 3 { ### Identifier -A bare Identifier is composed of any [Unicode Scalar +An Identifier is either a [Bare Identifier](#bare-identifier), which is an +unquoted string like `node` or `item`, a [String](#string), or a [Raw String](#raw-string). +There's no semantic difference between the kinds of identifier; this simply allows +for the use of quotes to have unusual identifiers that are inexpressible as bare identifiers. + +### Bare Identifier + +A Bare Identifier is composed of any [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) other than [non-initial characters](#non-initial-characters), followed by any number of Unicode Scalar Values other than [non-identifier @@ -106,20 +113,16 @@ When Identifiers are used as the values in [Arguments](#argument) and [Properties](#property), they are treated as strings, just like they are with node names and property keys. -Identifiers are terminated by [Whitespace](#whitespace) or +Bare Identifiers are terminated by [Whitespace](#whitespace) or [Newlines](#newline).
-In all places where Identifiers are used, [Strings](#string) and [Raw -Strings](#raw-string) can be used in the same place, without an Identifier's -character restrictions. - -The literal identifiers `true`, `false`, and `null` are illegal identifiers, +The literal identifiers `true`, `false`, and `null` are illegal Bare Identifiers, and _MUST_ be treated as a syntax error. ### Non-initial characters -The following characters cannot be the first character in a bare -[Identifier](#identifier): +The following characters cannot be the first character in a +[Bare Identifier](#bare-identifier): * Any decimal digit (0-9) * Any [non-identifier characters](#non-identifier-characters) @@ -131,8 +134,7 @@ negative number. ### Non-identifier characters -The following characters cannot be used anywhere in a bare -[Identifier](#identifier): +The following characters cannot be used anywhere in a [Bare Identifier](#bare-identifier): * Any of `(){}[]/\="#;` * Any [Whitespace](#whitespace) or [Newline](#newline). @@ -160,8 +162,7 @@ my-node 1 2 \ // comments are ok after \ ### Property A Property is a key/value pair attached to a [Node](#node). A Property is -composed of an [Identifier](#identifier) or a [String](#string), followed -immediately by a `=`, and then a [Value](#value). +composed of an [Identifier](#identifier), followed immediately by a `=`, and then a [Value](#value). Properties should be interpreted left-to-right, with rightmost properties with identical names overriding earlier properties. That is: @@ -182,7 +183,7 @@ make it act as plain whitespace, even if it spreads across multiple lines. ### Argument An Argument is a bare [Value](#value) attached to a [Node](#node), with no -associated key. It shares the same space as [Properties](#properties). +associated key. It shares the same space as [Properties](#property), and may be interleaved with them. A Node may have any number of Arguments, which should be evaluated left to right.
KDL implementations _MUST_ preserve the order of Arguments relative to @@ -219,14 +220,14 @@ parent { child1; child2; } ### Value -A value is either: an [Identifier](#identifier), a [String](#string), a [Raw -String](#raw-string), a [Number](#number), a [Boolean](#boolean), or -[Null](#null) +A value is either: an [Identifier](#identifier), a [String](#string), a +[Number](#number), a [Boolean](#boolean), or [Null](#null). Values _MUST_ be either [Arguments](#argument) or values of [Properties](#property). -Values _MAY_ be prefixed by a single [Type Annotation](#type-annotation). +Values (both as arguments and as properties) _MAY_ be prefixed by a single +[Type Annotation](#type-annotation). ### Type Annotation @@ -236,9 +237,8 @@ or as a _context-specific elaboration_ of the more generic type the node name indicates. Type annotations are written as a set of `(` and `)` with a single -[Identifier](#identifier) in it. Any valid identifier or string is considered -a valid type annotation. There must be no whitespace between a type annotation -and its associated Node Name or Value. +[Identifier](#identifier) in it. It may contain Whitespace after the `(` and before +the `)`, and may be separated from its target by Whitespace. KDL does not specify any restrictions on what implementations might do with these annotations. They are free to ignore them, or use them to make decisions @@ -317,9 +317,10 @@ node prop=(regex).* ### String -Strings in KDL represent textual [Values](#value). They are delimited by `"` -on either side of any number of literal string characters except unescaped -`"` and `\`. +Strings in KDL represent textual [Values](#value), or unusual identifiers. A +String is either a [Quoted String](#quoted-string) or a +[Raw String](#raw-string). Quoted Strings may include escaped characters, while +Raw Strings always contain only the literal characters that are present. Strings _MUST_ be represented as UTF-8 values. 
@@ -327,7 +328,7 @@ Strings _MUST NOT_ include the code points for [disallowed literal code points](#disallowed-literal-code-points) directly. If needed, they can be specified with their corresponding `\u{}` escape. -#### Multi-line Strings +### Multi-line Strings Strings may span multiple lines with literal Newlines, in which case the resulting String is "dedented" according to the line with the fewest number of @@ -351,6 +352,19 @@ are stripped to only contain the single Newline character. Strings with literal Newlines that do not immediately start with a Newline and whose final `"` is not preceeded by whitespace and a Newline are illegal. +### Quoted String + +A Quoted String is delimited by `"` on either side of any number of literal +string characters except unescaped `"` and `\`. This includes literal +[Newline](#newline) characters, which means a String Value can encompass +multiple lines without behaving like a Newline for [Node](#node) parsing +purposes. + +Like Strings, Quoted Strings _MUST NOT_ include any of the [disallowed literal +code-points](#disallowed-literal-code-points) as code points in their body. + +Quoted Strings also follow the Multi-line rules specified in [String](#string). + #### Escapes In addition to literal code points, a number of "escapes" are supported. @@ -401,10 +415,10 @@ other characters in a string. ### Raw String -Raw Strings in KDL are much like [Strings](#string), except they do not -support `\`-escapes. They otherwise share the same properties as far as -literal [Newline](#newline) characters go, and the requirement of UTF-8 -representation. +Raw Strings in KDL are much like [Quoted Strings](#quoted-string), except they +do not support `\`-escapes. They otherwise share the same properties as far as +literal [Newline](#newline) characters go, multi-line rules, and the requirement +of UTF-8 representation. 
Raw String literals are represented with one or more `#` characters, followed by `"`, followed by any number of UTF-8 literals. The string is then closed by @@ -417,35 +431,6 @@ code-points](#disallowed-literal-code-points) as code points in their body. Unlike with Strings, these cannot simply be escaped, and are thus unrepresentable when using Raw Strings. -Like Strings, Raw Strings _MUST NOT_ include any of the [disallowed literal -code-points](#disallowed-literal-code-points) as code points in their body. -Unlike with Strings, these cannot simply be escaped, and are thus -unrepresentable when using Raw Strings. - -#### Multi-line Raw Strings - -Raw Strings may span multiple lines with literal newlines, in which case the -resulting string is "dedented" according to the line with the fewest number of -Whitespace characters preceding its first non-Whitespace character. That is, -the number of Whitespace characters in the least-indented line in the Raw -String body is subtracted from the Whitespace of all other lines. - -Multi-line strings _MUST_ have a single [Newline](#newline) immediately -following their opening `#"`, after which they may have any number of newlines. -Finally, there must be a Newline, followed by any number of Whitespace, before -the closing `"#`. - -The first Newline, the last Newline, along with Whitespace following the last -Newline, are not included in the value of the Raw String. The first and last -Newline can be the same character (that is, empty multi-line strings are -legal). - -Furthermore, any lines in the Raw String body that only contain literal -whitespace are stripped to only contain the single Newline character. - -Raw Strings with literal Newlines that do not immediately start with a Newline -and whose final `"#` is not preceeded by whitespace and a Newline are illegal. - #### Example ```kdl @@ -469,10 +454,9 @@ This is the base indentation ### Number -Numbers in KDL represent numerical [Values](#value). 
There is no logical -distinction in KDL between real numbers, integers, and floating point numbers. -It's up to individual implementations to determine how to represent KDL -numbers. +Numbers in KDL represent numerical [Values](#value). There is no logical distinction in KDL +between real numbers, integers, and floating point numbers. It's up to +individual implementations to determine how to represent KDL numbers. There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary. @@ -622,7 +606,7 @@ raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-s single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* -number := decimal | hex | octal | binary +number := hex | octal | binary | decimal decimal := sign? integer ('.' integer)? exponent? exponent := ('e' | 'E') sign? integer diff --git a/XML-IN-KDL.md b/XML-IN-KDL.md index 32ce487..924680f 100644 --- a/XML-IN-KDL.md +++ b/XML-IN-KDL.md @@ -7,7 +7,7 @@ This is version 1.0.0 of XiK. XML-in-KDL (XiK from now on) is a KDL microsyntax for losslessly encoding XML into a KDL document. XML and KDL, luckily, have *very similar* data models (KDL is *almost* a superset of XML), so it's quite straightforward to encode most XML documents into KDL. -See [the website example](examples/website.kdl) for an example of this grammar in use to encode an HTML document. +See [the website example](examples/website.kdl) for an example of this grammar in use to encode an HTML document. See [XML2KDL](https://github.com/Devasta/XML2KDL) (third party) to encode your XML in KDL (especially [their online editor](https://xsltfiddle.liberty-development.net/bET2rY5)). XML has several types of nodes, corresponding to certain KDL constructs: diff --git a/tests/README.md b/tests/README.md index 7c5fa5e..0ddfea0 100644 --- a/tests/README.md +++ b/tests/README.md @@ -52,10 +52,3 @@ please send a PR. 
If you think the disagreement is due to a genuine error or oversight in the KDL specification, please open an issue explaining the matter and the change will be considered for the next version of the KDL spec. - -## Credit - -This test suite was extracted from -[`kdl4j`](https://github.com/hkolbeck/kdl4j), the original Java -implementation of KDL, with huge thanks to -[@hkolbeck](https://github.com/hkolbeck) for authoring them!