KDL Spec
This is the kinda-formal specification for KDL, including the intended data model and the grammar.
Introduction
KDL is a node-oriented document language. Its niche and purpose overlap with those of XML, as do many of its semantics. You can use KDL as a configuration language or as a data exchange or storage format.
Components
Document
The toplevel concept of KDL is a Document. A Document is composed of zero or more Nodes, separated by newlines and whitespace, and eventually terminated by an EOF.
All KDL documents should be UTF-8 encoded and conform to the specifications in this document.
Example
The following is a document composed of two toplevel nodes:
foo {
    bar
}
baz
Node
Being a node-oriented language means that the real core component of any KDL document is the "node". Every node must have a name, which is either a legal Identifier, or a quoted String.
Following the name are zero or more Arguments or Properties, separated by either whitespace or a backslash-escaped line continuation. Arguments and Properties may be interspersed in any order, much as positional arguments and options are in command-line tools.
Arguments are ordered relative to each other, and that order must be preserved to maintain the semantics.
By contrast, Property order SHOULD NOT matter to implementations. Children should be used if an order-sensitive key/value data structure must be represented in KDL.
Finally, a node is terminated by either a Newline, a Children Block, a semicolon (;) or the end of the file/stream (an EOF).
Example
foo 1 key="val" 3 {
    bar
    baz
}
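As a rough illustration of the data model described above, the example node could be held in memory as follows. The `Node` and `Value` names are hypothetical; the spec does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical in-memory types; these names are not defined by the spec.
Value = Union[str, int, float, bool, None]


@dataclass
class Node:
    name: str
    # Argument order is significant and must be preserved.
    args: List[Value] = field(default_factory=list)
    # Property order is not significant.
    props: Dict[str, Value] = field(default_factory=dict)
    # Children keep their order, so they suit order-sensitive key/value data.
    children: List["Node"] = field(default_factory=list)


# The example node: two Arguments, one Property, two child nodes.
foo = Node(
    name="foo",
    args=[1, 3],
    props={"key": "val"},
    children=[Node("bar"), Node("baz")],
)
```

Keeping Arguments in a list and Properties in a mapping mirrors the ordering rules: only Arguments and Children carry order.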
Identifier
A bare Identifier is composed of any Unicode codepoint other than a non-initial character, followed by any number of Unicode codepoints other than non-identifier characters. Identifiers are terminated by Whitespace or Newlines.
Non-initial characters
The following characters cannot be the first character in a bare Identifier:
- Any of "/\{};[]=,"
- Any decimal digit (0-9)
- Any non-identifier characters
Non-identifier characters
The following characters cannot be used anywhere in a bare Identifier:
- Any codepoint with hexadecimal value 0x20 or below.
- Any codepoint with hexadecimal value higher than 0x10FFFF.
- Any of "\{};[]=,"
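The two character lists above can be checked with a small validator. This is a sketch under stated assumptions: the quoted lists are read literally, the double quote itself is also treated as forbidden, the upper codepoint bound is taken to be the Unicode maximum, and the function names are hypothetical. It does not check termination by non-ASCII Whitespace.

```python
# Assumption: the quoted lists are read literally, and the double quote is
# also rejected because it starts a quoted String.
NON_IDENTIFIER_CHARS = set('\\{};[]=,"')
NON_INITIAL_CHARS = NON_IDENTIFIER_CHARS | set("/") | set("0123456789")


def is_non_identifier(ch: str) -> bool:
    # Python strings cannot hold code points above the Unicode maximum,
    # so only the lower bound and the punctuation set need checking.
    return ord(ch) <= 0x20 or ch in NON_IDENTIFIER_CHARS


def is_bare_identifier(text: str) -> bool:
    """Return True if text satisfies both character lists."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    if first in NON_INITIAL_CHARS or is_non_identifier(first):
        return False
    return not any(is_non_identifier(ch) for ch in rest)
```

Under these assumptions `my-node` and `a/b` are accepted, while `1foo`, `/foo` and `a=b` are rejected.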
Line Continuation
Line continuations allow Nodes to be spread across multiple lines.
A line continuation is one or more whitespace characters, followed by a \ character. This character can then be followed by more whitespace and must be terminated by a Newline (including the Newline that is part of single-line comments).
Following a line continuation, processing of a Node can continue as usual.
Example
my-node 1 2 \ // this is a comment
3 4 // This is the actual end of the Node.
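One way to picture the rule is as a pre-pass that merges physical lines into logical node lines. The sketch below is deliberately simplified and is not how a real parser has to work: it assumes only space and tab as whitespace, it does not handle backslashes inside quoted strings or comments, and `join_continuations` is a hypothetical name.

```python
import re
from typing import List

# Whitespace, a backslash, optional whitespace, an optional single-line
# comment, then the end of the physical line.
CONTINUATION = re.compile(r"[ \t]+\\[ \t]*(?://.*)?$")


def join_continuations(lines: List[str]) -> List[str]:
    """Merge physical lines (without their newline) into logical lines."""
    logical: List[str] = []
    pending = ""
    for line in lines:
        match = CONTINUATION.search(line)
        if match:
            # Drop the backslash and what follows it; keep one separator.
            pending += line[:match.start()] + " "
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # Input ended in the middle of a continuation.
        logical.append(pending.rstrip())
    return logical
```

For instance, the two physical lines `my-node 1 2 \ // this is a comment` and `3 4` are merged into the single logical line `my-node 1 2 3 4`.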
Value
A Value is one of: a String, a Raw String, a Number, a Boolean, or null.
Property
A Property is a key/value pair attached to a Node. A Property is composed of an Identifier or a String, followed immediately by a =, and then a Value.
Properties should be interpreted left to right, with a later property overriding any earlier property of the same name. That is:
node a=1 a=2
In this example, the node's a value must be 2, not 1.
No other guarantees about order should be expected by implementers. Deserialized representations may iterate over properties in any order and still be spec-compliant.
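The override rule maps naturally onto a dictionary that is filled left to right. The following is a minimal sketch; `collect_properties` is a hypothetical helper, and the input is assumed to be key/value pairs in source order.

```python
from typing import Dict, List, Tuple


def collect_properties(pairs: List[Tuple[str, object]]) -> Dict[str, object]:
    """Fold key/value pairs, given in source order, into one mapping."""
    props: Dict[str, object] = {}
    for key, value in pairs:  # left to right
        props[key] = value    # a later duplicate replaces the earlier value
    return props


# node a=1 a=2
props = collect_properties([("a", 1), ("a", 2)])
```

Here `props["a"]` is 2, matching the example above. Callers should not rely on the iteration order of the resulting mapping.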
Argument
An Argument is a bare Value attached to a Node, with no associated key. It shares the same space as Properties.
A Node may have any number of Arguments, which should be evaluated left to right. KDL implementations MUST preserve the order of Arguments relative to each other (not counting Properties).
Example
my-node 1 2 3 "a" "b" "c"
Whitespace
The following characters should be treated as non-Newline white space:
| Name | Code Pt |
|---|---|
| Character Tabulation | U+0009 |
| Space | U+0020 |
| No-Break Space | U+00A0 |
| Ogham Space Mark | U+1680 |
| En Quad | U+2000 |
| Em Quad | U+2001 |
| En Space | U+2002 |
| Em Space | U+2003 |
| Three-Per-Em Space | U+2004 |
| Four-Per-Em Space | U+2005 |
| Six-Per-Em Space | U+2006 |
| Figure Space | U+2007 |
| Punctuation Space | U+2008 |
| Thin Space | U+2009 |
| Hair Space | U+200A |
| Narrow No-Break Space | U+202F |
| Medium Mathematical Space | U+205F |
| Ideographic Space | U+3000 |
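The table can be transcribed directly into a lookup set. This is a sketch only; the names `WHITESPACE` and `is_whitespace` are hypothetical.

```python
# Code points copied from the table above.
WHITESPACE = frozenset(
    chr(cp)
    for cp in (
        0x0009,                  # Character Tabulation
        0x0020,                  # Space
        0x00A0,                  # No-Break Space
        0x1680,                  # Ogham Space Mark
        *range(0x2000, 0x200B),  # En Quad (U+2000) through Hair Space (U+200A)
        0x202F,                  # Narrow No-Break Space
        0x205F,                  # Medium Mathematical Space
        0x3000,                  # Ideographic Space
    )
)


def is_whitespace(ch: str) -> bool:
    """Return True if ch is non-Newline white space per the table."""
    return ch in WHITESPACE
```

Newline characters are intentionally absent from this set, since they are treated separately.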
Newline
The following characters should be treated as new lines:
| Acronym | Name | Code Pt |
|---|---|---|
| CR | Carriage Return | U+000D |
| LF | Line Feed | U+000A |
| CRLF | Carriage Return and Line Feed | U+000D + U+000A |
| NEL | Next Line | U+0085 |
| FF | Form Feed | U+000C |
| LS | Line Separator | U+2028 |
| PS | Paragraph Separator | U+2029 |
Note that for the purpose of new lines, CRLF is considered a single newline.
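A line splitter that respects this table, including the CRLF rule, might look like the sketch below. `NEWLINE` and `split_lines` are hypothetical names.

```python
import re
from typing import List

# CRLF is listed first so it is consumed as one newline rather than CR + LF.
# The class holds CR, LF, NEL, FF, LS and PS from the table above.
NEWLINE = re.compile("\r\n|[\r\n\u0085\u000C\u2028\u2029]")


def split_lines(text: str) -> List[str]:
    """Split text at every newline sequence listed in the table."""
    return NEWLINE.split(text)
```

With this pattern `"a\r\nb"` splits into two lines, not three.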
Full Grammar
// FIXME: I don't... think this is quite right?
nodes := linespace* (node (newline nodes)? linespace*)?
node := '/-'? identifier (node-space node-props-and-args)* (node-space node-children)? single-line-comment?
node-props-and-args := '/-'? (prop | value)
node-children := '/-'? '{' nodes '}'
node-space := ws* escline ws* | ws+
// FIXME: This needs adjustment to the new, unicode-friendly version
identifier := [a-zA-Z] [a-zA-Z0-9!$%&'*+\-./:<>?@\^_|~]* | string
prop := identifier '=' value
value := string | raw-string | number | boolean | 'null'
string := '"' character* '"'
character := '\' escape | [^\"]
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
hex-digit := [0-9a-fA-F]
raw-string := 'r' raw-string-hash
raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
raw-string-quotes := '"' .* '"'
number := decimal | hex | octal | binary
decimal := integer ('.' [0-9]+)? exponent?
exponent := ('e' | 'E') integer
integer := sign? [0-9] [0-9_]*
sign := '+' | '-'
hex := '0x' hex-digit (hex-digit | '_')*
octal := '0o' [0-7] [0-7_]*
binary := '0b' ('0' | '1') ('0' | '1' | '_')*
boolean := 'true' | 'false'
escline := '\\' ws* (single-line-comment | newline)
linespace := newline | ws | single-line-comment
newline := See Table (All line-break white_space)
ws := bom | unicode-space | multi-line-comment
bom := '\u{FEFF}'
unicode-space := See Table (All White_Space unicode characters which are not `newline`)
single-line-comment := '//' ('\r' [^\n] | [^\r\n])* newline
multi-line-comment := '/*' ('*' [^\/] | [^*])* '*/'
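To make the `number` productions concrete, here is a sketch that transcribes them into regular expressions and converts a matching literal to a Python number. The function name `parse_number` and the choice of returning `int` or `float` are assumptions of this example; the grammar itself says nothing about numeric representation.

```python
import re
from typing import Union

# integer := sign? [0-9] [0-9_]*
INTEGER = r"[+-]?[0-9][0-9_]*"
# decimal := integer ('.' [0-9]+)? exponent?
DECIMAL = re.compile(
    rf"(?P<int>{INTEGER})(?P<frac>\.[0-9]+)?(?P<exp>[eE]{INTEGER})?"
)
# hex, octal and binary, keyed by numeric base.
RADIX = {
    16: re.compile(r"0x[0-9a-fA-F][0-9a-fA-F_]*"),
    8: re.compile(r"0o[0-7][0-7_]*"),
    2: re.compile(r"0b[01][01_]*"),
}


def parse_number(text: str) -> Union[int, float]:
    """Convert a literal matching the `number` production."""
    for base, pattern in RADIX.items():
        if pattern.fullmatch(text):
            # Skip the two-character prefix and drop digit separators.
            return int(text[2:].replace("_", ""), base)
    match = DECIMAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a KDL number: {text!r}")
    digits = text.replace("_", "")
    if match.group("frac") or match.group("exp"):
        return float(digits)
    return int(digits)
```

Under these assumptions `0xFF_FF`, `0o17`, `1_000` and `-1.5e3` are all accepted, while `0x` and `0b102` raise `ValueError`.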