4.1 KiB
KDL Spec
This is the kinda-formal specification for KDL, including the intended data model and the grammar.
Introduction
KDL is a node-oriented document language. Its niche and purpose overlaps with XML, and as do many of its semantics. You can use KDL both as a configuration language, and a data exchange or storage format, if you so choose.
Components
Document
The toplevel concept of KDL is a Document. A Document is composed of one or more Nodes, separated by newlines and whitespace, and eventually terminated by an EOF.
Example
The following is a document composed of two toplevel nodes:
foo {
bar
}
baz
Node
Being a node-oriented language means that the real core component of any KDL document is the "node". Every node must have a name, which is either a legal Identifier, or a quoted String.
Following the name are one or more Whitespace components,
followed by zero or more whitespace-separated Values or
Properties. Finally, a node is terminated by either a
Newline, a Children Block, a semicolon (;) or
the end of the
file/stream (an EOF).
When present in the list of Properties and Values, plain Values (those not attached to a Property), each "anonymous" value should be treated as a Property whose key is its current index among anonymous values in the same node, starting from 0, as a string. Named properties do not count towarrds this index.
That is, the following two nodes are semantically equivalent:
foo 1 key="val" 2
foo "0"=1 "1"=2 key="val"
Example
foo 1 key="val" 3 {
bar
baz
}
Identifier
A bare Identifier is composed of any unicode codepoint other than non-initial characters, followed by any number of unicode codepoints other than non-identifier characters. Identifiers are terminated by Whitespace or Newlines.
Non-initial characters
The following characters cannot be the first character in a bare Identifier:
- Any of "/\{};[]=,"
- Any decimal digit (0-9)
- Any non-identifier characters
Non-identifier characters
The following characters cannot be used anywhere in a bare Identifier:
- Any codepoint with hexadecimal value
0x20or below. - Any codepoint with hexadecimal value higher than
0x10FFF. - Any of "\{};[]=,"
Full Grammar
// FIXME: I don't... think this is quite right?
nodes := linespace* (node (newline nodes)? linespace*)?
// FIXME: This is missing the newline at the end? And is the single-line-comment thing correct?
node := identifier (node-space node-argument)* (node-space node-document)? single-line-comment?
node-argument := prop | value
node-children := '{' nodes '}'
node-space := ws* escline ws* | ws+
// FIXME: This needs adjustment to the new, unicode-friendly version
identifier := [a-zA-Z] [a-zA-Z0-9!$%&'*+\-./:<>?@\^_|~]* | string
prop := identifier '=' value
value := string | raw_string | number | boolean | 'null'
string := '"' character* '"'
character := '\' escape | [^\"]
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
hex-digit := [0-9a-fA-F]
raw-string := 'r' raw-string-hash
raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
raw-string-quotes := '"' .* '"'
number := decimal | hex | octal | binary
decimal := integer ('.' [0-9]+)? exponent?
exponent := ('e' | 'E') integer
integer := sign? [0-9] [0-9_]*
sign := '+' | '-'
hex := '0x' hex-digit (hex-digit | '_')*
octal := '0o' [0-7] [0-7_]*
binary := '0b' ('0' | '1') ('0' | '1' | '_')*
boolean := 'true' | 'false'
escline := '\\' ws* (single-line-comment | newline)
linespace := newline | ws | single-line-comment
// FIXME: This needs to support all unicode newline chars. See #27
newline := ('\r' '\n') | '\n'
ws := bom | ' ' | '\t' | multi-line-comment | slashdash-comment
single-line-comment := '//' ('\r' [^\n] | [^\r\n])* newline
multi-line-comment := '/*' ('*' [^\/] | [^*])* '*/'
slashdash-comment := '/-' (node | value | prop | node-children)