mirror of https://github.com/kdl-org/kdl.git
# KDL Spec

This is the kinda-formal specification for KDL, including the intended data
model and the grammar.

## Introduction

KDL is a node-oriented document language. Its niche and purpose overlap with
XML, as do many of its semantics. You can use KDL both as a configuration
language and as a data exchange or storage format, if you so choose.

## Components

### Document

The toplevel concept of KDL is a Document. A Document is composed of one or more
[Nodes](#node), separated by newlines and whitespace, and eventually terminated by an EOF.

#### Example

The following is a document composed of two toplevel nodes:

```kdl
foo {
    bar
}
baz
```

### Node

Being a node-oriented language means that the real core component of any KDL
document is the "node". Every node must have a name, which is either a legal
[Identifier](#identifier) or a quoted [String](#string).

Following the name are one or more [Whitespace](#whitespace) components,
followed by zero or more whitespace-separated [Values](#value) or
[Properties](#property). Finally, a node is terminated by either a
[Newline](#newline), a [Children Block](#children-block), a semicolon (`;`), or
the end of the file/stream (an `EOF`).

Plain Values (those not attached to a Property) may appear anywhere in the list
of Properties and Values. Each such "anonymous" value should be treated as a
Property whose key is its current index among _values_ in the same
node, starting from 0, as a string. Named properties do not count towards
this index.

That is, the following two nodes are semantically equivalent:

```kdl
foo 1 key="val" 2
foo "0"=1 "1"=2 key="val"
```
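
The indexing rule above can be sketched in a few lines of Python. This is only an illustration, not part of the spec: the function name `as_properties` and the choice to model a named property as a `(key, value)` tuple are assumptions made for the example.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: an argument is either a bare value or a
# (key, value) tuple for a named property, kept in source order.
Value = Union[str, int, float, bool, None]
Argument = Union[Value, Tuple[str, Value]]


def as_properties(arguments: List[Argument]) -> Dict[str, Value]:
    """Fold anonymous values into properties keyed by their index as a string."""
    properties: Dict[str, Value] = {}
    value_index = 0  # counts anonymous values only, never named properties
    for argument in arguments:
        if isinstance(argument, tuple):
            key, value = argument
            properties[key] = value
        else:
            properties[str(value_index)] = argument
            value_index += 1
    return properties


# foo 1 key="val" 2
implicit = as_properties([1, ("key", "val"), 2])
# foo "0"=1 "1"=2 key="val"
explicit = as_properties([("0", 1), ("1", 2), ("key", "val")])
print(implicit == explicit)  # prints True
```

Because the index only advances on anonymous values, `key="val"` sitting between `1` and `2` does not shift the second value away from key `"1"`.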

#### Example

```kdl
foo 1 key="val" 3 {
    bar
    baz
}
```
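
One possible in-memory shape for this data model, sketched in Python. The class and field names (`Document`, `Node`, `values`, `properties`, `children`) are assumptions chosen for illustration; the spec does not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical value type covering strings, numbers, booleans, and null.
Value = Union[str, int, float, bool, None]


@dataclass
class Node:
    name: str
    values: List[Value] = field(default_factory=list)            # anonymous values, in order
    properties: Dict[str, Value] = field(default_factory=dict)   # named properties
    children: List["Node"] = field(default_factory=list)         # contents of the children block


@dataclass
class Document:
    nodes: List[Node] = field(default_factory=list)


# The node from the example: foo 1 key="val" 3 { bar; baz }
example = Document(
    nodes=[
        Node(
            name="foo",
            values=[1, 3],
            properties={"key": "val"},
            children=[Node("bar"), Node("baz")],
        )
    ]
)
```

Keeping `values` and `properties` separate preserves the source form; a consumer that wants the unified view can derive index-keyed properties from `values` on demand.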

### Identifier

A bare Identifier is composed of any Unicode codepoint other than [non-initial
characters](#non-initial-characters), followed by any number of Unicode
codepoints other than [non-identifier characters](#non-identifier-characters).
Identifiers are terminated by [Whitespace](#whitespace) or
[Newlines](#newline).

### Non-initial characters

The following characters cannot be the first character in a bare
[Identifier](#identifier):

* Any of "/\\{};[]=,"
* Any decimal digit (0-9)
* Any [non-identifier characters](#non-identifier-characters)

### Non-identifier characters

The following characters cannot be used anywhere in a bare [Identifier](#identifier):

* Any codepoint with hexadecimal value `0x20` or below.
* Any codepoint with hexadecimal value higher than `0x10FFFF`.
* Any of "\\{};[]=,"
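
The two character lists can be combined into a small validity check. The Python sketch below follows the lists literally; the function names are hypothetical, and it assumes the upper bound is the Unicode maximum (`0x10FFFF`) and that "decimal digit" means ASCII `0-9`.

```python
# Characters listed under "Non-identifier characters".
NON_IDENTIFIER_CHARS = set("\\{};[]=,")
# Characters barred from the first position only: "/" and the decimal digits,
# plus everything that is barred everywhere.
NON_INITIAL_CHARS = NON_IDENTIFIER_CHARS | {"/"} | set("0123456789")
MAX_CODEPOINT = 0x10FFFF  # assumption: the Unicode maximum


def is_identifier_char(ch: str) -> bool:
    """True if ch may appear somewhere in a bare identifier."""
    return 0x20 < ord(ch) <= MAX_CODEPOINT and ch not in NON_IDENTIFIER_CHARS


def is_bare_identifier(text: str) -> bool:
    """True if text is a legal bare identifier under the two lists above."""
    if not text or text[0] in NON_INITIAL_CHARS:
        return False
    return all(is_identifier_char(ch) for ch in text)


print(is_bare_identifier("foo-bar2"))  # prints True
print(is_bare_identifier("2foo"))      # prints False
```

Note that `/` is rejected only as the first character, so `a/b` passes this check while `/a` does not.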

## Full Grammar

```
// FIXME: I don't... think this is quite right?
nodes := linespace* (node (newline nodes)? linespace*)?

// FIXME: This is missing the newline at the end? And is the single-line-comment thing correct?
node := identifier (node-space node-argument)* (node-space node-children)? single-line-comment?
node-argument := prop | value
node-children := '{' nodes '}'
node-space := ws* escline ws* | ws+

// FIXME: This needs adjustment to the new, unicode-friendly version
identifier := [a-zA-Z] [a-zA-Z0-9!$%&'*+\-./:<>?@\^_|~]* | string
prop := identifier '=' value
value := string | raw-string | number | boolean | 'null'

string := '"' character* '"'
character := '\' escape | [^\"]
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
hex-digit := [0-9a-fA-F]

raw-string := 'r' raw-string-hash
raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
raw-string-quotes := '"' .* '"'

number := decimal | hex | octal | binary

decimal := integer ('.' [0-9]+)? exponent?
exponent := ('e' | 'E') integer
integer := sign? [0-9] [0-9_]*
sign := '+' | '-'

hex := '0x' hex-digit (hex-digit | '_')*
octal := '0o' [0-7] [0-7_]*
binary := '0b' ('0' | '1') ('0' | '1' | '_')*

boolean := 'true' | 'false'

escline := '\\' ws* (single-line-comment | newline)

linespace := newline | ws | single-line-comment

// FIXME: This needs to support all unicode newline chars. See #27
newline := ('\r' '\n') | '\n'

ws := bom | ' ' | '\t' | multi-line-comment | slashdash-comment

single-line-comment := '//' ('\r' [^\n] | [^\r\n])* newline
multi-line-comment := '/*' ('*' [^\/] | [^*])* '*/'
slashdash-comment := '/-' (node | value | prop | node-children)
```
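
As a rough illustration of the `number` rules in the grammar, they can be transcribed into Python regular expressions. This is a sketch under assumptions, not a reference implementation: the name `parse_number` is hypothetical, and returning `int` for whole numbers and `float` for anything with a fraction or exponent is a choice made for the example, not something the grammar specifies.

```python
import re

# Direct transcriptions of the grammar rules.
INTEGER = r"[+-]?[0-9][0-9_]*"                           # integer := sign? [0-9] [0-9_]*
DECIMAL = rf"{INTEGER}(?:\.[0-9]+)?(?:[eE]{INTEGER})?"   # decimal := integer ('.' [0-9]+)? exponent?
HEX = r"0x[0-9a-fA-F][0-9a-fA-F_]*"                      # hex := '0x' hex-digit (hex-digit | '_')*
OCTAL = r"0o[0-7][0-7_]*"                                # octal := '0o' [0-7] [0-7_]*
BINARY = r"0b[01][01_]*"                                 # binary := '0b' ('0' | '1') ('0' | '1' | '_')*

NUMBER = re.compile(rf"(?:{HEX}|{OCTAL}|{BINARY}|{DECIMAL})")


def parse_number(text: str):
    """Convert a number literal to int or float, or raise ValueError."""
    if NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a number literal: {text!r}")
    digits = text.replace("_", "")  # underscores are separators only
    for prefix, base in (("0x", 16), ("0o", 8), ("0b", 2)):
        if digits.startswith(prefix):
            return int(digits[len(prefix):], base)
    if any(ch in digits for ch in ".eE"):
        return float(digits)
    return int(digits)


print(parse_number("1_000"))  # prints 1000
print(parse_number("0x1F"))   # prints 31
```

The radix prefixes are checked before the float test so that hex literals containing `e` or `E`, such as `0xEE`, are not mistaken for exponents.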