mirror of https://github.com/kdl-org/kdl.git
202 lines
6.2 KiB
Markdown
202 lines
6.2 KiB
Markdown
# KDL Spec
|
|
|
|
This is the kinda-formal specification for KDL, including the intended data
|
|
model and the grammar.
|
|
|
|
## Introduction
|
|
|
|
KDL is a node-oriented document language. Its niche and purpose overlaps with
|
|
XML, and as do many of its semantics. You can use KDL both as a configuration
|
|
language, and a data exchange or storage format, if you so choose.
|
|
|
|
## Components
|
|
|
|
### Document
|
|
|
|
The toplevel concept of KDL is a Document. A Document is composed of one or more
|
|
[Nodes](#node), separated by newlines and whitespace, and eventually terminated by an EOF.
|
|
|
|
#### Example
|
|
|
|
The following is a document composed of two toplevel nodes:
|
|
|
|
```kdl
|
|
foo {
|
|
bar
|
|
}
|
|
baz
|
|
```
|
|
|
|
### Node
|
|
|
|
Being a node-oriented language means that the real core component of any KDL
|
|
document is the "node". Every node must have a name, which is either a legal
|
|
[Identifier](#identifier), or a quoted [String](#string).
|
|
|
|
Following the name are zero or more [Values](#value) or
|
|
[Properties](#property), separated by either [whitespace](#whitespace) or [a
|
|
slash-escaped line continuation](#line-continuation). Values and Properties
|
|
may be interspersed in any order, much like is common with positional
|
|
arguments vs options in command line tools.
|
|
|
|
Values are ordered relative to each other and that order must be
|
|
preserved in order to maintain the semantics.
|
|
|
|
By contrast, Property order _should not matter_ to implementations.
|
|
[Children](#children-block) should be used if an order-sensitive key/value
|
|
data structure must be represented in KDL.
|
|
|
|
Finally, a node is terminated by either a [Newline](#newline), a [Children
|
|
Block](#children-block), a semicolon (`;`) or the end of the file/stream (an
|
|
`EOF`).
|
|
|
|
#### Example
|
|
|
|
```kdl
|
|
foo 1 key="val" 3 {
|
|
bar
|
|
baz
|
|
}
|
|
```
|
|
|
|
### Identifier
|
|
|
|
A bare Identifier is composed of any unicode codepoint other than [non-initial
|
|
characters](#non-inidital-characters), followed by any number of unicode
|
|
codepoints other than [non-identifier characters](#non-identifier-characters).
|
|
Identifiers are terminated by [Whitespace](#whitespace) or
|
|
[Newlines](#newline).
|
|
|
|
### Non-initial characters
|
|
|
|
The following characters cannot be the first character in a bare
|
|
[Identifier](#identifier):
|
|
|
|
* Any of "/\\{};[]=,"
|
|
* Any decimal digit (0-9)
|
|
* Any [non-identifier characters](#non-identifier-characters)
|
|
|
|
### Non-identifier characters
|
|
|
|
The following characters cannot be used anywhere in a bare
|
|
[Identifier](#identifier):
|
|
|
|
* Any codepoint with hexadecimal value `0x20` or below.
|
|
* Any codepoint with hexadecimal value higher than `0x10FFF`.
|
|
* Any of "\\{};[]=,"
|
|
|
|
### Line Continuation
|
|
|
|
Line continuations allow [Nodes](#node) to be spread across multiple lines.
|
|
|
|
A line continuation is one or more [whitespace](#whitespace) characters,
|
|
followed by a `/` character. This character can then be followed by more
|
|
[whitespace](#whitespace) and must be terminated by a [Newline](#newline)
|
|
(including the Newline that is part of single-line comments).
|
|
|
|
Following a line continuation, processing of a Node can continue as usual.
|
|
|
|
#### Example
|
|
```kdl
|
|
my-node 1 2 \ // this is a comment
|
|
3 4 // This is the actual end of the Node.
|
|
```
|
|
|
|
### Whitespace
|
|
|
|
The following characters should be treated as non-[Newline](#newline) [white
|
|
space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
|
|
|
|
| Name | Code Pt |
|
|
|----------------------|---------|
|
|
| Character Tabulation | `0009` |
|
|
| Space | `0020` |
|
|
| No-Break Space | `00A0` |
|
|
| Ogham Space Mark | `1680` |
|
|
| En Quad | `2000` |
|
|
| Em Quad | `2001` |
|
|
| En Space | `2002` |
|
|
| Em Space | `2003` |
|
|
| Three-Per-Em Space | `2004` |
|
|
| Four-Per-Em Space | `2005` |
|
|
| Six-Per-Em Space | `2006` |
|
|
| Figure Space | `2007` |
|
|
| Punctuation Space | `2008` |
|
|
| Thin Space | `2009` |
|
|
| Hair Space | `200A` |
|
|
| Narrow No-Break Space| `202F` |
|
|
| Medium Mathematical Space | `205F` |
|
|
| Ideographic Space | `3000` |
|
|
|
|
### Newline
|
|
|
|
The following characters [should be treated as new
|
|
lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf):
|
|
|
|
| Acronym | Name | Code Pt |
|
|
|---------|-----------------|---------|
|
|
| CR | Carriage Return | `000D` |
|
|
| LF | Line Feed | `000A` |
|
|
| CRLF | Carriage Return and Line Feed | `000D` + `000A` |
|
|
| NEL | Next Line | `0085` |
|
|
| VT | Vertical Tab | `000B` |
|
|
| FF | Form Feed | `000C` |
|
|
| LS | Line Separator | `2028` |
|
|
| PS | Paragraph Separator | `2029` |
|
|
|
|
Note that for the purpose of new lines, CRLF is considered _a single newline_.
|
|
|
|
## Full Grammar
|
|
|
|
```
|
|
// FIXME: I don't... think this is quite right?
|
|
nodes := linespace* (node (newline nodes)? linespace*)?
|
|
|
|
// FIXME: This is missing the newline at the end? And is the single-line-comment thing correct?
|
|
node := identifier (node-space node-argument)* (node-space node-document)? single-line-comment?
|
|
node-argument := prop | value
|
|
node-children := '{' nodes '}'
|
|
node-space := ws* escline ws* | ws+
|
|
|
|
// FIXME: This needs adjustment to the new, unicode-friendly version
|
|
identifier := [a-zA-Z] [a-zA-Z0-9!$%&'*+\-./:<>?@\^_|~]* | string
|
|
prop := identifier '=' value
|
|
value := string | raw_string | number | boolean | 'null'
|
|
|
|
string := '"' character* '"'
|
|
character := '\' escape | [^\"]
|
|
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
|
|
hex-digit := [0-9a-fA-F]
|
|
|
|
raw-string := 'r' raw-string-hash
|
|
raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
|
|
raw-string-quotes := '"' .* '"'
|
|
|
|
number := decimal | hex | octal | binary
|
|
|
|
decimal := integer ('.' [0-9]+)? exponent?
|
|
exponent := ('e' | 'E') integer
|
|
integer := sign? [0-9] [0-9_]*
|
|
sign := '+' | '-'
|
|
|
|
hex := '0x' hex-digit (hex-digit | '_')*
|
|
octal := '0o' [0-7] [0-7_]*
|
|
binary := '0b' ('0' | '1') ('0' | '1' | '_')*
|
|
|
|
boolean := 'true' | 'false'
|
|
|
|
escline := '\\' ws* (single-line-comment | newline)
|
|
|
|
linespace := newline | ws | single-line-comment
|
|
|
|
newline := `000D` | `000A` | `000D` `000A` | `0085` | `000B` | `000C` | `2028` | `2029`
|
|
|
|
ws := bom | unicode-space | multi-line-comment | slashdash-comment
|
|
|
|
unicode-space := See Table (All White_Space unicode characters which are not `newline`)
|
|
|
|
single-line-comment := '//' ('\r' [^\n] | [^\r\n])* newline
|
|
multi-line-comment := '/*' ('*' [^\/] | [^*])* '*/'
|
|
slashdash-comment := '/-' (node | value | prop | node-children)
|
|
```
|