kdl/SPEC.md

5.3 KiB

KDL Spec

This is the kinda-formal specification for KDL, including the intended data model and the grammar.

Introduction

KDL is a node-oriented document language. Its niche and purpose overlaps with XML, and as do many of its semantics. You can use KDL both as a configuration language, and a data exchange or storage format, if you so choose.

Components

Document

The toplevel concept of KDL is a Document. A Document is composed of one or more Nodes, separated by newlines and whitespace, and eventually terminated by an EOF.

Example

The following is a document composed of two toplevel nodes:

foo {
    bar
}
baz

Node

Being a node-oriented language means that the real core component of any KDL document is the "node". Every node must have a name, which is either a legal Identifier, or a quoted String.

Following the name are zero or more Values or Properties, separated by either whitespace or a slash-escaped line continuation. Values and Properties may be interspersed in any order, much like is common with positional arguments vs options in command line tools.

Values are ordered relative to each other and that order must be preserved in order to maintain the semantics.

By contrast, Property order should not matter to implementations. Children should be used if an order-sensitive key/value data structure must be represented in KDL.

Finally, a node is terminated by either a Newline, a Children Block, a semicolon (;) or the end of the file/stream (an EOF).

Example

foo 1 key="val" 3 {
    bar
    baz
}

Identifier

A bare Identifier is composed of any unicode codepoint other than non-initial characters, followed by any number of unicode codepoints other than non-identifier characters. Identifiers are terminated by Whitespace or Newlines.

Non-initial characters

The following characters cannot be the first character in a bare Identifier:

Non-identifier characters

The following characters cannot be used anywhere in a bare Identifier:

  • Any codepoint with hexadecimal value 0x20 or below.
  • Any codepoint with hexadecimal value higher than 0x10FFF.
  • Any of "\{};[]=,"

Line Continuation

Line continuations allow Nodes to be spread across multiple lines.

A line continuation is one or more whitespace characters, followed by a / character. This character can then be followed by more whitespace and must be terminated by a Newline (including the Newline that is part of single-line comments).

Following a line continuation, processing of a Node can continue as usual.

Example

my-node 1 2 \  // this is a comment
        3 4    // This is the actual end of the Node.

Newline

The following characters should be treated as new lines:

Acronym Name Unicode
CR Carriage Return 000D
LF Line Feed 000A
CRLF Carriage Return and Line Feed 000D + 000A
NEL Next Line 0085
VT Vertical Tab 000B
FF Form Feed 000C
LS Line Separator 2028
PS Paragraph Separator 2029

Note that for the purpose of new lines, CRLF is considered a single newline.

Full Grammar

// FIXME: I don't... think this is quite right?
nodes := linespace* (node (newline nodes)? linespace*)?

// FIXME: This is missing the newline at the end? And is the single-line-comment thing correct?
node := identifier (node-space node-argument)* (node-space node-document)? single-line-comment?
node-argument := prop | value
node-children := '{' nodes '}'
node-space := ws* escline ws* | ws+

// FIXME: This needs adjustment to the new, unicode-friendly version
identifier := [a-zA-Z] [a-zA-Z0-9!$%&'*+\-./:<>?@\^_|~]* | string
prop := identifier '=' value
value := string | raw_string | number | boolean | 'null'

string := '"' character* '"'
character := '\' escape | [^\"]
escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
hex-digit := [0-9a-fA-F]

raw-string := 'r' raw-string-hash
raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
raw-string-quotes := '"' .* '"'

number := decimal | hex | octal | binary

decimal := integer ('.' [0-9]+)? exponent?
exponent := ('e' | 'E') integer
integer := sign? [0-9] [0-9_]*
sign := '+' | '-'

hex := '0x' hex-digit (hex-digit | '_')*
octal := '0o' [0-7] [0-7_]*
binary := '0b' ('0' | '1') ('0' | '1' | '_')*

boolean := 'true' | 'false'

escline := '\\' ws* (single-line-comment | newline)

linespace := newline | ws | single-line-comment

newline := `000D` | `000A` | `000D` `000A` | `0085` | `000B` | `000C` | `2028` | `2029`

ws := bom | ' ' | '\t' | multi-line-comment | slashdash-comment

single-line-comment := '//' ('\r' [^\n] | [^\r\n])* newline
multi-line-comment := '/*' ('*' [^\/] | [^*])* '*/'
slashdash-comment := '/-' (node | value | prop | node-children)