From 86571d4c0ef960e9debfcbe00ca871f7a7a73704 Mon Sep 17 00:00:00 2001
From: zkat
For more details, see the overview below. Play with it in your browser!
There's a living specification, as well as various implementations. You can also check out the FAQ to answer all your burning questions!
The current version of the KDL spec is KDL 2.0.0. For legacy KDL, please refer to the KDL 1.0.0 spec. All users are encouraged to migrate. Migration is forward-and-backward-compatible and safe, and can be automated.
In addition to a spec for KDL itself, there are specifications for a KDL Query
diff --git a/spec-v1/index.html b/spec-v1/index.html
new file mode 100644
index 0000000..82fd600
--- /dev/null
+++ b/spec-v1/index.html
@@ -0,0 +1,594 @@
+KDL v1 Spec
+
+This is the semi-formal specification for the legacy version of KDL, including
+the intended data model and the grammar.
+
+This document describes KDL version 1.0.0. It was released on September 11, 2021.
+
+Information in this spec is intended as both an accessible historical record,
+and a reference for KDL implementors who are interested in supporting both major
+versions of the language.
+
+The v1 spec will not receive further updates outside of minor, inconsequential
+rewordings or other superficial fixes and is considered a "legacy" version.
+
+Compatibility
+
+KDL v2 is designed such that for any given KDL document written as KDL 1.0 or
+KDL 2.0, the parse will either fail completely, or, if the parse succeeds, the
+data represented by a v1 or v2 parser will be identical. This means that it's
+safe to use a fallback parsing strategy in order to support both v1 and v2
+simultaneously. For example, `node "foo"` is a valid node in both versions, and
+should be represented identically by parsers.
+
+A version marker `/- kdl-version 1` (or `2`) MAY be added to the beginning of
+a KDL document, optionally preceded by the BOM, and parsers MAY use that as a
+hint as to which version to parse the document as.
+
+Introduction
+
+KDL is a node-oriented document language. Its niche and purpose overlaps with
+XML, and so do many of its semantics. You can use KDL both as a configuration
+language, and a data exchange or storage format, if you so choose.
+
+The bulk of this document is dedicated to a long-form description of all
+Components of a KDL document. There is also a much more terse
+Grammar at the end of the document that covers most of the
+rules, with some semantic exceptions involving the data model.
+
+KDL is designed to be easy to read and easy to implement.
+
+In this document, references to "left" or "right" refer to directions in the
+data stream towards the beginning or end, respectively; in other words,
+the directions if the data stream were only ASCII text. They do not refer
+to the writing direction of text, which can flow in either direction,
+depending on the characters used.
+
+Components
+
+Document
+
+The toplevel concept of KDL is a Document. A Document is composed of zero or
+more Nodes, separated by newlines and whitespace, and eventually
+terminated by an EOF.
+
+All KDL documents should be UTF-8 encoded and conform to the specifications in
+this document.
+
+Example
+
+The following is a document composed of two toplevel nodes:
+
+foo {
+    bar
+}
+baz
+
+Node
+
+Being a node-oriented language means that the real core component of any KDL
+document is the "node". Every node must have a name, which is an
+Identifier.
+
+The name may be preceded by a Type Annotation to further
+clarify its type, particularly in relation to its parent node. (For example,
+clarifying that a particular `date` child node is for the publication date,
+rather than the last-modified date, with `(published)date`.)
+
+Following the name are zero or more Arguments or
+Properties, separated by either whitespace or a
+slash-escaped line continuation. Arguments and Properties
+may be interspersed in any order, much like is common with positional
+arguments vs options in command line tools.
+
+Children can be placed after the name and the optional
+Arguments and Properties, possibly separated by either whitespace or a
+slash-escaped line continuation.
+
+Arguments are ordered relative to each other (but not relative to Properties)
+and that order must be preserved in order to maintain the semantics.
+
+By contrast, Property order SHOULD NOT matter to implementations.
+Children should be used if an order-sensitive key/value
+data structure must be represented in KDL.
+
+Nodes MAY be prefixed with `/-` to "comment out" the entire node, including
+its properties, arguments, and children, and make it act as plain whitespace,
+even if it spreads across multiple lines.
+
+Finally, a node is terminated by either a Newline, a semicolon (`;`)
+or the end of the file/stream (an EOF).
+
+Example
+
+foo 1 key="val" 3 {
+    bar
+    (role)baz 1 2
+}
+
+Identifier
+
+An Identifier is either a Bare Identifier, which is an
+unquoted string like `node` or `item`, or a String, which is quoted,
+like `"node"` or `"two words"`. There's no semantic difference between the
+kinds of identifier; this simply allows for the use of quotes to have unusual
+identifiers that are inexpressible as bare identifiers.
+
+Bare Identifier
+
+A Bare Identifier is composed of any Unicode codepoint other than non-initial
+characters, followed by any number of Unicode
+codepoints other than non-identifier characters,
+so long as this doesn't produce something confusable for a Number,
+Boolean, or Null. For example, both a Number
+and an Identifier can start with `-`, but when an Identifier starts with `-`
+the second character cannot be a digit. This is precisely specified in the
+Full Grammar below.
+
+Identifiers are terminated by Whitespace or Newlines.
+
+Non-initial characters
+
+The following characters cannot be the first character in a
+Bare Identifier:
+
+* Any decimal digit (0-9).
+
+Be aware that the `-` character can only be used as an initial
+character if the second character is not a digit. This allows
+identifiers to look like `--this`, and removes the ambiguity
+of having an identifier look like a negative number. (The Full Grammar
+applies the same rule to `+`.)
+
+Non-identifier characters
+
+The following characters cannot be used anywhere in a Bare Identifier:
+
+* Any codepoint with hexadecimal value `0x20` or below.
+* Any codepoint with hexadecimal value higher than `0x10FFFF`.
+* Any of `\/(){}<>;[]=,"`
+
+Line Continuation
+
+Line continuations allow Nodes to be spread across multiple lines.
+
+A line continuation is a `\` character followed by zero or more whitespace
+characters and an optional single-line comment. It must be terminated by a
+Newline (including the Newline that is part of single-line comments).
+
+Following a line continuation, processing of a Node can continue as usual.
+
+Example
+
+my-node 1 2 \  // comments are ok after \
+        3 4    // This is the actual end of the Node.
+
+Property
+
+A Property is a key/value pair attached to a Node. A Property is
+composed of an Identifier, followed immediately by a `=`, and then a Value.
+
+Properties should be interpreted left-to-right, with rightmost properties with
+identical names overriding earlier properties. That is:
+
+node a=1 a=2
+
+In this example, the node's `a` value must be `2`, not `1`.
+
+No other guarantees about order should be expected by implementers.
+Deserialized representations may iterate over properties in any order and
+still be spec-compliant.
+
+Properties MAY be prefixed with `/-` to "comment out" the entire token and
+make it act as plain whitespace, even if it spreads across multiple lines.
+
+Argument
+
+An Argument is a bare Value attached to a Node, with no
+associated key. It shares the same space as Properties, and may be interleaved
+with them.
+
+A Node may have any number of Arguments, which should be evaluated left to
+right. KDL implementations MUST preserve the order of Arguments relative to
+each other (not counting Properties).
+
+Arguments MAY be prefixed with `/-` to "comment out" the entire token and
+make it act as plain whitespace, even if it spreads across multiple lines.
+
+Example
+
+my-node 1 2 3 "a" "b" "c"
+
+Children Block
+
+A children block is a block of Nodes, surrounded by `{` and `}`. They
+are an optional part of nodes, and create a hierarchy of KDL nodes.
+
+Regular node termination rules apply, which means multiple nodes can be
+included in a single-line children block, as long as they're all terminated by
+`;`.
+
+Example
+
+parent {
+    child1
+    child2
+}
+
+parent { child1; child2; }
+
+Value
+
+A value is either: a String, a Number, a
+Boolean, or Null.
+
+Values MUST be either Arguments or values of Properties.
+
+Values (both as arguments and as properties) MAY be prefixed by a single
+Type Annotation.
+
+Type Annotation
+
+A type annotation is a prefix to any Node Name or Value that
+includes a suggestion of what type the value is intended to be treated as,
+or a context-specific elaboration of the more generic type the node name
+indicates.
+
+Type annotations are written as a set of `(` and `)` with an
+Identifier in it. Any valid identifier is considered a valid
+type annotation. There must be no whitespace between a type annotation and its
+associated Node Name or Value.
+
+KDL does not specify any restrictions on what implementations might do with
+these annotations. They are free to ignore them, or use them to make decisions
+about how to interpret a value.
+
+Additionally, the following type annotations MAY be recognized by KDL parsers
+and, if used, SHOULD be interpreted as follows:
+
+Reserved Type Annotations for Numbers Without Decimals:
+
+Signed integers of various sizes (the number is the bit size):
+* `i8`, `i16`, `i32`, `i64`
+
+Unsigned integers of various sizes (the number is the bit size):
+* `u8`, `u16`, `u32`, `u64`
+
+Platform-dependent integer types, both signed and unsigned:
+* `isize`, `usize`
+
+Reserved Type Annotations for Numbers With Decimals:
+
+IEEE 754 floating point numbers, both single (32) and double (64) precision:
+* `f32`, `f64`
+
+IEEE 754-2008 decimal floating point numbers:
+* `decimal64`, `decimal128`
+
+Reserved Type Annotations for Strings:
+
+* `date-time`: ISO8601 date/time format.
+* `time`: "Time" section of ISO8601.
+* `date`: "Date" section of ISO8601.
+* `duration`: ISO8601 duration format.
+* `decimal`: IEEE 754-2008 decimal string format.
+* `currency`: ISO 4217 currency code.
+* `country-2`: ISO 3166-1 alpha-2 country code.
+* `country-3`: ISO 3166-1 alpha-3 country code.
+* `country-subdivision`: ISO 3166-2 country subdivision code.
+* `email`: RFC5322 email address.
+* `idn-email`: RFC6531 internationalized email address.
+* `hostname`: RFC1123 internet hostname (only ASCII segments).
+* `idn-hostname`: RFC5890 internationalized internet hostname (only
+  `xn--`-prefixed ASCII "punycode" segments, or non-ASCII segments).
+* `ipv4`: RFC2673 dotted-quad IPv4 address.
+* `ipv6`: RFC2373 IPv6 address.
+* `url`: RFC3986 URI.
+* `url-reference`: RFC3986 URI Reference.
+* `irl`: RFC3987 Internationalized Resource Identifier.
+* `irl-reference`: RFC3987 Internationalized Resource Identifier Reference.
+* `url-template`: RFC6570 URI Template.
+* `uuid`: RFC4122 UUID.
+* `regex`: Regular expression. Specific patterns may be implementation-dependent.
+* `base64`: A Base64-encoded string, denoting arbitrary binary data.
+
+Examples
+
+node (u8)123
+node prop=(regex)".*"
+(published)date "1970-01-01"
+(contributor)person name="Foo McBar"
+
+String
+
+Strings in KDL represent textual Values, or unusual identifiers. A
+String is either a Quoted String or a
+Raw String. Quoted Strings may include escaped characters, while
+Raw Strings always contain only the literal characters that are present.
+
+Quoted String
+
+A Quoted String is delimited by `"` on either side of any number of literal
+string characters except unescaped `"` and `\`. This includes literal
+Newline characters, which means a String Value can encompass
+multiple lines without behaving like a Newline for Node parsing
+purposes.
+
+Strings MUST be represented as UTF-8 values.
+
+In addition to literal code points, a number of "escapes" are supported.
+"Escapes" are the character `\` followed by another character, and are
+interpreted as described in the following table:
+
+Name                            Escape                 Code Pt
+Line Feed                       \n                     U+000A
+Carriage Return                 \r                     U+000D
+Character Tabulation (Tab)      \t                     U+0009
+Reverse Solidus (Backslash)     \\                     U+005C
+Solidus (Forwardslash)          \/                     U+002F
+Quotation Mark (Double Quote)   \"                     U+0022
+Backspace                       \b                     U+0008
+Form Feed                       \f                     U+000C
+Unicode Escape                  \u{(1-6 hex chars)}    Code point described by hex characters, up to 10FFFF
+
+Raw String
+
+Raw Strings in KDL are much like Quoted Strings, except they
+do not support `\`-escapes. They otherwise share the same properties as far as
+literal Newline characters go, and the requirement of UTF-8
+representation.
+
+Raw String literals are represented as `r`, followed by zero or more `#`
+characters, followed by `"`, followed by any number of UTF-8 literals. The
+string is then closed by a `"` followed by a matching number of `#`
+characters. This allows them to contain raw `"` or `#` characters; only the
+precise terminator (resembling `"##`, for example) ends the raw string. In
+other words, the contents must not contain a `"` followed by as many `#`
+characters as follow the opening `r` (or more), since that sequence would
+close the string early.
+
+Example
+
+just-escapes r"\n will be literal"
+quotes-and-escapes r#"hello\n\r\asd"world"#
+
+Number
+
+Numbers in KDL represent numerical Values. There is no logical distinction in KDL
+between real numbers, integers, and floating point numbers. It's up to
+individual implementations to determine how to represent KDL numbers.
+
+There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary.
+
+* All numbers may optionally start with one of `-` or `+`, which determine
+  whether they'll be positive or negative.
+* Binary numbers start with `0b` and only allow `0` and `1` as digits, which
+  may be separated by `_`. They represent numbers in radix 2.
+* Octal numbers start with `0o` and only allow digits between `0` and `7`,
+  which may be separated by `_`. They represent numbers in radix 8.
+* Hexadecimal numbers start with `0x` and allow digits between `0` and `9`,
+  as well as letters `A` through `F`, in either lower or upper case, which may
+  be separated by `_`. They represent numbers in radix 16.
+* Decimal numbers are a bit more special:
+  * They have no radix prefix.
+  * They use digits `0` through `9`, which may be separated by `_`.
+  * They may optionally include a decimal separator `.`, followed by more
+    digits, which may again be separated by `_`.
+  * They may optionally be followed by `E` or `e`, an optional `-` or `+`,
+    and more digits, to represent an exponent value.
+
+Boolean
+
+A boolean Value is either the symbol `true` or `false`. These
+SHOULD be represented by implementations as boolean logical values, or some
+approximation thereof.
+
+Example
+
+my-node true value=false
+
+Null
+
+The symbol `null` represents a null Value. It's up to the
+implementation to decide how to represent this, but it generally signals the
+"absence" of a value. It is reasonable for an implementation to ignore null
+values altogether when deserializing.
+
+Example
+
+my-node null key=null
+
+Whitespace
+
+The following characters should be treated as non-Newline white
+space:
+
+Name                         Code Pt
+Character Tabulation         U+0009
+Space                        U+0020
+No-Break Space               U+00A0
+Ogham Space Mark             U+1680
+En Quad                      U+2000
+Em Quad                      U+2001
+En Space                     U+2002
+Em Space                     U+2003
+Three-Per-Em Space           U+2004
+Four-Per-Em Space            U+2005
+Six-Per-Em Space             U+2006
+Figure Space                 U+2007
+Punctuation Space            U+2008
+Thin Space                   U+2009
+Hair Space                   U+200A
+Narrow No-Break Space        U+202F
+Medium Mathematical Space    U+205F
+Ideographic Space            U+3000
+
+Multi-line comments
+
+In addition to single-line comments using `//`, comments can also be started
+with `/*` and ended with `*/`. These comments can span multiple lines. They
+are allowed in all positions where Whitespace is allowed and
+can be nested.
+
+Newline
+
+The following characters should be treated as new
+lines:
+
+Acronym   Name                            Code Pt
+CRLF      Carriage Return and Line Feed   U+000D + U+000A
+CR        Carriage Return                 U+000D
+LF        Line Feed                       U+000A
+NEL       Next Line                       U+0085
+FF        Form Feed                       U+000C
+LS        Line Separator                  U+2028
+PS        Paragraph Separator             U+2029
+
+Note that for the purpose of new lines, CRLF is considered a single newline.
+
+VT (Vertical Tab, U+000B) was mistakenly excluded from this table, but the v1
+spec is frozen, so it's left unchanged.
+
+Full Grammar
+
+nodes := linespace* (node nodes?)? linespace*
+
+node := ('/-' node-space*)? type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children ws*)? node-space* node-terminator
+node-prop-or-arg := ('/-' node-space*)? (prop | value)
+node-children := ('/-' node-space*)? '{' nodes '}'
+node-space := ws* escline ws* | ws+
+node-terminator := single-line-comment | newline | ';' | eof
+
+identifier := string | bare-identifier
+bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword
+identifier-char := unicode - linespace - [\/(){}<>;[]=,"]
+keyword := boolean | 'null'
+prop := identifier '=' value
+value := type? (string | number | keyword)
+type := '(' identifier ')'
+
+string := raw-string | escaped-string
+escaped-string := '"' character* '"'
+character := '\' escape | [^\"]
+escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}'
+hex-digit := [0-9a-fA-F]
+
+raw-string := 'r' raw-string-hash
+raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
+raw-string-quotes := '"' .* '"'
+
+number := hex | octal | binary | decimal
+
+decimal := sign? integer ('.' integer)? exponent?
+exponent := ('e' | 'E') sign? integer
+integer := digit (digit | '_')*
+digit := [0-9]
+sign := '+' | '-'
+
+hex := sign? '0x' hex-digit (hex-digit | '_')*
+octal := sign? '0o' [0-7] [0-7_]*
+binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
+
+boolean := 'true' | 'false'
+
+escline := '\\' ws* (single-line-comment | newline)
+
+linespace := newline | ws | single-line-comment
+
+newline := See Table (All line-break white_space)
+
+ws := bom | unicode-space | multi-line-comment
+
+bom := '\u{FEFF}'
+
+unicode-space := See Table (All White_Space unicode characters which are not `newline`)
+
+single-line-comment := '//' ^newline+ (newline | eof)
+multi-line-comment := '/*' commented-block
+commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
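The `hex`, `octal`, `binary` and `decimal` productions above can be transcribed almost directly into regular expressions. The Python sketch below is only an illustration and is not part of the specification. `parse_kdl_v1_number` is a hypothetical helper. It assumes that a literal with a fraction or exponent becomes a `float` and any other literal becomes an `int`, because the spec leaves numeric representation to each implementation.

```python
import re
from typing import Union

# Patterns transcribed from the `hex`, `octal`, `binary` and `decimal`
# productions of the KDL v1 grammar; each allows an optional sign.
_HEX = re.compile(r"[+-]?0x[0-9a-fA-F][0-9a-fA-F_]*")
_OCTAL = re.compile(r"[+-]?0o[0-7][0-7_]*")
_BINARY = re.compile(r"[+-]?0b[01][01_]*")
_DECIMAL = re.compile(
    r"[+-]?[0-9][0-9_]*(?:\.[0-9][0-9_]*)?(?:[eE][+-]?[0-9][0-9_]*)?"
)


def parse_kdl_v1_number(literal: str) -> Union[int, float]:
    """Convert a KDL v1 number literal to an int or float (hypothetical helper)."""
    # KDL allows repeated and trailing underscores, which Python's own
    # numeric parsing rejects, so strip them once the literal is validated.
    digits = literal.replace("_", "")
    if _HEX.fullmatch(literal) or _OCTAL.fullmatch(literal) or _BINARY.fullmatch(literal):
        # Base 0 makes int() honour the sign and the 0x / 0o / 0b prefix.
        return int(digits, 0)
    if _DECIMAL.fullmatch(literal):
        # Assumption: a fraction or exponent maps to float, otherwise int.
        if any(c in digits for c in ".eE"):
            return float(digits)
        return int(digits, 10)
    raise ValueError(f"not a KDL v1 number: {literal!r}")
```

For instance, `parse_kdl_v1_number("0xdead_beef")` returns `3735928559`. The sketch validates the literal against the grammar first and removes underscores only afterwards. That order matters because inputs such as `1__0_` are legal KDL but would make `int()` fail if passed through unchanged.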
+
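The `escaped-string` and `escape` productions, along with the escape table in the Quoted String section, describe a small decoding step. The following Python sketch decodes the body of a Quoted String and is an illustration only. `unescape_kdl_v1` is a hypothetical name, and the sketch assumes a tokenizer has already removed the delimiting quotes. It also rejects surrogate code points, which is an extra assumption: the table itself only states the `10FFFF` upper bound.

```python
import re

# Fixed escapes from the KDL v1 escape table, keyed by the character
# that follows the backslash.
_SIMPLE_ESCAPES = {
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "\\": "\\",
    "/": "/",
    '"': '"',
    "b": "\b",
    "f": "\f",
}
# `\u{...}` with 1 to 6 hex digits, matched starting just after the backslash.
_UNICODE_ESCAPE = re.compile(r"u\{([0-9a-fA-F]{1,6})\}")


def unescape_kdl_v1(body: str) -> str:
    """Decode escapes in a KDL v1 Quoted String body (hypothetical helper)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Literal characters, including literal newlines, pass through.
            out.append(ch)
            i += 1
            continue
        if i + 1 < len(body) and body[i + 1] in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[body[i + 1]])
            i += 2
            continue
        match = _UNICODE_ESCAPE.match(body, i + 1)
        if match is None:
            raise ValueError(f"invalid escape at offset {i}")
        code_point = int(match.group(1), 16)
        # The table caps escapes at 10FFFF; rejecting surrogates is an
        # assumption, since they cannot be encoded as UTF-8.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"escape out of range at offset {i}")
        out.append(chr(code_point))
        i = match.end()
    return "".join(out)
```

Raw Strings do not need this step, because their contents are always taken literally.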