| title | abbrev | docname | submissionType | category | ipr | area | venue | workgroup | keyword | stand_alone | smart_quotes | pi | author | normative | informative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| The KDL Document Language | KDL | draft-marchan-kdl2-latest | independent | exp | trust200902 | General | | KDL Community | | yes | no | | | | |
If the last line wasn't indented as far, it won't dedent the rest of the lines as much:
multi-line """
        foo
    This is no longer on the left edge
      bar
  """
This example's string value will be:
      foo
  This is no longer on the left edge
    bar
Equivalent to "      foo\n  This is no longer on the left edge\n    bar".
Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value:
multi-line """
    Indented a bit.

    A second indented paragraph.
    """
This example's string value will be:
Indented a bit.

A second indented paragraph.
Equivalent to "Indented a bit.\n\nA second indented paragraph."
The following yield syntax errors:
multi-line """can't be single line"""
multi-line """
closing quote with non-whitespace prefix"""
multi-line """stuff
"""
// Every line must share the exact same prefix as the closing line.
multi-line """[\n]
[tab]a[\n]
[space][space]b[\n]
[space][tab][\n]
[tab]"""
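The dedent rules and error cases above can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: the name `dedent_multiline` is hypothetical, the input is the text between the opening and closing `"""`, whitespace escapes are assumed to be resolved already, and for brevity only LF is treated as a newline and only space and tab as whitespace.

```python
def dedent_multiline(body: str) -> str:
    """Dedent the text found between the opening and closing triple quotes.

    Simplifying assumptions (not from the spec): only "\n" is a newline and
    only space/tab are whitespace; whitespace escapes are already resolved.
    """
    if not body.startswith("\n"):
        raise ValueError("opening quotes must be followed by a newline")
    lines = body[1:].split("\n")
    prefix = lines.pop()  # whatever precedes the closing quotes on their line
    if prefix.strip(" \t"):
        raise ValueError("closing quotes must be preceded only by whitespace")
    out = []
    for line in lines:
        if not line.strip(" \t"):
            out.append("")  # whitespace-only lines are reflected as empty
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError("line does not share the closing line's prefix")
    return "\n".join(out)
```

Comparing each line against the exact prefix of the closing line, rather than counting columns, is what makes a tab-indented line and a space-indented line incompatible, as in the last error case above.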
Interaction with Whitespace Escapes
Multi-line strings support the same mechanism for escaping whitespace as Quoted Strings.
When processing a Multi-line String, implementations MUST dedent the string
after resolving all whitespace escapes, but before resolving other backslash
escapes. This means a whitespace escape that attempts to escape the final line's
newline and/or whitespace prefix can be invalid: if removing escaped whitespace
places the closing """ on a line with non-whitespace characters, this escape
is invalid.
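The first step of that ordering, resolving whitespace escapes before dedenting, can be sketched in Python. This is an illustrative sketch only: `resolve_ws_escapes` and `closing_line_is_valid` are hypothetical names, the input is the text between the triple quotes, and Python's `\s` is used as an approximation of KDL's whitespace and newline sets (it is not an exact match).

```python
import re

# A backslash followed by either one non-whitespace character (a regular
# escape, left untouched here) or a run of whitespace (a whitespace escape).
_ESCAPE = re.compile(r"\\(\S|\s+)")

def resolve_ws_escapes(body: str) -> str:
    """Drop whitespace escapes, leaving every other escape for a later pass."""
    def repl(match):
        return "" if match.group(1).isspace() else match.group(0)
    return _ESCAPE.sub(repl, body)

def closing_line_is_valid(body: str) -> bool:
    """True if, after resolving whitespace escapes, only whitespace
    precedes the closing quotes on their line."""
    resolved = resolve_ws_escapes(body)
    last_line = resolved.rsplit("\n", 1)[-1]
    return "\n" in resolved and last_line.strip() == ""
```

Matching `\\` as a regular two-character escape first keeps an escaped backslash followed by a space from being misread as a whitespace escape.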
For example, the following is illegal:
"""
foo
bar\
"""
// equivalent to
"""
foo
bar"""
while the following is allowed:
"""
foo \
bar
baz
\ """
// equivalent to
"""
foo bar
baz
"""
Raw String
Both Quoted and Multi-Line Strings have
Raw String variants, which are identical in syntax except they do not support
\-escapes. This includes line-continuation escapes (\ + ws collapsing to
nothing). They otherwise share the same properties as far as literal
Newline ({{newline}}) characters go, multi-line rules, and the requirement of
UTF-8 representation.
The Raw String variants are indicated by preceding the string's opening quotes
with one or more # characters. The string is then closed by its normal closing
quotes, followed by a matching number of # characters. This means that the
string may contain any combination of " and # characters other than its
closing delimiter (e.g., if a raw string starts with ##", it can contain "
or "#, but not "## or "###).
Like other Strings, Raw Strings MUST NOT include any of the disallowed literal code-points as code points in their body. Unlike with Quoted Strings, these cannot simply be escaped, and are thus unrepresentable when using Raw Strings.
Example
just-escapes #"\n will be literal"#
The string contains the literal characters \n will be literal.
quotes-and-escapes ##"hello\n\r\asd"#world"##
The string contains the literal characters hello\n\r\asd"#world
raw-multi-line #"""
    Here's a """
        multiline string
        """
    without escapes.
    """#
The string contains the value
Here's a """
    multiline string
    """
without escapes.
or equivalently, "Here's a \"\"\"\n    multiline string\n    \"\"\"\nwithout escapes." as a Quoted String.
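The delimiter matching described above can be sketched in Python. This is a hedged illustration, not the spec's algorithm: `read_raw_string` is a hypothetical helper that only locates the body, and it leaves multi-line dedenting and the check for disallowed code points out of scope.

```python
def read_raw_string(src: str, start: int = 0):
    """Return (body, end_index) for the raw string that starts at src[start].

    The body is returned as written; a multi-line body is not dedented.
    """
    i = start
    while i < len(src) and src[i] == "#":
        i += 1
    hashes = i - start
    if hashes == 0 or not src.startswith('"', i):
        raise ValueError("not a raw string")
    quotes = '"""' if src.startswith('"""', i) else '"'
    closing = quotes + "#" * hashes
    body_start = i + len(quotes)
    # The first occurrence of the closing delimiter ends the string.
    end = src.find(closing, body_start)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[body_start:end], end + len(closing)
```

Searching for the first occurrence of the full closing delimiter is why a body opened with `##"` may contain `"#` but can never contain `"##`.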
Number
Numbers in KDL represent numerical Values. There is no logical distinction in KDL between real numbers, integers, and floating point numbers. It's up to individual implementations to determine how to represent KDL numbers.
There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary.
- All non-Keyword numbers may optionally start with one of - or +, which determine whether they'll be positive or negative.
- Binary numbers start with 0b and only allow 0 and 1 as digits, which may be separated by _. They represent numbers in radix 2.
- Octal numbers start with 0o and only allow digits between 0 and 7, which may be separated by _. They represent numbers in radix 8.
- Hexadecimal numbers start with 0x and allow digits between 0 and 9, as well as letters A through F, in either lower or upper case, which may be separated by _. They represent numbers in radix 16.
- Decimal numbers are a bit more special:
    - They have no radix prefix.
    - They use digits 0 through 9, which may be separated by _.
    - They may optionally include a decimal separator ., followed by more digits, which may again be separated by _.
    - They may optionally be followed by E or e, an optional - or +, and more digits, to represent an exponent value.
Note that, similar to JSON and some other languages,
numbers without an integer digit (such as .1) are illegal.
They must be written with at least one integer digit, like 0.1.
(These patterns are also disallowed from Identifier Strings, to avoid confusion.)
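The non-keyword number syntaxes above can be sketched as a small Python parser. This is one possible reading, not an official implementation: `parse_number` is a hypothetical name, and returning `int` or `decimal.Decimal` is an arbitrary choice of representation, which the text leaves up to implementations.

```python
import re
from decimal import Decimal

_RADIX = re.compile(r"([+-]?)0([xob])([0-9a-fA-F][0-9a-fA-F_]*)")
_DECIMAL = re.compile(
    r"[+-]?[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?"
)
_BASES = {"x": 16, "o": 8, "b": 2}

def parse_number(text: str):
    """Parse a non-keyword KDL number into an int or a Decimal."""
    m = _RADIX.fullmatch(text)
    if m:
        sign, kind, digits = m.groups()
        # int() raises ValueError for digits that are invalid in the radix.
        value = int(digits.replace("_", ""), _BASES[kind])
        return -value if sign == "-" else value
    m = _DECIMAL.fullmatch(text)
    if not m:
        raise ValueError(f"not a KDL number: {text!r}")
    plain = text.replace("_", "")
    if m.group(1) or m.group(2):  # fractional part or exponent present
        return Decimal(plain)
    return int(plain)
```

Each digit group must begin with a digit, never `_`, so inputs such as `.1` or `0x_ff` are rejected, in line with the rule that a number needs at least one integer digit.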
Keyword Numbers
There are three special "keyword" numbers included in KDL to accommodate the widespread use of IEEE 754 floats:
- #inf - floating point positive infinity.
- #-inf - floating point negative infinity.
- #nan - floating point NaN/Not a Number.
To go along with this and prevent foot guns, the bare Identifier
Strings inf, -inf, and nan are considered illegal
identifiers and should yield a syntax error.
The existence of these keywords does not imply that any numbers must be represented as IEEE 754 floats. These are simply for clarity and convenience for any implementation that chooses to represent its numbers in this way.
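For an implementation that does choose IEEE 754 floats, the keyword numbers could be mapped as in the following Python sketch. The names `KEYWORD_NUMBERS`, `ILLEGAL_BARE_IDENTIFIERS`, and `parse_keyword_number` are hypothetical, and using `SyntaxError` for the bare spellings is an illustrative choice.

```python
import math

KEYWORD_NUMBERS = {
    "#inf": math.inf,
    "#-inf": -math.inf,
    "#nan": math.nan,
}

# Bare spellings that must be rejected rather than read as identifiers.
ILLEGAL_BARE_IDENTIFIERS = {"inf", "-inf", "nan"}

def parse_keyword_number(token: str) -> float:
    """Map a keyword number token to a Python float."""
    if token in ILLEGAL_BARE_IDENTIFIERS:
        raise SyntaxError(f"{token!r} must be written as '#{token}'")
    try:
        return KEYWORD_NUMBERS[token]
    except KeyError:
        raise ValueError(f"not a keyword number: {token!r}") from None
```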
Boolean
A boolean Value ({{value}}) is either the symbol #true or #false. These
SHOULD be represented by implementations as boolean logical values, or some
approximation thereof.
Example
my-node #true value=#false
Null
The symbol #null represents a null Value ({{value}}). It's up to the
implementation to decide how to represent this, but it generally signals the
"absence" of a value.
Example
my-node #null key=#null
Whitespace
The following characters should be treated as non-Newline white space:
| Name | Code Pt |
|---|---|
| Character Tabulation | U+0009 |
| Space | U+0020 |
| No-Break Space | U+00A0 |
| Ogham Space Mark | U+1680 |
| En Quad | U+2000 |
| Em Quad | U+2001 |
| En Space | U+2002 |
| Em Space | U+2003 |
| Three-Per-Em Space | U+2004 |
| Four-Per-Em Space | U+2005 |
| Six-Per-Em Space | U+2006 |
| Figure Space | U+2007 |
| Punctuation Space | U+2008 |
| Thin Space | U+2009 |
| Hair Space | U+200A |
| Narrow No-Break Space | U+202F |
| Medium Mathematical Space | U+205F |
| Ideographic Space | U+3000 |
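The table above translates directly into a lookup set. The following Python sketch uses the hypothetical names `KDL_WHITESPACE` and `is_kdl_whitespace`; it covers only the code points listed in the table and does not treat comments as whitespace.

```python
# The 18 non-Newline whitespace code points from the table above.
KDL_WHITESPACE = frozenset(
    "\u0009\u0020\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A"
    "\u202F\u205F\u3000"
)

def is_kdl_whitespace(ch: str) -> bool:
    """True if ch is a single non-Newline whitespace character."""
    return ch in KDL_WHITESPACE
```

A dedicated set is used instead of Python's `str.isspace()`, which also accepts newline characters and a few control characters that this table does not list.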
Single-line comments
Any text after //, until the next literal Newline ({{newline}}) is "commented
out", and is considered to be Whitespace ({{whitespace}}).
Multi-line comments
In addition to single-line comments using //, comments can also be started
with /* and ended with */. These comments can span multiple lines. They
are allowed in all positions where Whitespace ({{whitespace}}) is allowed and
can be nested.
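Because these comments nest, a scanner has to track depth rather than stop at the first */. A minimal Python sketch follows; the function name `skip_multiline_comment` is hypothetical.

```python
def skip_multiline_comment(src: str, start: int) -> int:
    """Return the index just past the comment that opens at src[start]."""
    if not src.startswith("/*", start):
        raise ValueError("no multi-line comment at this position")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1  # a nested comment opens
            i += 2
        elif src.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i  # the outermost comment just closed
        else:
            i += 1
    raise ValueError("unterminated multi-line comment")
```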
Slashdash comments
Finally, a special kind of comment called a "slashdash", denoted by /-, can
be used to comment out entire components of a KDL document logically, and
have those elements not be included as part of the parsed document data.
Slashdash comments can be used before the following, including before their type annotations, if present:
- A Node ({{node}}): the entire Node is treated as Whitespace, including all props, args, and children.
- An Argument ({{argument}}): the Argument value is treated as Whitespace.
- A Property ({{property}}) key: the entire property, including both key and value, is treated as Whitespace. A slashdash of just the property value is not allowed.
- A Children Block: the entire block, including all children within, is treated as Whitespace. Only other children blocks, whether slashdashed or not, may follow a slashdashed children block.
A slashdash may be followed by any amount of whitespace, including newlines and comments (other than other slashdashes), before the element that it comments out.
Newline
The following character sequences should be treated as new lines:
| Acronym | Name | Code Pt |
|---|---|---|
| CRLF | Carriage Return and Line Feed | U+000D + U+000A |
| CR | Carriage Return | U+000D |
| LF | Line Feed | U+000A |
| NEL | Next Line | U+0085 |
| VT | Vertical tab | U+000B |
| FF | Form Feed | U+000C |
| LS | Line Separator | U+2028 |
| PS | Paragraph Separator | U+2029 |
Note that for the purpose of new lines, the specific sequence CRLF is
considered a single newline.
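One way to honor the CRLF rule is to try the two-character sequence before any single character. The following Python sketch uses the hypothetical name `split_lines`; it works on plain text and does not account for newlines inside strings or comments.

```python
import re

# CRLF comes first in the alternation so that it is consumed as one
# newline instead of a CR followed by an LF.
_NEWLINE = re.compile("\r\n|[\r\n\u0085\u000B\u000C\u2028\u2029]")

def split_lines(text: str) -> list:
    """Split text on every newline sequence listed in the table above."""
    return _NEWLINE.split(text)
```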
Disallowed Literal Code Points
The following code points may not appear literally anywhere in the document.
They may be represented in Strings (but not Raw Strings) using Unicode Escapes (\u{...}),
except for code points that are not Unicode Scalar Values, which can't be represented even as escapes.
- The codepoints U+0000-0008 or the codepoints U+000E-001F (various control characters).
- U+007F (the Delete control character).
- Any codepoint that is not a Unicode Scalar Value (U+D800-DFFF).
- U+200E-200F, U+202A-202E, and U+2066-2069, the unicode "direction control" characters.
- U+FEFF, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM), except as the first code point in a document.
Full Grammar
This is the full official grammar for KDL and should be considered authoritative if something seems to disagree with the text above. The grammar language syntax is defined below.
document := bom? version? nodes
// Nodes
nodes := (line-space* node)* line-space*
base-node := slashdash? type? node-space* string
(node-space+ slashdash? node-prop-or-arg)*
// slashdashed node-children must always be after props and args.
(node-space+ slashdash node-children)*
(node-space+ node-children)?
(node-space+ slashdash node-children)*
node-space*
node := base-node node-terminator
final-node := base-node node-terminator?
// Entries
node-prop-or-arg := prop | value
node-children := '{' nodes final-node? '}'
node-terminator := single-line-comment | newline | ';' | eof
prop := string node-space* '=' node-space* value
value := type? node-space* (string | number | keyword)
type := '(' node-space* string node-space* ')'
// Strings
string := identifier-string | quoted-string | raw-string ¶
identifier-string := unambiguous-ident | signed-ident | dotted-ident
unambiguous-ident :=
((identifier-char - digit - sign - '.') identifier-char*)
- disallowed-keyword-identifiers
signed-ident :=
sign ((identifier-char - digit - '.') identifier-char*)?
dotted-ident :=
sign? '.' ((identifier-char - digit) identifier-char*)?
identifier-char :=
unicode - unicode-space - newline - [\\/(){};\[\]"#=]
- disallowed-literal-code-points
disallowed-keyword-identifiers :=
'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'
quoted-string :=
'"' single-line-string-body '"' |
'"""' newline
(multi-line-string-body newline)?
(unicode-space | ws-escape)* '"""'
single-line-string-body := (string-character - newline)*
multi-line-string-body := (('"' | '""')? string-character)*
string-character :=
'\\' (["\\bfnrts] |
'u{' hex-unicode '}') |
ws-escape |
[^\\"] - disallowed-literal-code-points
ws-escape := '\\' (unicode-space | newline)+
hex-digit := [0-9a-fA-F]
hex-unicode := hex-digit{1, 6} - surrogates
surrogates := [dD][8-9a-fA-F]hex-digit{2}
// U+D800-DFFF: D 8 00
// D F FF
raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
raw-string-quotes :=
'"' single-line-raw-string-body '"' |
'"""' newline
(multi-line-raw-string-body newline)?
unicode-space* '"""'
single-line-raw-string-body :=
'' |
(single-line-raw-string-char - '"')
single-line-raw-string-char*? |
'"' (single-line-raw-string-char - '"')
single-line-raw-string-char*?
single-line-raw-string-char :=
unicode - newline - disallowed-literal-code-points
multi-line-raw-string-body :=
(unicode - disallowed-literal-code-points)*?
// Numbers
number := keyword-number | hex | octal | binary | decimal
decimal := sign? integer ('.' integer)? exponent?
exponent := ('e' | 'E') sign? integer
integer := digit (digit | '_')*
digit := [0-9]
sign := '+' | '-'
hex := sign? '0x' hex-digit (hex-digit | '_')*
octal := sign? '0o' [0-7] [0-7_]*
binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
// Keywords and booleans.
keyword := boolean | '#null'
keyword-number := '#inf' | '#-inf' | '#nan'
boolean := '#true' | '#false'
// Specific code points
bom := '\u{FEFF}'
disallowed-literal-code-points :=
See Table (Disallowed Literal Code Points)
unicode := Any Unicode Scalar Value
unicode-space := See Table
(All White_Space unicode characters which are not `newline`)
// Comments
single-line-comment := '//' ^newline* (newline | eof)
multi-line-comment := '/*' commented-block
commented-block :=
'*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
slashdash := '/-' line-space*
// Whitespace
ws := unicode-space | multi-line-comment
escline := '\\' ws* (single-line-comment | newline | eof)
newline := See Table (All Newline White_Space)
// Whitespace where newlines are allowed.
line-space := node-space | newline | single-line-comment
// Whitespace within nodes,
// where newline-ish things must be esclined.
node-space := ws* escline ws* | ws+
// Version marker
version :=
'/-' unicode-space* 'kdl-version' unicode-space+ ('1' | '2')
unicode-space* newline
Grammar language
The grammar language syntax is a combination of ABNF with some regex spice thrown in. Specifically:
- Single quotes (') are used to denote literal text. \ within a literal string is used for escaping other single-quotes, for initiating unicode characters using hex values (\u{FEFF}), and for escaping \ itself (\\).
- * is used for "zero or more", + is used for "one or more", and ? is used for "zero or one". Per standard regex semantics, * and + are greedy; they match as many instances as possible without failing the match.
- *? (used only in raw strings) indicates a non-greedy match; it matches as few instances as possible without failing the match.
- ¶ is a cut point. It always matches and consumes no characters, but once matched, the parser is not allowed to backtrack past that point in the source. If a parser would rewind past the cut point, it must instead fail the overall parse, as if it had run out of options. (This is only used with the raw-string production, to ensure the first instance of the appropriate closing quote sequence is guaranteed to be the end of the raw string, rather than allowing it to potentially consume more of the document unexpectedly.)
- () can be used to group matches that must be matched together.
- a | b means a or b, whichever matches first. If multiple items are before a |, they are a single group. a b c | d is equivalent to (a b c) | d.
- [] are used for regex-style character matches, where any character between the brackets will be a single match. \ is used to escape \, [, and ]. They also support character ranges (0-9), and negation (^).
- - is used for "except for" or "minus" whatever follows it. For example, a - 'x' means "any a, except something that matches the literal 'x'".
- The prefix ^ means "something that does not match" whatever follows it. For example, ^foo means "must not match foo".
- A single definition may be split over multiple lines. Newlines are treated as spaces.
- // followed by text on its own line is used as comment syntax.