mirror of https://github.com/kdl-org/kdl.git
actual tweaks, plus some automatic reformatting (#497)
parent
aab44fcd1b
commit
21a9eb3f65
|
|
@ -20,19 +20,16 @@ smart_quotes: no
|
||||||
pi: [toc, sortrefs, symrefs]
|
pi: [toc, sortrefs, symrefs]
|
||||||
|
|
||||||
author:
|
author:
|
||||||
-
|
- name: Katerina Zoé Marchán Salvá
|
||||||
name: Katerina Zoé Marchán Salvá
|
|
||||||
ins: K. Marchán
|
ins: K. Marchán
|
||||||
organization: Microsoft
|
organization: Microsoft
|
||||||
-
|
- name: The KDL Contributors
|
||||||
name: The KDL Contributors
|
|
||||||
ins: KDL Contributors
|
ins: KDL Contributors
|
||||||
|
|
||||||
normative:
|
normative:
|
||||||
|
|
||||||
informative:
|
informative:
|
||||||
|
|
||||||
|
|
||||||
--- abstract
|
--- abstract
|
||||||
|
|
||||||
KDL is a node-oriented document language. Its niche and purpose overlaps with
|
KDL is a node-oriented document language. Its niche and purpose overlaps with
|
||||||
|
|
@ -42,10 +39,14 @@ language, and a data exchange or storage format, if you so choose.
|
||||||
This is the formal specification for KDL, including the intended data model and
|
This is the formal specification for KDL, including the intended data model and
|
||||||
the grammar.
|
the grammar.
|
||||||
|
|
||||||
|
This document describes an unreleased minor change to KDL. For the latest
|
||||||
|
official version of the language, see https://kdl.dev/spec.
|
||||||
|
|
||||||
|
<!--
|
||||||
This document describes KDL version KDL 2.0.0. It was released on 2024-12-21. It
|
This document describes KDL version KDL 2.0.0. It was released on 2024-12-21. It
|
||||||
is the latest stable version of the language, and will only be edited for minor
|
is the latest stable version of the language, and will only be edited for minor
|
||||||
copyedits or major errata.
|
copyedits or major errata.
|
||||||
|
-->
|
||||||
|
|
||||||
--- note_License
|
--- note_License
|
||||||
|
|
||||||
|
|
@ -53,7 +54,6 @@ This work is licensed under Creative Commons Attribution-ShareAlike 4.0
|
||||||
International. To view a copy of this license, visit
|
International. To view a copy of this license, visit
|
||||||
https://creativecommons.org/licenses/by-sa/4.0/
|
https://creativecommons.org/licenses/by-sa/4.0/
|
||||||
|
|
||||||
|
|
||||||
--- middle
|
--- middle
|
||||||
|
|
||||||
# Compatibility
|
# Compatibility
|
||||||
|
|
@ -84,7 +84,7 @@ rules, with some semantic exceptions involving the data model.
|
||||||
KDL is designed to be easy to read _and_ easy to implement.
|
KDL is designed to be easy to read _and_ easy to implement.
|
||||||
|
|
||||||
In this document, references to "left" or "right" refer to directions in the
|
In this document, references to "left" or "right" refer to directions in the
|
||||||
*data stream* towards the beginning or end, respectively; in other words,
|
_data stream_ towards the beginning or end, respectively; in other words,
|
||||||
the directions if the data stream were only ASCII text. They do not refer
|
the directions if the data stream were only ASCII text. They do not refer
|
||||||
to the writing direction of text, which can flow in either direction,
|
to the writing direction of text, which can flow in either direction,
|
||||||
depending on the characters used.
|
depending on the characters used.
|
||||||
|
|
@ -94,7 +94,7 @@ depending on the characters used.
|
||||||
## Document
|
## Document
|
||||||
|
|
||||||
The toplevel concept of KDL is a Document. A Document is composed of zero or
|
The toplevel concept of KDL is a Document. A Document is composed of zero or
|
||||||
more Nodes ({{node}}), separated by newlines and whitespace, and eventually
|
more Nodes ({{node}}), separated by newlines, semicolons, and whitespace, and eventually
|
||||||
terminated by an EOF.
|
terminated by an EOF.
|
||||||
|
|
||||||
All KDL documents MUST be encoded in UTF-8 and conform to the specifications in
|
All KDL documents MUST be encoded in UTF-8 and conform to the specifications in
|
||||||
|
|
@ -147,7 +147,8 @@ the entire node, including its properties, arguments, and children, and make
|
||||||
it act as plain whitespace, even if it spreads across multiple lines.
|
it act as plain whitespace, even if it spreads across multiple lines.
|
||||||
|
|
||||||
Finally, a node is terminated by either a Newline ({{newline}}), a semicolon
|
Finally, a node is terminated by either a Newline ({{newline}}), a semicolon
|
||||||
(`;`), the end of a child block (`}`) or the end of the file/stream (an `EOF`).
|
(`;`), the end of its parent's child block (`}`) or the end of the file/stream
|
||||||
|
(an `EOF`).
|
||||||
|
|
||||||
### Example
|
### Example
|
||||||
|
|
||||||
|
|
@ -234,7 +235,7 @@ parent {
|
||||||
child2
|
child2
|
||||||
}
|
}
|
||||||
|
|
||||||
parent { child1; child2; }
|
parent { child1; child2 }
|
||||||
~~~
|
~~~
|
||||||
|
|
||||||
## Value
|
## Value
|
||||||
|
|
@ -271,63 +272,64 @@ and, if used, SHOULD interpret these types as follows:
|
||||||
|
|
||||||
Signed integers of various sizes (the number is the bit size):
|
Signed integers of various sizes (the number is the bit size):
|
||||||
|
|
||||||
* `i8`
|
- `i8`
|
||||||
* `i16`
|
- `i16`
|
||||||
* `i32`
|
- `i32`
|
||||||
* `i64`
|
- `i64`
|
||||||
* `i128`
|
- `i128`
|
||||||
|
|
||||||
Unsigned integers of various sizes (the number is the bit size):
|
Unsigned integers of various sizes (the number is the bit size):
|
||||||
|
|
||||||
* `u8`
|
- `u8`
|
||||||
* `u16`
|
- `u16`
|
||||||
* `u32`
|
- `u32`
|
||||||
* `u64`
|
- `u64`
|
||||||
* `u128`
|
- `u128`
|
||||||
|
|
||||||
Platform-dependent integer types, both signed and unsigned:
|
Platform-dependent integer types, both signed and unsigned:
|
||||||
|
|
||||||
* `isize`
|
- `isize`
|
||||||
* `usize`
|
- `usize`
|
||||||
|
|
||||||
### Reserved Type Annotations for Numbers With Decimals:
|
### Reserved Type Annotations for Numbers With Decimals:
|
||||||
|
|
||||||
IEEE 754 floating point numbers, both single (32) and double (64) precision:
|
IEEE 754 floating point numbers, both single (32) and double (64) precision:
|
||||||
|
|
||||||
* `f32`
|
- `f32`
|
||||||
* `f64`
|
- `f64`
|
||||||
|
|
||||||
IEEE 754-2008 decimal floating point numbers
|
IEEE 754-2008 decimal floating point numbers
|
||||||
|
|
||||||
* `decimal64`
|
- `decimal64`
|
||||||
* `decimal128`
|
- `decimal128`
|
||||||
|
|
||||||
### Reserved Type Annotations for Strings:
|
### Reserved Type Annotations for Strings:
|
||||||
|
|
||||||
* `date-time`: ISO8601 date/time format.
|
- `date-time`: ISO8601 date/time format.
|
||||||
* `time`: "Time" section of ISO8601.
|
- `time`: "Time" section of ISO8601.
|
||||||
* `date`: "Date" section of ISO8601.
|
- `date`: "Date" section of ISO8601.
|
||||||
* `duration`: ISO8601 duration format.
|
- `duration`: ISO8601 duration format.
|
||||||
* `decimal`: IEEE 754-2008 decimal string format.
|
- `decimal`: IEEE 754-2008 decimal string format.
|
||||||
* `currency`: ISO 4217 currency code.
|
- `currency`: ISO 4217 currency code.
|
||||||
* `country-2`: ISO 3166-1 alpha-2 country code.
|
- `country-2`: ISO 3166-1 alpha-2 country code.
|
||||||
* `country-3`: ISO 3166-1 alpha-3 country code.
|
- `country-3`: ISO 3166-1 alpha-3 country code.
|
||||||
* `country-subdivision`: ISO 3166-2 country subdivision code.
|
- `country-subdivision`: ISO 3166-2 country subdivision code.
|
||||||
* `email`: RFC5322 email address.
|
- `email`: RFC5322 email address.
|
||||||
* `idn-email`: RFC6531 internationalized email address.
|
- `idn-email`: RFC6531 internationalized email address.
|
||||||
* `hostname`: RFC1123 internet hostname (only ASCII segments)
|
- `hostname`: RFC1123 internet hostname (only ASCII segments)
|
||||||
* `idn-hostname`: RFC5890 internationalized internet hostname
|
- `idn-hostname`: RFC5890 internationalized internet hostname
|
||||||
(only `xn--`-prefixed ASCII "punycode" segments, or non-ASCII segments)
|
(only `xn--`-prefixed ASCII "punycode" segments, or non-ASCII segments)
|
||||||
* `ipv4`: RFC2673 dotted-quad IPv4 address.
|
- `ipv4`: RFC2673 dotted-quad IPv4 address.
|
||||||
* `ipv6`: RFC2373 IPv6 address.
|
- `ipv6`: RFC2373 IPv6 address.
|
||||||
* `url`: RFC3986 URI.
|
- `url`: RFC3986 URI.
|
||||||
* `url-reference`: RFC3986 URI Reference.
|
- `url-reference`: RFC3986 URI Reference.
|
||||||
* `irl`: RFC3987 Internationalized Resource Identifier.
|
- `irl`: RFC3987 Internationalized Resource Identifier.
|
||||||
* `irl-reference`: RFC3987 Internationalized Resource Identifier Reference.
|
- `irl-reference`: RFC3987 Internationalized Resource Identifier Reference.
|
||||||
* `url-template`: RFC6570 URI Template.
|
- `url-template`: RFC6570 URI Template.
|
||||||
* `uuid`: RFC4122 UUID.
|
- `uuid`: RFC4122 UUID.
|
||||||
* `regex`: Regular expression. Specific patterns may be implementation-dependent.
|
- `regex`: Regular expression. Specific patterns may be implementation-dependent.
|
||||||
* `base64`: A Base64-encoded string, denoting arbitrary binary data.
|
- `base64`: A Base64-encoded string, denoting arbitrary binary data.
|
||||||
|
- `base85`: An [Ascii85](https://en.wikipedia.org/wiki/Ascii85)-encoded string, denoting arbitrary binary data.
|
||||||
|
|
||||||
### Examples
|
### Examples
|
||||||
|
|
||||||
|
|
@ -347,12 +349,12 @@ or a Multi-Line String ({{multi-line-string}}).
|
||||||
Both Quoted and Multiline strings come in normal
|
Both Quoted and Multiline strings come in normal
|
||||||
and Raw String ({{raw-string}}) variants (like `#"foo"#`):
|
and Raw String ({{raw-string}}) variants (like `#"foo"#`):
|
||||||
|
|
||||||
* Identifier Strings let you write short, "single-word" strings with a
|
- Identifier Strings let you write short, "single-word" strings with a
|
||||||
minimum of syntax
|
minimum of syntax
|
||||||
* Quoted Strings let you write strings "like normal", with whitespace and escapes.
|
- Quoted Strings let you write strings "like normal", with whitespace and escapes.
|
||||||
* Multi-Line Strings let you write strings across multiple lines
|
- Multi-Line Strings let you write strings across multiple lines
|
||||||
and with indentation that's not part of the string value.
|
and with indentation that's not part of the string value.
|
||||||
* Raw Strings don't allow any escapes,
|
- Raw Strings don't allow any escapes,
|
||||||
allowing you to not worry about the string's content containing anything that
|
allowing you to not worry about the string's content containing anything that
|
||||||
might look like an escape.
|
might look like an escape.
|
||||||
|
|
||||||
|
|
@ -374,10 +376,10 @@ characters ({{non-identifier-characters}}).
|
||||||
|
|
||||||
A handful of patterns are disallowed, to avoid confusion with other values:
|
A handful of patterns are disallowed, to avoid confusion with other values:
|
||||||
|
|
||||||
* idents that appear to start with a Number ({{number}}) (like `1.0v2` or
|
- idents that appear to start with a Number ({{number}}) (like `1.0v2` or
|
||||||
`-1em`) or the "almost a number" pattern of a decimal point without a
|
`-1em`) or the "almost a number" pattern of a decimal point without a
|
||||||
leading digit (like `.1`).
|
leading digit (like `.1`).
|
||||||
* idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
|
- idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
|
||||||
`false`, and `null`) without their leading `#`.
|
`false`, and `null`) without their leading `#`.
|
||||||
|
|
||||||
Identifiers that match these patterns _MUST_ be treated as a syntax error; such
|
Identifiers that match these patterns _MUST_ be treated as a syntax error; such
|
||||||
|
|
@ -389,17 +391,17 @@ identifier syntax is specified in the Full Grammar in {{full-grammar}}.
|
||||||
The following characters cannot be the first character in an
|
The following characters cannot be the first character in an
|
||||||
Identifier String ({{identifier-string}}):
|
Identifier String ({{identifier-string}}):
|
||||||
|
|
||||||
* Any decimal digit (0-9)
|
- Any decimal digit (0-9)
|
||||||
* Any non-identifier characters ({{non-identifier-characters}})
|
- Any non-identifier characters ({{non-identifier-characters}})
|
||||||
|
|
||||||
Additionally, the following initial characters impose limitations on subsequent
|
Additionally, the following initial characters impose limitations on subsequent
|
||||||
characters:
|
characters:
|
||||||
|
|
||||||
* the `+` and `-` characters can only be used as an initial character if
|
- the `+` and `-` characters can only be used as an initial character if
|
||||||
the second character is *not* a digit. If the second character is `.`, then
|
the second character is _not_ a digit. If the second character is `.`, then
|
||||||
the third character must *not* be a digit.
|
the third character must _not_ be a digit.
|
||||||
* the `.` character can only be used as an initial character if
|
- the `.` character can only be used as an initial character if
|
||||||
the second character is *not* a digit.
|
the second character is _not_ a digit.
|
||||||
|
|
||||||
This allows identifiers to look like `--this` or `.md`, and removes the
|
This allows identifiers to look like `--this` or `.md`, and removes the
|
||||||
ambiguity of having an identifier look like a number.
|
ambiguity of having an identifier look like a number.
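
The initial-character rules above can be sketched in Python. This is a minimal illustration, not the normative grammar: the name `has_valid_identifier_start` is hypothetical, the check covers only the start-of-identifier rules from this section (not keywords or later characters), and Python's `str.isspace` is used as an approximation of the KDL Whitespace and Newline sets.

```python
# ASCII characters that may not appear anywhere in an Identifier String.
NON_IDENTIFIER_ASCII = set('(){}[]/\\"#;=')


def _is_digit(ch: str) -> bool:
    # Only the ASCII decimal digits 0-9 count as digits here.
    return "0" <= ch <= "9"


def has_valid_identifier_start(text: str) -> bool:
    """Check only the initial-character rules for an Identifier String."""
    if not text:
        return False
    first = text[0]
    # A digit or a non-identifier character can never come first.
    if _is_digit(first) or first in NON_IDENTIFIER_ASCII or first.isspace():
        return False
    if first in "+-":
        # A sign followed by a digit would look like a number.
        if len(text) > 1 and _is_digit(text[1]):
            return False
        # A sign, then '.', then a digit is also rejected.
        if len(text) > 2 and text[1] == "." and _is_digit(text[2]):
            return False
    # A leading '.' followed by a digit is the "almost a number" pattern.
    if first == "." and len(text) > 1 and _is_digit(text[1]):
        return False
    return True
```

Under these assumptions, `--this` and `.md` pass the check, while `.1`, `-1em`, and `1.0v2` are rejected.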
|
||||||
|
|
@ -408,9 +410,9 @@ ambiguity of having an identifier look like a number.
|
||||||
|
|
||||||
The following characters cannot be used anywhere in an Identifier String ({{identifier-string}}):
|
The following characters cannot be used anywhere in an Identifier String ({{identifier-string}}):
|
||||||
|
|
||||||
* Any of `(){}[]/\"#;=`
|
- Any of `(){}[]/\"#;=`
|
||||||
* Any Whitespace ({{whitespace}}) or Newline ({{newline}}).
|
- Any Whitespace ({{whitespace}}) or Newline ({{newline}}).
|
||||||
* Any disallowed literal code points ({{disallowed-literal-code-points}}) in KDL
|
- Any disallowed literal code points ({{disallowed-literal-code-points}}) in KDL
|
||||||
documents.
|
documents.
|
||||||
|
|
||||||
## Quoted String
|
## Quoted String
|
||||||
|
|
@ -450,7 +452,7 @@ interpreted as described in the following table:
|
||||||
| Form Feed | `\f` | `U+000C` |
|
| Form Feed | `\f` | `U+000C` |
|
||||||
| Space | `\s` | `U+0020` |
|
| Space | `\s` | `U+0020` |
|
||||||
| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, as long as it represents a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) |
|
| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, as long as it represents a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) |
|
||||||
| Whitespace Escape | See below | N/A |
|
| Whitespace Escape | See below | N/A |
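
The Unicode Escape row can be illustrated with a short Python sketch. The function name `decode_unicode_escape` is hypothetical, and the sketch handles a single `\u{...}` escape in isolation rather than a whole string; it only shows the "1-6 hex chars, must be a Unicode Scalar Value" constraint.

```python
import re
from typing import Optional

# Matches exactly one escape of the form \u{H...} with 1 to 6 hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]{1,6})\}")


def decode_unicode_escape(escape: str) -> Optional[str]:
    """Return the escaped character, or None if the escape is invalid."""
    match = _UNICODE_ESCAPE.fullmatch(escape)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Surrogates (U+D800-DFFF) and values above U+10FFFF are not scalar values.
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        return None
    return chr(value)
```

With this sketch, `\u{0041}` decodes to `A`, while `\u{D800}` and `\u{110000}` are rejected because they are not Unicode Scalar Values.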
|
||||||
|
|
||||||
#### Escaped Whitespace
|
#### Escaped Whitespace
|
||||||
|
|
||||||
|
|
@ -490,7 +492,7 @@ such) are retained. For example, these strings are all semantically identical:
|
||||||
|
|
||||||
#### Invalid escapes
|
#### Invalid escapes
|
||||||
|
|
||||||
Except as described in the escapes table, above, `\` *MUST NOT* precede any
|
Except as described in the escapes table, above, `\` _MUST NOT_ precede any
|
||||||
other characters in a string.
|
other characters in a string.
|
||||||
|
|
||||||
## Multi-line String
|
## Multi-line String
|
||||||
|
|
@ -500,7 +502,7 @@ Newlines. They must use a special multi-line syntax, and they automatically
|
||||||
"dedent" the string, allowing its value to be indented to a visually matching
|
"dedent" the string, allowing its value to be indented to a visually matching
|
||||||
level as desired.
|
level as desired.
|
||||||
|
|
||||||
A Multi-Line String is opened and closed by *three* double-quote characters,
|
A Multi-Line String is opened and closed by _three_ double-quote characters,
|
||||||
like `"""`.
|
like `"""`.
|
||||||
Its first line _MUST_ immediately start with a Newline ({{newline}})
|
Its first line _MUST_ immediately start with a Newline ({{newline}})
|
||||||
after its opening `"""`.
|
after its opening `"""`.
|
||||||
|
|
@ -770,15 +772,15 @@ individual implementations to determine how to represent KDL numbers.
|
||||||
|
|
||||||
There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary.
|
There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary.
|
||||||
|
|
||||||
* All non-Keyword ({{keyword-numbers}}) numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative.
|
- All non-Keyword ({{keyword-numbers}}) numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative.
|
||||||
* Binary numbers start with `0b` and only allow `0` and `1` as digits, which may be separated by `_`. They represent numbers in radix 2.
|
- Binary numbers start with `0b` and only allow `0` and `1` as digits, which may be separated by `_`. They represent numbers in radix 2.
|
||||||
* Octal numbers start with `0o` and only allow digits between `0` and `7`, which may be separated by `_`. They represent numbers in radix 8.
|
- Octal numbers start with `0o` and only allow digits between `0` and `7`, which may be separated by `_`. They represent numbers in radix 8.
|
||||||
* Hexadecimal numbers start with `0x` and allow digits between `0` and `9`, as well as letters `A` through `F`, in either lower or upper case, which may be separated by `_`. They represent numbers in radix 16.
|
- Hexadecimal numbers start with `0x` and allow digits between `0` and `9`, as well as letters `A` through `F`, in either lower or upper case, which may be separated by `_`. They represent numbers in radix 16.
|
||||||
* Decimal numbers are a bit more special:
|
- Decimal numbers are a bit more special:
|
||||||
* They have no radix prefix.
|
- They have no radix prefix.
|
||||||
* They use digits `0` through `9`, which may be separated by `_`.
|
- They use digits `0` through `9`, which may be separated by `_`.
|
||||||
* They may optionally include a decimal separator `.`, followed by more digits, which may again be separated by `_`.
|
- They may optionally include a decimal separator `.`, followed by more digits, which may again be separated by `_`.
|
||||||
* They may optionally be followed by `E` or `e`, an optional `-` or `+`, and more digits, to represent an exponent value.
|
- They may optionally be followed by `E` or `e`, an optional `-` or `+`, and more digits, to represent an exponent value.
|
||||||
|
|
||||||
Note that, similar to JSON and some other languages,
|
Note that, similar to JSON and some other languages,
|
||||||
numbers without an integer digit (such as `.1`) are illegal.
|
numbers without an integer digit (such as `.1`) are illegal.
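
The number syntaxes listed above can be approximated with regular expressions. The following Python sketch is illustrative only: `classify_number` is a hypothetical helper, and it assumes that the first digit after a sign or radix prefix is a real digit rather than `_`, which this section does not state explicitly. The Full Grammar remains the authority.

```python
import re
from typing import Optional

_SIGN = r"[+-]?"
_PATTERNS = {
    "binary": re.compile(_SIGN + r"0b[01][01_]*"),
    "octal": re.compile(_SIGN + r"0o[0-7][0-7_]*"),
    "hexadecimal": re.compile(_SIGN + r"0x[0-9a-fA-F][0-9a-fA-F_]*"),
    # Integer part, optional fraction, optional exponent; no radix prefix.
    "decimal": re.compile(
        _SIGN + r"[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?"
    ),
}
# Keyword numbers take no sign prefix of their own.
_KEYWORDS = {"#inf", "#-inf", "#nan"}


def classify_number(text: str) -> Optional[str]:
    """Name the number syntax that `text` matches, or None if it matches none."""
    if text in _KEYWORDS:
        return "keyword"
    for name, pattern in _PATTERNS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

Under these assumptions, `0.1` classifies as decimal while `.1` matches no syntax, which mirrors the rule that a number needs at least one integer digit.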
|
||||||
|
|
@ -790,9 +792,9 @@ They must be written with at least one integer digit, like `0.1`.
|
||||||
There are three special "keyword" numbers included in KDL to accommodate the
|
There are three special "keyword" numbers included in KDL to accommodate the
|
||||||
widespread use of [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floats:
|
widespread use of [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floats:
|
||||||
|
|
||||||
* `#inf` - floating point positive infinity.
|
- `#inf` - floating point positive infinity.
|
||||||
* `#-inf` - floating point negative infinity.
|
- `#-inf` - floating point negative infinity.
|
||||||
* `#nan` - floating point NaN/Not a Number.
|
- `#nan` - floating point NaN/Not a Number.
|
||||||
|
|
||||||
To go along with this and prevent foot guns, the bare Identifier
|
To go along with this and prevent foot guns, the bare Identifier
|
||||||
Strings ({{identifier-string}}) `inf`, `-inf`, and `nan` are considered illegal
|
Strings ({{identifier-string}}) `inf`, `-inf`, and `nan` are considered illegal
|
||||||
|
|
@ -831,26 +833,26 @@ my-node #null key=#null
|
||||||
The following characters should be treated as non-Newline ({{newline}}) [white
|
The following characters should be treated as non-Newline ({{newline}}) [white
|
||||||
space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
|
space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
|
||||||
|
|
||||||
| Name | Code Pt |
|
| Name | Code Pt |
|
||||||
|----------------------|---------|
|
| ------------------------- | -------- |
|
||||||
| Character Tabulation | `U+0009` |
|
| Character Tabulation | `U+0009` |
|
||||||
| Space | `U+0020` |
|
| Space | `U+0020` |
|
||||||
| No-Break Space | `U+00A0` |
|
| No-Break Space | `U+00A0` |
|
||||||
| Ogham Space Mark | `U+1680` |
|
| Ogham Space Mark | `U+1680` |
|
||||||
| En Quad | `U+2000` |
|
| En Quad | `U+2000` |
|
||||||
| Em Quad | `U+2001` |
|
| Em Quad | `U+2001` |
|
||||||
| En Space | `U+2002` |
|
| En Space | `U+2002` |
|
||||||
| Em Space | `U+2003` |
|
| Em Space | `U+2003` |
|
||||||
| Three-Per-Em Space | `U+2004` |
|
| Three-Per-Em Space | `U+2004` |
|
||||||
| Four-Per-Em Space | `U+2005` |
|
| Four-Per-Em Space | `U+2005` |
|
||||||
| Six-Per-Em Space | `U+2006` |
|
| Six-Per-Em Space | `U+2006` |
|
||||||
| Figure Space | `U+2007` |
|
| Figure Space | `U+2007` |
|
||||||
| Punctuation Space | `U+2008` |
|
| Punctuation Space | `U+2008` |
|
||||||
| Thin Space | `U+2009` |
|
| Thin Space | `U+2009` |
|
||||||
| Hair Space | `U+200A` |
|
| Hair Space | `U+200A` |
|
||||||
| Narrow No-Break Space| `U+202F` |
|
| Narrow No-Break Space | `U+202F` |
|
||||||
| Medium Mathematical Space | `U+205F` |
|
| Medium Mathematical Space | `U+205F` |
|
||||||
| Ideographic Space | `U+3000` |
|
| Ideographic Space | `U+3000` |
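
For implementers, the table above translates directly into a lookup set. The Python sketch below is one possible encoding; the names `KDL_WHITESPACE` and `is_kdl_whitespace` are hypothetical and not part of the specification.

```python
# The 18 non-newline whitespace code points from the table above.
KDL_WHITESPACE = frozenset(
    "\u0009\u0020\u00A0\u1680"  # Tab, Space, No-Break Space, Ogham Space Mark
    "\u2000\u2001\u2002\u2003\u2004\u2005"  # En Quad through Four-Per-Em Space
    "\u2006\u2007\u2008\u2009\u200A"  # Six-Per-Em Space through Hair Space
    "\u202F\u205F\u3000"  # Narrow No-Break, Medium Mathematical, Ideographic
)


def is_kdl_whitespace(ch: str) -> bool:
    """True if `ch` is a single code point listed in the whitespace table."""
    return ch in KDL_WHITESPACE
```

Newline characters such as `U+000A` are deliberately absent from this set, since the table covers only non-Newline white space.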
|
||||||
|
|
||||||
### Single-line comments
|
### Single-line comments
|
||||||
|
|
||||||
|
|
@ -873,12 +875,12 @@ have those elements not be included as part of the parsed document data.
|
||||||
Slashdash comments can be used before the following, including before their type
|
Slashdash comments can be used before the following, including before their type
|
||||||
annotations, if present:
|
annotations, if present:
|
||||||
|
|
||||||
* A Node ({{node}}): the entire Node is treated as Whitespace, including all
|
- A Node ({{node}}): the entire Node is treated as Whitespace, including all
|
||||||
props, args, and children.
|
props, args, and children.
|
||||||
* An Argument ({{argument}}): the Argument value is treated as Whitespace.
|
- An Argument ({{argument}}): the Argument value is treated as Whitespace.
|
||||||
* A Property ({{property}}) key: the entire property, including both key and value,
|
- A Property ({{property}}) key: the entire property, including both key and value,
|
||||||
is treated as Whitespace. A slashdash of just the property value is not allowed.
|
is treated as Whitespace. A slashdash of just the property value is not allowed.
|
||||||
* A Children Block ({{children-block}}): the entire block, including all
|
- A Children Block ({{children-block}}): the entire block, including all
|
||||||
children within, is treated as Whitespace. Only other children blocks, whether
|
children within, is treated as Whitespace. Only other children blocks, whether
|
||||||
slashdashed or not, may follow a slashdashed children block.
|
slashdashed or not, may follow a slashdashed children block.
|
||||||
|
|
||||||
|
|
@ -890,16 +892,16 @@ comments (other than other slashdashes), before the element that it comments out
|
||||||
The following character sequences [should be treated as new
|
The following character sequences [should be treated as new
|
||||||
lines](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G41643):
|
lines](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G41643):
|
||||||
|
|
||||||
| Acronym | Name | Code Pt |
|
| Acronym | Name | Code Pt |
|
||||||
|---------|-----------------|---------|
|
| ------- | ----------------------------- | ------------------- |
|
||||||
| CRLF | Carriage Return and Line Feed | `U+000D` + `U+000A` |
|
| CRLF | Carriage Return and Line Feed | `U+000D` + `U+000A` |
|
||||||
| CR | Carriage Return | `U+000D` |
|
| CR | Carriage Return | `U+000D` |
|
||||||
| LF | Line Feed | `U+000A` |
|
| LF | Line Feed | `U+000A` |
|
||||||
| NEL | Next Line | `U+0085` |
|
| NEL | Next Line | `U+0085` |
|
||||||
| VT | Vertical tab | `U+000B` |
|
| VT | Vertical tab | `U+000B` |
|
||||||
| FF | Form Feed | `U+000C` |
|
| FF | Form Feed | `U+000C` |
|
||||||
| LS | Line Separator | `U+2028` |
|
| LS | Line Separator | `U+2028` |
|
||||||
| PS | Paragraph Separator | `U+2029` |
|
| PS | Paragraph Separator | `U+2029` |
|
||||||
|
|
||||||
Note that for the purpose of new lines, the specific sequence `CRLF` is
|
Note that for the purpose of new lines, the specific sequence `CRLF` is
|
||||||
considered _a single newline_.
|
considered _a single newline_.
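
The CRLF rule matters when splitting a document into lines. A minimal Python sketch, assuming a regex-based splitter (the name `split_kdl_lines` is hypothetical), lists `CRLF` before the single-code-point alternatives so that it is consumed as one newline:

```python
import re
from typing import List

# CRLF comes first so it is matched as one newline rather than as CR then LF.
_NEWLINE = re.compile("\r\n|[\r\n\u0085\u000B\u000C\u2028\u2029]")


def split_kdl_lines(text: str) -> List[str]:
    """Split text on every newline sequence from the table above."""
    return _NEWLINE.split(text)
```

With this ordering, `"a\r\nb"` splits into two lines, whereas a splitter that treated CR and LF independently would produce an empty line between them.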
|
||||||
|
|
@ -910,15 +912,15 @@ The following code points may not appear literally anywhere in the document.
|
||||||
They may be represented in Strings (but not Raw Strings) using Unicode Escapes ({{escapes}}) (`\u{...}`,
|
They may be represented in Strings (but not Raw Strings) using Unicode Escapes ({{escapes}}) (`\u{...}`,
|
||||||
except for code points that are not Unicode Scalar Values, which can't be represented even as escapes).
|
except for code points that are not Unicode Scalar Values, which can't be represented even as escapes).
|
||||||
|
|
||||||
* The codepoints `U+0000-0008` or the codepoints `U+000E-001F` (various
|
- The codepoints `U+0000-0008` or the codepoints `U+000E-001F` (various
|
||||||
control characters).
|
control characters).
|
||||||
* `U+007F` (the Delete control character).
|
- `U+007F` (the Delete control character).
|
||||||
* Any codepoint that is not a [Unicode Scalar
|
- Any codepoint that is not a [Unicode Scalar
|
||||||
Value](https://unicode.org/glossary/#unicode_scalar_value) (`U+D800-DFFF`).
|
Value](https://unicode.org/glossary/#unicode_scalar_value) (`U+D800-DFFF`).
|
||||||
* `U+200E-200F`, `U+202A-202E`, and `U+2066-2069`, the [unicode
|
- `U+200E-200F`, `U+202A-202E`, and `U+2066-2069`, the [unicode
|
||||||
"direction control"
|
"direction control"
|
||||||
characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
|
characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
|
||||||
* `U+FEFF`, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM),
|
- `U+FEFF`, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM),
|
||||||
except as the first code point in a document.
|
except as the first code point in a document.
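
The list above can be expressed as a single predicate. The Python sketch below is an illustration under stated assumptions: `is_disallowed_literal` and its `at_document_start` parameter are hypothetical names, and the caller is assumed to track whether the code point is the first one in the document.

```python
def is_disallowed_literal(code_point: int, at_document_start: bool = False) -> bool:
    """True if the code point may not appear literally in a KDL document."""
    # Various control characters.
    if 0x0000 <= code_point <= 0x0008 or 0x000E <= code_point <= 0x001F:
        return True
    # Delete control character.
    if code_point == 0x007F:
        return True
    # Surrogates are not Unicode Scalar Values.
    if 0xD800 <= code_point <= 0xDFFF:
        return True
    # Unicode "direction control" characters.
    if (
        0x200E <= code_point <= 0x200F
        or 0x202A <= code_point <= 0x202E
        or 0x2066 <= code_point <= 0x2069
    ):
        return True
    # The BOM is tolerated only as the first code point of the document.
    if code_point == 0xFEFF:
        return not at_document_start
    return False
```

Note that `U+0009` through `U+000D` fall outside the two control ranges, which is consistent with their use as whitespace and newline characters elsewhere in this document.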
|
||||||
|
|
||||||
# Full Grammar
|
# Full Grammar
|
||||||
|
|
@ -983,12 +985,12 @@ string-character :=
|
||||||
[^\\"] - disallowed-literal-code-points
|
[^\\"] - disallowed-literal-code-points
|
||||||
ws-escape := '\\' (unicode-space | newline)+
|
ws-escape := '\\' (unicode-space | newline)+
|
||||||
hex-digit := [0-9a-fA-F]
|
hex-digit := [0-9a-fA-F]
|
||||||
hex-unicode := hex-digit{1, 6} - surrogate - above-max-scalar // Unicode Scalar Value in hex₁₆, leading 0s allowed within length ≤ 6
|
hex-unicode := hex-digit{1, 6} - surrogate - above-max-scalar
|
||||||
surrogate := [0]{0, 2} [dD] [8-9a-fA-F] hex-digit{2}
|
surrogate := [0]{0, 2} [dD] [8-9a-fA-F] hex-digit{2}
|
||||||
// U+D800-DFFF: D 8 00
|
// U+D800-DFFF: D 8 00
|
||||||
// D F FF
|
// D F FF
|
||||||
above-max-scalar := [2-9a-fA-F] hex-digit{5} | [1] [1-9a-fA-F] hex-digit{4}
|
above-max-scalar := [2-9a-fA-F] hex-digit{5} |
|
||||||
// >U+10FFFF: >1 _____ 1 >0 ____
|
[1] [1-9a-fA-F] hex-digit{4}
|
||||||
|
|
||||||
|
|
||||||
raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
|
raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
|
||||||
|
|
@ -1062,16 +1064,16 @@ version :=
|
||||||
The grammar language syntax is a combination of ABNF with some regex spice thrown in.
|
The grammar language syntax is a combination of ABNF with some regex spice thrown in.
|
||||||
Specifically:
|
Specifically:
|
||||||
|
|
||||||
* Single quotes (`'`) are used to denote literal text. `\` within a literal
|
- Single quotes (`'`) are used to denote literal text. `\` within a literal
|
||||||
string is used for escaping other single-quotes, for initiating unicode
|
string is used for escaping other single-quotes, for initiating unicode
|
||||||
characters using hex values (`\u{FEFF}`), and for escaping `\` itself
|
characters using hex values (`\u{FEFF}`), and for escaping `\` itself
|
||||||
(`\\`).
|
(`\\`).
|
||||||
* `*` is used for "zero or more", `+` is used for "one or more", and `?` is
|
- `*` is used for "zero or more", `+` is used for "one or more", and `?` is
|
||||||
used for "zero or one". Per standard regex semantics, `*` and `+` are *greedy*;
|
used for "zero or one". Per standard regex semantics, `*` and `+` are _greedy_;
|
||||||
they match as many instances as possible without failing the match.
|
they match as many instances as possible without failing the match.
|
||||||
* `*?` (used only in raw strings) indicates a *non-greedy* match;
|
- `*?` (used only in raw strings) indicates a _non-greedy_ match;
|
||||||
it matches as *few* instances as possible without failing the match.
|
it matches as _few_ instances as possible without failing the match.
|
||||||
* `¶` is a *cut point*. It always matches and consumes no characters,
|
- `¶` is a _cut point_. It always matches and consumes no characters,
|
||||||
but once matched, the parser is not allowed to backtrack past that point in the source.
|
but once matched, the parser is not allowed to backtrack past that point in the source.
|
||||||
If a parser would rewind past the cut point, it must instead fail the overall parse,
|
If a parser would rewind past the cut point, it must instead fail the overall parse,
|
||||||
as if it had run out of options.
|
as if it had run out of options.
|
||||||
|
|
@ -1079,16 +1081,16 @@ Specifically:
|
||||||
to ensure the first instance of the appropriate closing quote sequence
|
to ensure the first instance of the appropriate closing quote sequence
|
||||||
is guaranteed to be the end of the raw string,
|
is guaranteed to be the end of the raw string,
|
||||||
rather than allowing it to potentially consume more of the document unexpectedly.)
|
rather than allowing it to potentially consume more of the document unexpectedly.)
|
||||||
* `()` can be used to group matches that must be matched together.
|
- `()` can be used to group matches that must be matched together.
|
||||||
* `a | b` means `a or b`, whichever matches first. If multiple items are before
|
- `a | b` means `a or b`, whichever matches first. If multiple items are before
|
||||||
a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
|
a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`.
|
||||||
* `[]` are used for regex-style character matches, where any character between
|
- `[]` are used for regex-style character matches, where any character between
|
||||||
the brackets will be a single match. `\` is used to escape `\`, `[`, and
|
the brackets will be a single match. `\` is used to escape `\`, `[`, and
|
||||||
`]`. They also support character ranges (`0-9`), and negation (`^`)
|
`]`. They also support character ranges (`0-9`), and negation (`^`)
|
||||||
* `-` is used for "except for" or "minus" whatever follows it. For example,
|
- `-` is used for "except for" or "minus" whatever follows it. For example,
|
||||||
`a - 'x'` means "any `a`, except something that matches the literal `'x'`".
|
`a - 'x'` means "any `a`, except something that matches the literal `'x'`".
|
||||||
* The prefix `^` means "something that does not match" whatever follows it.
|
- The prefix `^` means "something that does not match" whatever follows it.
|
||||||
For example, `^foo` means "must not match `foo`".
|
For example, `^foo` means "must not match `foo`".
|
||||||
* A single definition may be split over multiple lines. Newlines are treated as
|
- A single definition may be split over multiple lines. Newlines are treated as
|
||||||
spaces.
|
spaces.
|
||||||
* `//` followed by text on its own line is used as comment syntax.
|
- `//` followed by text on its own line is used as comment syntax.
|
||||||
|
|
|
||||||