treat bare identifiers and strings in value locations (#358)

Fixes: https://github.com/kdl-org/kdl/issues/339
Kat Marchán 2023-12-12 21:03:30 -08:00 committed by GitHub
parent e6356d5a03
commit 85aa3a09ab
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 55 additions and 32 deletions


@@ -29,6 +29,18 @@
places in identifiers.
* Line continuations can be followed by an EOF now, instead of requiring a
newline (or comment). `node \<EOF>` is now a legal KDL document.
+* `#` is no longer a legal identifier character.
+* `null`, `true`, and `false` are now `#null`, `#true`, and `#false`. Using
+the unprefixed versions of these values is a syntax error.
+* The spec prose has more explicitly stated that whitespace and newlines are
+not valid identifier characters, even though the grammar already expressed
+this.
+* Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
+* The spec prose now more explicitly states that strings and raw strings can
+be used as type annotations.
+* A statement in the spec prose that said "It is reasonable for an
+implementation to ignore null values altogether when deserializing". This is
+no longer encouraged or desired.
### KQL

SPEC.md (75 changed lines)

@@ -93,17 +93,27 @@ foo 1 key="val" 3 {
### Identifier
A bare Identifier is composed of any Unicode codepoint other than [non-initial
-characters](#non-initial-characters), followed by any number of Unicode
-code points other than [non-identifier characters](#non-identifier-characters),
-so long as this doesn't produce something confusable for a [Number](#number),
-[Boolean](#boolean), or [Null](#null). For example, both a [Number](#number)
-and an Identifier can start with `-`, but when an Identifier starts with `-`
-the second character cannot be a digit. This is precicely specified in the
-[Full Grammar](#full-grammar) below.
+characters](#non-initial-characters), followed by any number of Unicode code
+points other than [non-identifier characters](#non-identifier-characters), so
+long as this doesn't produce something confusable for a [Number](#number). For
+example, both a [Number](#number) and an Identifier can start with `-`, but
+when an Identifier starts with `-` the second character cannot be a digit.
+This is precicely specified in the [Full Grammar](#full-grammar) below.
+When Identifiers are used as the values in [Arguments](#argument) and
+[Properties](#property), they are treated as strings, just like they are with
+node names and property keys.
Identifiers are terminated by [Whitespace](#whitespace) or
[Newlines](#newline).
+In all places where Identifiers are used, [Strings](#string) and [Raw
+Strings](#raw-string) can be used in the same place, without an Identifier's
+character restrictions.
+The literal identifiers `true`, `false`, and `null` are illegal identifiers,
+and _MUST_ be treated as a syntax error.
### Non-initial characters
The following characters cannot be the first character in a bare
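The identifier rules in this hunk, together with the `bare-identifier`, `unambiguous-ident`, and `numberish-ident` productions in the Full Grammar hunk, can be sketched as a small Python check. `is_bare_identifier` is a hypothetical name, not part of any KDL library. `str.isspace()` stands in for KDL's whitespace and newline definitions, and the disallowed literal code points are not checked, so treat this as an approximation rather than a conforming validator.

```python
# Hypothetical helper sketching the bare-identifier rules from this commit.
# Assumptions: str.isspace() approximates KDL whitespace and newlines, and
# the "disallowed literal code points" are not checked.
NON_IDENTIFIER_CHARS = set('(){}[]/\\="#;')
DIGITS = set("0123456789")
SIGNS = {"+", "-"}
RESERVED = {"true", "false", "null"}  # bare spellings are now syntax errors


def is_identifier_char(ch: str) -> bool:
    return ch not in NON_IDENTIFIER_CHARS and not ch.isspace()


def is_bare_identifier(text: str) -> bool:
    if not text or not all(is_identifier_char(ch) for ch in text):
        return False
    if text[0] in SIGNS:
        # numberish-ident: a lone sign, or a sign not followed by a digit
        return len(text) == 1 or text[1] not in DIGITS
    if text[0] in DIGITS:
        return False  # a leading digit would be confusable with a Number
    # unambiguous-ident, minus the reserved keyword spellings
    return text not in RESERVED


print(is_bare_identifier("--this"))  # True
print(is_bare_identifier("-1x"))     # False: sign followed by a digit
print(is_bare_identifier("true"))    # False: bare true is a syntax error
print(is_bare_identifier("#true"))   # False: '#' is a non-identifier character
```

Note that the `-`/`+` rule is checked only on the second character, which is what allows `--this` while rejecting anything that starts like a signed number.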
@@ -112,17 +122,18 @@ The following characters cannot be the first character in a bare
* Any decimal digit (0-9)
* Any [non-identifier characters](#non-identifier-characters)
-Be aware that the `-` character can only be used as an initial
-character if the second character is not a digit. This allows
-identifiers to look like `--this`, and removes the ambiguity
-of having an identifier look like a negative number.
+Additionally, the `-` character can only be used as an initial character if
+the second character is *not* a digit. This allows identifiers to look like
+`--this`, and removes the ambiguity of having an identifier look like a
+negative number.
### Non-identifier characters
The following characters cannot be used anywhere in a bare
[Identifier](#identifier):
-* Any of `\/(){};[]="`
+* Any of `(){}[]/\="#;`
+* Any [Whitespace](#whitespace) or [Newline](#newline).
* Any [disallowed literal code points](#disallowed-literal-code-points) in KDL
documents.
@@ -180,7 +191,7 @@ make it act as plain whitespace, even if it spreads across multiple lines.
#### Example
```kdl
-my-node 1 2 3 "a" "b" "c"
+my-node 1 2 3 a b c
```
### Children Block
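With this change, the bare `a b c` in the argument example above are read as the strings `"a"`, `"b"`, and `"c"`. A minimal Python sketch of that reading, assuming a single node on one line with whitespace-separated arguments, might look like this. `parse_value` and `parse_node_line` are hypothetical helpers; properties, type annotations, escapes, comments, raw strings, and non-decimal numbers are all left out, and identifiers are not fully validated.

```python
# Hypothetical sketch of reading simple argument values under the new rules.
# Not handled: properties, type annotations, escapes, comments, raw strings,
# non-decimal numbers, and quoted strings that contain whitespace.
KEYWORDS = {"#true": True, "#false": False, "#null": None}
LEGACY_KEYWORDS = {"true", "false", "null"}  # bare forms are now syntax errors
DIGITS = "0123456789"


def parse_value(token: str):
    """Convert one whitespace-free argument token into a Python value."""
    if token in KEYWORDS:
        return KEYWORDS[token]
    if token in LEGACY_KEYWORDS:
        raise SyntaxError(f"bare {token!r} is not allowed; write #{token}")
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]  # quoted string; escapes are not processed here
    body = token[1:] if token.startswith(("+", "-")) else token
    if body and body[0] in DIGITS:
        # Starts like a number: only decimal forms are handled in this sketch.
        try:
            return int(token)
        except ValueError:
            return float(token)  # raises ValueError for e.g. "1abc"
    # Anything else is treated as a bare identifier, read as a string value.
    return token


def parse_node_line(line: str):
    """Split 'name arg1 arg2 ...' and parse each argument."""
    name, *args = line.split()
    return name, [parse_value(arg) for arg in args]


print(parse_node_line("my-node 1 2 3 a b c"))
# ('my-node', [1, 2, 3, 'a', 'b', 'c'])
```

The number check runs before the identifier fallback because, per the grammar, a token that starts with a digit (optionally after a sign) can never be a bare identifier.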
@@ -205,8 +216,9 @@ parent { child1; child2; }
### Value
-A value is either: a [String](#string), a [Raw String](#raw-string), a
-[Number](#number), a [Boolean](#boolean), or [Null](#null)
+A value is either: an [Identifier](#identifier), a [String](#string), a [Raw
+String](#raw-string), a [Number](#number), a [Boolean](#boolean), or
+[Null](#null)
Values _MUST_ be either [Arguments](#argument) or values of
[Properties](#property).
@@ -221,9 +233,9 @@ or as a _context-specific elaboration_ of the more generic type the node name
indicates.
Type annotations are written as a set of `(` and `)` with a single
-[Identifier](#identifier) in it. Any valid identifier is considered a valid
-type annotation. There must be no whitespace between a type annotation and its
-associated Node Name or Value.
+[Identifier](#identifier) in it. Any valid identifier or string is considered
+a valid type annotation. There must be no whitespace between a type annotation
+and its associated Node Name or Value.
KDL does not specify any restrictions on what implementations might do with
these annotations. They are free to ignore them, or use them to make decisions
@@ -295,7 +307,7 @@ IEEE 754-2008 decimal floating point numbers
```kdl
node (u8)123
-node prop=(regex)".*"
+node prop=(regex).*
(published)date "1970-01-01"
(contributor)person name="Foo McBar"
```
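Because a type annotation may now hold either a bare identifier or a string, and a value such as `.*` may now appear unquoted, a tokenizer has to peel the annotation off before classifying the value. Here is a hedged Python sketch of that step. `split_annotation` is a hypothetical name; it assumes the token has already been isolated, handles quoted annotations only when they contain no escapes, does not handle raw strings, and does not validate the annotation text itself.

```python
def split_annotation(token: str):
    """Split a leading '(annotation)' off a value token (simplified sketch).

    Assumes the annotation is a bare identifier or a quoted string without
    escape sequences; raw strings are not handled and nothing is validated.
    """
    if not token.startswith("("):
        return None, token  # no type annotation present
    if token.startswith('("'):
        end_quote = token.index('"', 2)  # ValueError if the string is unterminated
        annotation = token[2:end_quote]
        close = end_quote + 1
    else:
        close = token.index(")")  # ValueError if ')' is missing
        annotation = token[1:close]
    if close >= len(token) or token[close] != ")":
        raise SyntaxError("expected ')' after type annotation")
    return annotation, token[close + 1:]


print(split_annotation("(u8)123"))         # ('u8', '123')
print(split_annotation("(regex).*"))       # ('regex', '.*')
print(split_annotation('("my type")foo'))  # ('my type', 'foo')
```

The remainder returned here (for example `.*`) would then be classified as an identifier, string, number, or keyword, matching the `value := type? (identifier | string | number | keyword)` production.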
@@ -411,27 +423,26 @@ There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary.
### Boolean
-A boolean [Value](#value) is either the symbol `true` or `false`. These
+A boolean [Value](#value) is either the symbol `#true` or `#false`. These
_SHOULD_ be represented by implementation as boolean logical values, or some
approximation thereof.
#### Example
```kdl
-my-node true value=false
+my-node true value=#false
```
### Null
-The symbol `null` represents a null [Value](#value). It's up to the
+The symbol `#null` represents a null [Value](#value). It's up to the
implementation to decide how to represent this, but it generally signals the
-"absence" of a value. It is reasonable for an implementation to ignore null
-values altogether when deserializing.
+"absence" of a value.
#### Example
```kdl
-my-node null key=null
+my-node #null key=#null
```
### Whitespace
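On the output side, the `#`-prefixed keywords mean a serializer must now quote any string that could otherwise be misread, including the strings `"true"`, `"false"`, and `"null"`, while other strings may be written bare. The sketch below shows one way to do this in Python. `to_kdl_value` and `_can_be_bare` are hypothetical names; only decimal numbers and a few escapes are handled, and the disallowed literal code point check is omitted.

```python
import math

# Hypothetical sketch: render Python values as KDL value text under this
# commit's rules. Assumptions: only decimal numbers, a few escapes, and no
# check for disallowed literal code points.
NON_IDENTIFIER_CHARS = set('(){}[]/\\="#;')
ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def _can_be_bare(s: str) -> bool:
    """True if s can be written as a bare identifier (string value)."""
    if not s or s in {"true", "false", "null"}:
        return False
    if any(ch in NON_IDENTIFIER_CHARS or ch.isspace() for ch in s):
        return False
    body = s[1:] if s[0] in "+-" else s
    # A leading digit (after an optional sign) would read as a Number.
    return not (body and body[0] in "0123456789")


def to_kdl_value(value) -> str:
    if value is True:
        return "#true"
    if value is False:
        return "#false"
    if value is None:
        return "#null"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("inf/nan are not representable in this sketch")
        return repr(value)
    if isinstance(value, str):
        if _can_be_bare(value):
            return value
        return '"' + "".join(ESCAPES.get(ch, ch) for ch in value) + '"'
    raise TypeError(f"unsupported type: {type(value).__name__}")


values = [True, None, 42, "foo", "true", "two words"]
print(" ".join(to_kdl_value(v) for v in values))
# #true #null 42 foo "true" "two words"
```

Emitting bare identifiers is only a readability choice: since a quoted string may appear anywhere an Identifier can, a serializer that always quotes strings would also produce valid output.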
@@ -519,19 +530,19 @@ node-children := '{' nodes '}'
node-terminator := single-line-comment | newline | ';' | eof
identifier := string | bare-identifier
-bare-identifier := (unambiguous-ident | numberish-ident) - keyword
-unambiguous-ident := (identifier-char - digit - sign - "#") identifier-char*
+bare-identifier := (unambiguous-ident - boolean - 'null') | numberish-ident
+unambiguous-ident := (identifier-char - digit - sign) identifier-char*
numberish-ident := sign ((identifier-char - digit) identifier-char*)?
-identifier-char := unicode - line-space - [\\/(){};\[\]="] - disallowed-literal-code-points
-keyword := boolean | 'null'
-prop := identifier '=' valuel
-value := type? (string | number | keyword)
+identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points
+keyword := '#' (boolean | 'null')
+prop := identifier '=' value
+value := type? (identifier | string | number | keyword)
type := '(' identifier ')'
string := raw-string | escaped-string
escaped-string := '"' string-character* '"'
-string-character := '\' escape | [^\"] - disallowed-literal-code-points
+string-character := '\' escape | [^\\"] - disallowed-literal-code-points
escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
hex-digit := [0-9a-fA-F]