get rid of syntactically significant unicode equals signs (#400)

Fixes: #399
Kat Marchán 2024-11-28 22:39:19 -08:00 committed by GitHub
parent fa3050ccc0
commit 1588b1f5fd
GPG Key ID: B5690EEEBB952194
7 changed files with 15 additions and 37 deletions


@@ -59,10 +59,6 @@
 whitespace matching the whitespace prefix of the closing line. Multiline
 strings and raw strings now must have a newline immediately following their
 opening `"`, and a final newline plus whitespace preceding the closing `"`.
-* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY
-EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for
-properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare
-identifiers.
 * `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and
 conflicts with numbers.
 * Multi-line strings' literal Newline sequences are now normalized to single


@@ -158,11 +158,10 @@ node3 #"C:\Users\zkat\raw\string"#
 You don't have to quote strings unless any the following apply:
 * The string contains whitespace.
-* The string contains any of `[]{}()\/#";`.
-* The string is one of `true`, `false`, or `null`.
+* The string contains any of `[]{}()\/#";=`.
+* The string is one of `true`, `false`, `null`, `inf`, `-inf`, or `nan`.
 * The strings starts with a digit, or `+`/`-`/`.`/`-.`,`+.` and a digit.
-* The string contains an equals sign (including unicode equals signs `﹦`,
-`＝`, and `🟰`).
+(aka "looks like a number")
 In essence, if it can get confused for other KDL or KQL syntax, it needs
 quotes.
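
The updated quoting rules in this hunk can be sketched as a small predicate. This is an illustration only, not code from any KDL implementation: `needs_quotes` and its constants are hypothetical names, `str.isspace()` is a stand-in for KDL's own whitespace definition, the empty-string case is an added assumption, and disallowed literal code points are ignored.

```python
import re

# Characters the list above says force quoting; `=` is included after this change.
RESERVED_CHARS = set('[]{}()\\/#";=')
# Bare words that would otherwise be read as keywords or special numbers.
RESERVED_WORDS = {"true", "false", "null", "inf", "-inf", "nan"}
# "Looks like a number": optional sign, optional dot, then a digit.
NUMBER_LIKE = re.compile(r"^[+-]?\.?[0-9]")


def needs_quotes(s: str) -> bool:
    """Return True if `s` must be quoted under the rules listed above."""
    if s == "":
        return True  # assumption: an empty string cannot be written bare
    if any(ch.isspace() for ch in s):
        return True  # approximation of KDL whitespace and newlines
    if any(ch in RESERVED_CHARS for ch in s):
        return True
    if s in RESERVED_WORDS:
        return True
    return bool(NUMBER_LIKE.match(s))
```

Under this sketch `needs_quotes("a=b")` is true, while a string containing only a Unicode equals sign such as `U+FE66` is not flagged, which matches the removal of the last bullet.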
@@ -296,8 +295,8 @@ smile 😁
 // Identifiers are very flexible. The following is a legal bare identifier:
 <@foo123~!$%^&*.:'|?+>
-// And you can also use unicode, even for the equals sign!
-ノード お名前=☜(゚ヮ゚☜)
+// And you can also use unicode!
+ノード お名前=ฅ^•ﻌ•^ฅ
 // kdl specifically allows properties and values to be
 // interspersed with each other, much like CLI commands.
@@ -335,9 +334,9 @@ SDLang, but that had some design choices I disagreed with.
 #### Ok, then, why not SDLang?
-SDLang is designed for use cases that are not interesting to me, but are very
-relevant to the D-lang community. KDL is very similar in many ways, but is
-different in the following ways:
+SDLang is an excellent base, but I wanted some details ironed out, and some
+things removed that only really made sense for SDLang's current use-cases, including
+some restrictions about data representation. KDL is very similar in many ways, except:
 * The grammar and expected semantics are [well-defined and specified](SPEC.md).
 * There is only one "number" type. KDL does not prescribe representations.

SPEC.md

@@ -112,8 +112,8 @@ my-node 1 2 \ // comments are ok after \
 ### Property
 A Property is a key/value pair attached to a [Node](#node). A Property is
-composed of a [String](#string), followed immediately by an [equals
-sign](#equals-sign), and then a [Value](#value).
+composed of a [String](#string), followed immediately by an equals sign (`=`, `U+003D`),
+and then a [Value](#value).
 Properties should be interpreted left-to-right, with rightmost properties with
 identical names overriding earlier properties. That is:
@@ -131,17 +131,6 @@ still be spec-compliant.
 Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and
 make it act as plain whitespace, even if it spreads across multiple lines.
-#### Equals Sign
-Any of the following characters may be used as equals signs in properties:
-| Name | Character | Code Point |
-|----|-----|----|
-| EQUALS SIGN | `=` | `U+003D` |
-| SMALL EQUALS SIGN | `﹦` | `U+FE66` |
-| FULLWIDTH EQUALS SIGN | `＝` | `U+FF1D` |
-| HEAVY EQUALS SIGN | `🟰` | `U+1F7F0` |
 ### Argument
 An Argument is a bare [Value](#value) attached to a [Node](#node), with no
@@ -334,8 +323,7 @@ negative number.
 The following characters cannot be used anywhere in a [Identifier String](#identifier-string):
-* Any of `(){}[]/\"#;`
-* Any [Equals Sign](#equals-sign)
+* Any of `(){}[]/\"#;=`
 * Any [Whitespace](#whitespace) or [Newline](#newline).
 * Any [disallowed literal code points](#disallowed-literal-code-points) in KDL
 documents.
@@ -780,19 +768,17 @@ node-prop-or-arg := prop | value
 node-children := '{' nodes final-node? '}'
 node-terminator := single-line-comment | newline | ';' | eof
-prop := string optional-node-space equals-sign optional-node-space value
+prop := string optional-node-space '=' optional-node-space value
 value := type? optional-node-space (string | number | keyword)
 type := '(' optional-node-space string optional-node-space ')'
-equals-sign := See Table ([Equals Sign](#equals-sign))
 string := identifier-string | quoted-string | raw-string
 identifier-string := unambiguous-ident | signed-ident | dotted-ident
 unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan'
 signed-ident := sign ((identifier-char - digit - '.') identifier-char*)?
 dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)?
-identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign
+identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points
 quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"'
 single-line-string-body := (string-character - newline)*
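
With the revised `prop` and `identifier-char` rules, only `=` (`U+003D`) separates a property key from its value. As far as the rule shown here goes, the three Unicode look-alikes are no longer excluded from identifier characters. A minimal Python sketch of that behavior follows. `split_property` is a hypothetical helper, not part of any KDL library, and it assumes a single token made of bare identifier strings with no surrounding whitespace, quoted strings, raw strings, or type annotations.

```python
from typing import Optional, Tuple


def split_property(token: str) -> Optional[Tuple[str, str]]:
    """Split `key=value` at the first U+003D; return None if the token has none.

    Simplification: real parsing must also handle quoted and raw strings,
    optional node-space around `=`, and type annotations on the value.
    """
    key, sep, value = token.partition("=")
    if not sep:
        return None  # no `=`: the whole token would be read as an argument
    return key, value


if __name__ == "__main__":
    # The last three tokens use U+FE66, U+FF1D, and U+1F7F0 instead of `=`.
    for token in ["p1=val1", "p1\uFE66val1", "p2\uFF1Dval2", "p3\U0001F7F0val3"]:
        print(repr(token), "->", split_property(token))
```

In this sketch only the first token is split into a key and a value; the other three come back as `None`, meaning they would be treated as plain arguments.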


@@ -1 +0,0 @@
-node p1=val1 p2=val2 p3=val3


@@ -0,0 +1 @@
+ノード お名前=ฅ^•ﻌ•^ฅ


@@ -1,4 +0,0 @@
-node \
-p1﹦val1 \ // U+FE66
-p2＝val2 \ // U+FF1D
-p3🟰val3 // U+1F7F0


@@ -0,0 +1 @@
+ノード お名前=ฅ^•ﻌ•^ฅ