get rid of syntactically significant unicode equals signs (#400)

Fixes: #399
Kat Marchán 2024-11-28 22:39:19 -08:00 committed by GitHub
parent fa3050ccc0
commit 1588b1f5fd
7 changed files with 15 additions and 37 deletions


@ -59,10 +59,6 @@
whitespace matching the whitespace prefix of the closing line. Multiline
strings and raw strings now must have a newline immediately following their
opening `"`, and a final newline plus whitespace preceding the closing `"`.
* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY
EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for
properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare
identifiers.
* `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and
conflicts with numbers.
* Multi-line strings' literal Newline sequences are now normalized to single


@ -158,11 +158,10 @@ node3 #"C:\Users\zkat\raw\string"#
You don't have to quote strings unless any of the following apply:
* The string contains whitespace.
* The string contains any of `[]{}()\/#";`.
* The string is one of `true`, `false`, or `null`.
* The string contains any of `[]{}()\/#";=`.
* The string is one of `true`, `false`, `null`, `inf`, `-inf`, or `nan`.
* The string starts with a digit, or `+`/`-`/`.`/`-.`/`+.` and a digit.
* The string contains an equals sign (including unicode equals signs `﹦`,
`＝`, and `🟰`).
(aka "looks like a number")
In essence, if it can get confused for other KDL or KQL syntax, it needs
quotes.
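
The quoting rules on the new side of this hunk can be sketched as a small check. This is only an illustration, not part of the commit: `needs_quotes` is a hypothetical helper, and Python's `str.isspace` is used as a rough stand-in for KDL's own whitespace and newline definitions, which differ in detail.

```python
import re

# Characters that force quoting after this change. Only ASCII '=' is listed;
# the unicode equals signs are no longer special.
RESERVED = set('[]{}()\\/#";=')

# Bare words that would be confused with keywords or special numbers.
KEYWORD_LIKE = {"true", "false", "null", "inf", "-inf", "nan"}

# "Looks like a number": a digit, optionally preceded by a sign and/or a dot,
# e.g. "1", "+1", ".1", "-.1".
LOOKS_LIKE_NUMBER = re.compile(r"^[+-]?\.?[0-9]")


def needs_quotes(s: str) -> bool:
    """Return True if `s` cannot be written as a bare identifier string."""
    if any(ch.isspace() for ch in s):  # approximation of KDL whitespace
        return True
    if any(ch in RESERVED for ch in s):
        return True
    if s in KEYWORD_LIKE:
        return True
    return bool(LOOKS_LIKE_NUMBER.match(s))
```

Under these assumptions, `needs_quotes("a=b")` is true while `needs_quotes("a﹦b")` (U+FE66) is false, which mirrors the removal of the unicode equals signs from the list.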
@ -296,8 +295,8 @@ smile 😁
// Identifiers are very flexible. The following is a legal bare identifier:
<@foo123~!$%^&*.:'|?+>
// And you can also use unicode, even for the equals sign!
ノード お名前=☜(゚ヮ゚☜)
// And you can also use unicode!
ノード お名前=ฅ^•ﻌ•^ฅ
// kdl specifically allows properties and values to be
// interspersed with each other, much like CLI commands.
@ -335,9 +334,9 @@ SDLang, but that had some design choices I disagreed with.
#### Ok, then, why not SDLang?
SDLang is designed for use cases that are not interesting to me, but are very
relevant to the D-lang community. KDL is very similar in many ways, but is
different in the following ways:
SDLang is an excellent base, but I wanted some details ironed out, and some
things removed that only really made sense for SDLang's current use-cases, including
some restrictions about data representation. KDL is very similar in many ways, except:
* The grammar and expected semantics are [well-defined and specified](SPEC.md).
* There is only one "number" type. KDL does not prescribe representations.

24
SPEC.md

@ -112,8 +112,8 @@ my-node 1 2 \ // comments are ok after \
### Property
A Property is a key/value pair attached to a [Node](#node). A Property is
composed of a [String](#string), followed immediately by an [equals
sign](#equals-sign), and then a [Value](#value).
composed of a [String](#string), followed immediately by an equals sign (`=`, `U+003D`),
and then a [Value](#value).
Properties should be interpreted left-to-right, with rightmost properties with
identical names overriding earlier properties. That is:
@ -131,17 +131,6 @@ still be spec-compliant.
Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and
make it act as plain whitespace, even if it spreads across multiple lines.
#### Equals Sign
Any of the following characters may be used as equals signs in properties:
| Name | Character | Code Point |
|----|-----|----|
| EQUALS SIGN | `=` | `U+003D` |
| SMALL EQUALS SIGN | `﹦` | `U+FE66` |
| FULLWIDTH EQUALS SIGN | `＝` | `U+FF1D` |
| HEAVY EQUALS SIGN | `🟰` | `U+1F7F0` |
### Argument
An Argument is a bare [Value](#value) attached to a [Node](#node), with no
@ -334,8 +323,7 @@ negative number.
The following characters cannot be used anywhere in a [Identifier String](#identifier-string):
* Any of `(){}[]/\"#;`
* Any [Equals Sign](#equals-sign)
* Any of `(){}[]/\"#;=`
* Any [Whitespace](#whitespace) or [Newline](#newline).
* Any [disallowed literal code points](#disallowed-literal-code-points) in KDL
documents.
@ -780,19 +768,17 @@ node-prop-or-arg := prop | value
node-children := '{' nodes final-node? '}'
node-terminator := single-line-comment | newline | ';' | eof
prop := string optional-node-space equals-sign optional-node-space value
prop := string optional-node-space '=' optional-node-space value
value := type? optional-node-space (string | number | keyword)
type := '(' optional-node-space string optional-node-space ')'
equals-sign := See Table ([Equals Sign](#equals-sign))
string := identifier-string | quoted-string | raw-string
identifier-string := unambiguous-ident | signed-ident | dotted-ident
unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan'
signed-ident := sign ((identifier-char - digit - '.') identifier-char*)?
dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)?
identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign
identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points
quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"'
single-line-string-body := (string-character - newline)*
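
As a rough illustration of the updated `prop` and `identifier-char` rules, the sketch below classifies a single bare token. It is not taken from any KDL implementation: `classify_bare_token` is a hypothetical name, and the sketch assumes the token contains no quoting, no type annotation, and no whitespace around the equals sign.

```python
ASCII_EQUALS = "\u003d"  # '=' is the only property separator after this change


def classify_bare_token(token: str):
    """Classify a whitespace-free, unquoted token as a property or argument.

    Simplified on purpose: quoted strings, raw strings, type annotations,
    and numbers are out of scope for this sketch.
    """
    key, sep, value = token.partition(ASCII_EQUALS)
    if not sep:
        # No ASCII '=' at all: the whole token is a single bare value.
        return ("arg", token)
    if not key or not value or ASCII_EQUALS in value:
        # '=' cannot appear inside a bare identifier, so these are malformed.
        return ("invalid", token)
    return ("prop", key, value)
```

With this sketch, `お名前=ฅ^•ﻌ•^ฅ` is classified as a property, while `p1﹦val1` (U+FE66) comes back as one argument, since the updated `identifier-char` rule as written excludes only `=`.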


@ -1 +0,0 @@
node p1=val1 p2=val2 p3=val3


@ -0,0 +1 @@
ノード お名前=ฅ^•ﻌ•^ฅ


@ -1,4 +0,0 @@
node \
p1﹦val1 \ // U+FE66
p2＝val2 \ // U+FF1D
p3🟰val3 // U+1F7F0


@ -0,0 +1 @@
ノード お名前=ฅ^•ﻌ•^ฅ