`r` prefix is no longer required for raw strings (#354)

Fixes: https://github.com/kdl-org/kdl/issues/337
Kat Marchán 2023-12-12 20:26:12 -08:00 committed by GitHub
parent ba11ffc988
commit d73890741d
2 changed files with 23 additions and 14 deletions


@@ -23,6 +23,12 @@
 can now only be represented in regular strings, and there's no facilities to
 represent them in raw strings. This should be considered a security
 improvement.
+* Raw strings no longer require an `r` prefix: they are now specified by using
+`#""#`.
+* `#` is an illegal initial identifier character, but is allowed in other
+places in identifiers.
+* Line continuations can be followed by an EOF now, instead of requiring a
+newline (or comment). `node \<EOF>` is now a legal KDL document.
 ### KQL

SPEC.md (31 lines changed)

@@ -367,11 +367,16 @@ support `\`-escapes. They otherwise share the same properties as far as
 literal [Newline](#newline) characters go, and the requirement of UTF-8
 representation.
-Raw String literals are represented as `r`, followed by zero or more `#`
-characters, followed by `"`, followed by any number of UTF-8 literals. The string is then
-closed by a `"` followed by a _matching_ number of `#` characters. This means
-that the string sequence `"` or `"#` and such must not match the closing `"`
-with the same or more `#` characters as the opening `r`.
+Raw String literals are represented with one or more `#` characters, followed
+by `"`, followed by any number of UTF-8 literals. The string is then closed by
+a `"` followed by a _matching_ number of `#` characters. This means that the
+string sequence `"` or `"#` and such must not match the closing `"` with the
+same or more `#` characters as the opening `#`, in the body of the string.
+Like Strings, Raw Strings _MUST NOT_ include any of the [disallowed literal
+code-points](#disallowed-literal-code-points) as code points in their body.
+Unlike with Strings, these cannot simply be escaped, and are thus
+unrepresentable when using Raw Strings.
 Like Strings, Raw Strings _MUST NOT_ include any of the [disallowed literal
 code-points](#disallowed-literal-code-points) as code points in their body.
@@ -381,8 +386,8 @@ unrepresentable when using Raw Strings.
 #### Example
 ```kdl
-just-escapes r"\n will be literal"
-quotes-and-escapes r#"hello\n\r\asd"world"#
+just-escapes #"\n will be literal"#
+quotes-and-escapes ##"hello\n\r\asd"#world"##
 ```
 ### Number
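
Note on the hunk above: the new delimiter rule (one or more `#`, then `"`, closed by `"` plus the same number of `#`) can be sketched as a small scanner. This is only an illustrative Python sketch, not code from this commit; the function name `scan_raw_string` is hypothetical, and the check for disallowed literal code points is deliberately left out.

```python
from typing import Tuple


def scan_raw_string(src: str, start: int = 0) -> Tuple[str, int]:
    """Scan a raw string of the form #"..."# beginning at src[start].

    Returns (body, index just past the closing delimiter).
    Raises ValueError if the input is not a well-formed raw string.
    Assumption: disallowed literal code points are NOT checked here.
    """
    i = start
    # Count the opening '#' characters; the new syntax needs at least one.
    while i < len(src) and src[i] == "#":
        i += 1
    hashes = i - start
    if hashes == 0 or i >= len(src) or src[i] != '"':
        raise ValueError("expected one or more '#' followed by '\"'")
    body_start = i + 1
    # The string closes at the first '"' followed by the same number of '#'.
    closer = '"' + "#" * hashes
    end = src.find(closer, body_start)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[body_start:end], end + len(closer)
```

With the two values from the example hunk, `scan_raw_string(r'##"hello\n\r\asd"#world"##')` keeps the inner `"#` in the body because it is followed by fewer `#` characters than the opener, while the old `r"..."` form is rejected for lacking a leading `#`.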
@@ -514,10 +519,9 @@ node-children := '{' nodes '}'
 node-terminator := single-line-comment | newline | ';' | eof
 identifier := string | bare-identifier
-bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - keyword
-unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char*
+bare-identifier := (unambiguous-ident | numberish-ident) - keyword
+unambiguous-ident := (identifier-char - digit - sign - "#") identifier-char*
 numberish-ident := sign ((identifier-char - digit) identifier-char*)?
-stringish-ident := "r" ((identifier-char - "#") identifier-char*)?
 identifier-char := unicode - line-space - [\\/(){};\[\]="] - disallowed-literal-code-points
 keyword := boolean | 'null'
 prop := identifier '=' valuel
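
Note on the hunk above: with `stringish-ident` gone, a leading `r` is an ordinary identifier start and only a leading `#` is newly excluded. A rough Python sketch of the `unambiguous-ident` branch, minus keywords, might look like this. The helper names are hypothetical, `sign` is assumed to be `+` or `-`, and `line-space` plus the disallowed literal code points are approximated by `str.isspace()`, so this is narrower than the real grammar. The `numberish-ident` branch is not covered.

```python
# Characters excluded from identifier-char by the grammar's character class.
FORBIDDEN = set('\\/(){};[]="')
# keyword := boolean | 'null', with boolean := 'true' | 'false'.
KEYWORDS = {"true", "false", "null"}


def is_identifier_char(ch: str) -> bool:
    # Approximation: whitespace stands in for line-space and the
    # disallowed literal code points, which are not modelled here.
    return not ch.isspace() and ch not in FORBIDDEN


def is_unambiguous_bare_ident(text: str) -> bool:
    """Check text against unambiguous-ident, then subtract keywords."""
    if not text or text in KEYWORDS:
        return False
    first = text[0]
    # The first character may not be a digit, a sign, or '#'.
    if first in "0123456789" or first in "+-" or first == "#":
        return False
    return all(is_identifier_char(ch) for ch in text)
```

Under these assumptions `is_unambiguous_bare_ident("r")` and `is_unambiguous_bare_ident("foo#bar")` are accepted, while `is_unambiguous_bare_ident("#foo")` is rejected.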
@@ -530,9 +534,8 @@ string-character := '\' escape | [^\"] - disallowed-literal-code-points
 escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+
 hex-digit := [0-9a-fA-F]
-raw-string := 'r' raw-string-hash
-raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes
-raw-string-quotes := '"' .* '"'
+raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
+raw-string-quotes := '"' (unicode - disallowed-literal-code-points) '"'
 number := decimal | hex | octal | binary
@@ -548,7 +551,7 @@ binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
 boolean := 'true' | 'false'
-escline := '\\' ws* (single-line-comment | newline)
+escline := '\\' ws* (single-line-comment | newline | eof)
 newline := See Table (All line-break white_space)
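
Note on the `escline` change: the effect of adding `eof` can be illustrated with a simplified regular expression. This Python sketch makes several assumptions that are not in the spec text shown here: `ws` is reduced to space and tab, `newline` to `\r\n`, `\r`, or `\n`, and a single-line comment to `//` running up to and including a newline. The function name `escline_pattern` is hypothetical.

```python
import re


def escline_pattern(allow_eof: bool) -> "re.Pattern[str]":
    """Build a simplified escline matcher, with or without the eof branch."""
    newline = r"(?:\r\n|\r|\n)"
    # Assumption: a single-line comment is '//' up to and including a newline.
    comment = r"//[^\r\n]*" + newline
    tail = "(?:" + comment + "|" + newline
    if allow_eof:
        # \Z matches only at the very end of the input, standing in for eof.
        tail += r"|\Z"
    tail += ")"
    # A literal backslash, optional spaces or tabs, then the terminator.
    return re.compile(r"\\[ \t]*" + tail)


OLD_ESCLINE = escline_pattern(allow_eof=False)
NEW_ESCLINE = escline_pattern(allow_eof=True)
```

Searching the document `node \` (a backslash directly before end of input) finds a match only with `NEW_ESCLINE`, which mirrors the changelog entry stating that `node \<EOF>` is now legal.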