From 32f38622ee25bfc6df126f9e0ef694ca8d82d366 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kat=20March=C3=A1n?=
Date: Tue, 15 Dec 2020 22:26:55 -0800
Subject: [PATCH] clarify guidance around file encoding and BOMs

---
 SPEC.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/SPEC.md b/SPEC.md
index 4e7a5d6..7fbda91 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -13,8 +13,16 @@ language, and a data exchange or storage format, if you so choose.
 
 ### Document
 
-The toplevel concept of KDL is a Document. A Document is composed of one or more
-[Nodes](#node), separated by newlines and whitespace, and eventually terminated by an EOF.
+The toplevel concept of KDL is a Document. A Document is composed of zero or
+more [Nodes](#node), separated by newlines and whitespace, and eventually
+terminated by an EOF.
+
+All KDL documents should:
+
+* Be UTF-8 encoded
+* Ignore UTF-8 byte order marks ("BOM") anywhere in the file, even when it's
+  not the first set of bytes in a stream.
+* Conform to the specifications in this document.
 
 #### Example
 
@@ -190,7 +198,7 @@ linespace := newline | ws | single-line-comment
 
 newline := `000D` | `000A` | `000D` `000A` | `0085` | `000C` | `2028` | `2029`
 
-ws := bom | unicode-space | multi-line-comment
+ws := unicode-space | multi-line-comment
 
 unicode-space := See Table (All White_Space unicode characters which are not `newline`)
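
Note on the BOM change: because `bom` is dropped from the `ws` production while the prose asks for BOMs to be ignored anywhere in the file, one possible reading is that a parser discards them before the grammar is applied. The sketch below illustrates that reading only. The function name `decode_ignoring_boms` and the pre-pass approach are assumptions made for illustration, not something this patch or the spec prescribes.

```python
import codecs

BOM = "\ufeff"  # U+FEFF; UTF-8 encodes it as the bytes EF BB BF


def decode_ignoring_boms(data: bytes) -> str:
    """Decode UTF-8 input and drop every U+FEFF, wherever it appears.

    Hypothetical helper: a pre-pass that runs before any KDL parsing.
    """
    # Plain "utf-8" (not "utf-8-sig") keeps a leading BOM as U+FEFF,
    # so the single replace below handles leading and embedded BOMs alike.
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return text.replace(BOM, "")


# Two concatenated streams, each starting with its own BOM.
sample = codecs.BOM_UTF8 + b'node "a"\n' + codecs.BOM_UTF8 + b'node "b"\n'
cleaned = decode_ignoring_boms(sample)
```

A pre-pass like this also removes a U+FEFF that sits inside a string value, which matches "anywhere in the file" read literally. Whether that is the intended behaviour is worth confirming in review.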