Escaping of / in JSON
On Wed, 13 Apr 2011 07:30:58 +0200, Oliver Hunt <oliver at apple.com> wrote:
It has recently been brought to my attention that a particular use case
of JSON serialisation is to include JSON serialised content directly
into an HTML file (inside a script tag). In this case in addition to
the threat of strings being terminated by a double quote there's also
the potential for the string "</script>" to terminate the JS source.
The request i received was to escape the slash character, which is
allowed as input but per ES5 spec we aren't allowed to emit.
I will say that I don't really like this idea as it leads to "why not
escape #?", etc but I thought I should bring this up on the list and see
what others think.
My personal opinion is that if you want to embed any string into any formatted context, you need to be aware of the environment you are plugging things into.
If you put something into HTML, you need to know where in the HTML it is.
If it's an intrinsic event handler, the requirements are different than if
it's a script tag. In a script tag, it's not just "</" that's a problem,
but also, e.g., "<![CDATA[" and "<!--" if the HTML is actually XHTML or
HTML5.
I don't want to start adding exceptions to JSON just to help one use case.
I'd rather create a function for people to use that can convert a JSON
string to valid HTML script element content (but not as part of the
language, it's too HTML specific). It would fit better into HTML5, so that
it can follow any changes to the specification.
(On the other hand, RegExp.quotePattern and RegExp.quoteReplacement like
the Java versions would make sense to have in ES).
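A minimal sketch of the kind of helper described above, assuming the value is JSON-serializable and the output goes inside an HTML script element. The name jsonToScriptContent and the set of escaped characters are illustrative assumptions, not an existing or proposed API. It relies on the fact that "<", ">" and "&" can only occur inside string literals in JSON.stringify output, so rewriting them as \uXXXX escapes leaves the parsed value unchanged while removing "</script>", "<!--", "-->", "<![CDATA[" and "]]>" from the text.

```javascript
// Hypothetical helper (not part of ES5 or HTML5): serialize a value as JSON
// and escape the characters that can end or disturb a <script> block.
function jsonToScriptContent(value) {
  const escapes = {
    '<': '\\u003c',
    '>': '\\u003e',
    '&': '\\u0026',
    '\u2028': '\\u2028', // line separator, a line terminator in JS source
    '\u2029': '\\u2029', // paragraph separator, likewise
  };
  return JSON.stringify(value).replace(/[<>&\u2028\u2029]/g, (ch) => escapes[ch]);
}

const payload = { foo: '</script><!-- ]]> -->' };
const safe = jsonToScriptContent(payload);
console.log(safe);
// The escaped text still parses back to the original string.
console.log(JSON.parse(safe).foo === payload.foo);
```

This only covers script element content; an intrinsic event handler attribute would need different (attribute-level) escaping, as noted above.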
Many JSON serializer implementations escape the "/" character, including for instance PHP's json_encode(). However, JavaScript's own JSON.stringify() does not. If you look at the grammar on json.org, as I read it, the escaping of "/" is optional, since it is a valid Unicode character, and it's not ", \, or a control character.
I personally find this annoying as I never embed JSON into script tags like that, and even if I do, my data never looks like </tag>. I wish that JSON serializers, including JSON.stringify, had an option to control if you want "/" to be escaped. It could of course default to whatever each implementation's current default behavior is, but I think it should be a configurable behavior rather than baked in, one way or the other.
--Kyle
From: "Lasse Reichstein" <reichsteinatwork at gmail.com>
Sent: Wednesday, April 13, 2011 4:26 AM
To: "EcmaScript Steen" <es-discuss at mozilla.org>; "es5-discuss" <es5-discuss at mozilla.org>; "Oliver Hunt" <oliver at apple.com>
Subject: Re: Escaping of / in JSON
2011/4/12 Oliver Hunt <oliver at apple.com>:
It has recently been brought to my attention that a particular use case of JSON serialisation is to include JSON serialised content directly into an HTML file (inside a script tag). In this case in addition to the threat of strings being terminated by a double quote there's also the potential for the string "</script>" to terminate the JS source.
If the output can contain a CDATA section end (]]>) or escaping text
span end (-->) then it can also cause premature termination of JS
source.
E.g. in the HTML
    <script>//<!--
    var myJson = { "foo": "-->" }
    document.write('<script>...' + myJson + '...</script>');
    ...
    //--></script>
or in the XHTML
    <script><![CDATA[
    var myJson = <JSON goes here>
    document.write('<script>' + myJson + '</script>') // Comment with <script>
    var myOtherJsonContainingCdataOpen = <more json here>;
    ]]></script>
The request i received was to escape the slash character, which is allowed as input but per ES5 spec we aren't allowed to emit.
I will say that I don't really like this idea as it leads to "why not escape #?", etc but I thought I should bring this up on the list and see what others think.
One answer to the "why not escape #?" is because it isn't explicitly called out in the JSON spec.
In www.ietf.org/rfc/rfc4627.txt JSON allows escaping of '/', '\', '"', and a few control characters. Other codepoints have to be raw or numerically escaped.
    char = unescaped /
           escape (
               %x22 /          ; "    quotation mark  U+0022
               %x5C /          ; \    reverse solidus U+005C
               %x2F /          ; /    solidus         U+002F
               %x62 /          ; b    backspace       U+0008
               %x66 /          ; f    form feed       U+000C
               %x6E /          ; n    line feed       U+000A
               %x72 /          ; r    carriage return U+000D
               %x74 /          ; t    tab             U+0009
               %x75 4HEXDIG )  ; uXXXX                U+XXXX
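The grammar above can be checked against a conforming parser. The following is a small sketch using only JSON.parse; the variable names are illustrative. It shows that "\/" is accepted and decodes to "/", that "#" can be written numerically as \u0023, and that a two-character escape "\#" is rejected because it is not listed in the grammar.

```javascript
// "\/" is one of the two-character escapes named in the RFC 4627 grammar.
const slashEscaped = JSON.parse('"<\\/script>"'); // JSON text: "<\/script>"

// Any code point may also be written numerically as \uXXXX.
const hashNumeric = JSON.parse('"\\u0023"');      // JSON text: "\u0023"

// "\#" is not in the grammar, so a conforming parser throws.
let hashRejected = false;
try {
  JSON.parse('"\\#"');                            // JSON text: "\#"
} catch (err) {
  hashRejected = err instanceof SyntaxError;
}

console.log(slashEscaped, hashNumeric, hashRejected);
```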
On Apr 13, 2011, at 6:14 AM, Kyle Simpson wrote:
Many JSON serializer implementations escape the "/" character, including for instance PHP's json_encode(). However, JavaScript's own JSON.stringify() does not. If you look at the grammar on json.org, as I read it, the escaping of "/" is optional, since it is a valid Unicode character, and it's not ", \, or a control character.
As much as possible, we want ECMAScript implementations to behave identically. For that reason, we specified specific outputs for JSON.stringify, even where the JSON RFC grammar allowed variation. (For example, in theory every character could be stringified as a \uXXXX sequence.) We can quibble about the specific encoding choices that were made by ES5, but unless somebody identifies something that is actually a bug according to the JSON RFC grammar, I doubt we would make a change at this point.
That said, there is nothing stopping someone from defining and promoting for standardization an additional JSON encoding function that allows more explicit control of the encoding choices. Whether or not it got adopted would probably depend upon its demonstrated utility.
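To illustrate the variation the grammar allows, here is a sketch of an encoder that writes every UTF-16 code unit of a string as a \uXXXX sequence. The function name stringifyAllEscaped is hypothetical and this is not what ES5 specifies for JSON.stringify; it only shows that such output is still valid JSON and parses to the same string.

```javascript
// Hypothetical encoder: emit each UTF-16 code unit as \uXXXX.
function stringifyAllEscaped(str) {
  let out = '"';
  for (let i = 0; i < str.length; i++) {
    out += '\\u' + str.charCodeAt(i).toString(16).padStart(4, '0');
  }
  return out + '"';
}

const text = 'a/b';
const encoded = stringifyAllEscaped(text);
console.log(encoded);              // "\u0061\u002f\u0062"
console.log(JSON.stringify(text)); // "a/b"
// Both encodings decode to the same value.
console.log(JSON.parse(encoded) === JSON.parse(JSON.stringify(text)));
```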
On 11:59 AM, Allen Wirfs-Brock wrote:
On Apr 13, 2011, at 6:14 AM, Kyle Simpson wrote:
Many JSON serializer implementations escape the "/" character, including for instance PHP's json_encode(). However, JavaScript's own JSON.stringify() does not. If you look at the grammar on json.org, as I read it, the escaping of "/" is optional, since it is a valid Unicode character, and it's not ", \, or a control character.
As much as possible, we want ECMAScript implementations to behave identically. For that reason, we specified specific outputs for JSON.stringify, even where the JSON RFC grammar allowed variation. (For example, in theory every character could be stringified as a \uXXXX sequence.) We can quibble about the specific encoding choices that were made by ES5, but unless somebody identifies something that is actually a bug according to the JSON RFC grammar, I doubt we would make a change at this point.
That said, there is nothing stopping someone from defining and promoting for standardization an additional JSON encoding function that allows more explicit control of the encoding choices. Whether or not it got adopted would probably depend upon its demonstrated utility.
Allen
The .replace method on the result of .stringify is adequate for HTML or XML escapement. We don't need to change .stringify for this purpose.
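A minimal sketch of that approach for the originally requested case, escaping every "/" after serialization. The variable names are illustrative. Since "/" can only appear inside string literals in JSON.stringify output, and "\/" is a valid JSON escape, the result parses to the same value.

```javascript
const data = { html: '</script>' };
const json = JSON.stringify(data);          // {"html":"</script>"}
// Post-process the serialized text: turn each "/" into the escape "\/".
const escaped = json.replace(/\//g, '\\/'); // {"html":"<\/script>"}

console.log(escaped);
console.log(JSON.parse(escaped).html === data.html);
```

Escaping only "/" does not address "<!--", "-->" or "]]>", so a different pattern would be needed for those cases.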