Unless I am misreading the specification (quite likely), the Decode
function has no logic to protect against decoding overlong but
otherwise well-formed UTF-8 sequences. Arguably this is a failure at
step 29, since RFC 3629[1] states:
"It is important to note that the rows of the table are mutually
exclusive, i.e., there is only one valid way to encode a given
character. [...] Implementations of the decoding algorithm above MUST
protect against decoding invalid sequences"
but it is not clear how a failure here should be handled. Existing
implementations seem to disagree on this point; my limited testing showed:
SpiderMonkey: inserts a U+FFFD replacement character
Futhark: leaves the original percent-encoded characters
SquirrelFish: throws URIError
V8: decodes the overlong sequence
IE/JScript: decodes the overlong sequence
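For reference, an overlong sequence uses more octets than its code point needs. U+002F ("/") is normally the single octet 2F, but the two-octet pattern 110xxxxx 10xxxxxx can also carry it as C0 AF (%C0%AF in a URI). The sketch below shows that arithmetic and classifies what the running engine's decodeURIComponent does with such input, using the categories from the list above. `twoOctetPayload` and `probe` are hypothetical helpers written for illustration, and the probe's result depends on the engine it runs in, so no particular outcome is claimed here.

```javascript
// Hypothetical helper: extract the code point carried by a two-octet
// UTF-8 sequence (110xxxxx 10xxxxxx) without any validity checks.
function twoOctetPayload(b0, b1) {
  return ((b0 & 0x1f) << 6) | (b1 & 0x3f);
}

// C0 AF carries U+002F ("/"), which already has the one-octet form 2F,
// so this two-octet form is overlong.
console.log(twoOctetPayload(0xc0, 0xaf).toString(16)); // prints 2f

// Hypothetical helper: classify what this engine's decodeURIComponent
// does with an input, using the same categories as the list above.
function probe(input, expected) {
  try {
    const out = decodeURIComponent(input);
    if (out === expected) return "decodes the overlong sequence";
    if (/^\uFFFD+$/.test(out)) return "inserts U+FFFD";
    if (out === input) return "leaves the percent-encoded characters";
    return "other result: " + JSON.stringify(out);
  } catch (e) {
    return e instanceof URIError ? "throws URIError" : "other error: " + e;
  }
}

// Engine-dependent: prints one of the labels above.
console.log(probe("%C0%AF", "/"));
```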
Since the usual behavior for invalid percent-encoded sequences is to
throw URIError, I suggest doing the same in this case.
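The proposed check can be sketched as follows. This is not the specification's Decode algorithm. It is a minimal stand-alone decoder for one UTF-8 sequence, assuming its octets have already been collected from the %XX escapes; `decodeUtf8Sequence` and `MIN_FOR_LENGTH` are hypothetical names. The key step compares the decoded value with the smallest code point that actually needs that many octets and throws URIError otherwise, matching how other invalid sequences are treated. The surrogate and U+10FFFF checks come from RFC 3629 as well.

```javascript
// Smallest code point that legitimately needs n octets (index = n).
const MIN_FOR_LENGTH = [0, 0, 0x80, 0x800, 0x10000];

// Hypothetical helper: decode one UTF-8 sequence given as an array of
// octets, throwing URIError for anything RFC 3629 does not allow.
function decodeUtf8Sequence(octets) {
  const b0 = octets[0];
  let n;
  if ((b0 & 0x80) === 0) n = 1;              // 0xxxxxxx
  else if ((b0 & 0xe0) === 0xc0) n = 2;      // 110xxxxx
  else if ((b0 & 0xf0) === 0xe0) n = 3;      // 1110xxxx
  else if ((b0 & 0xf8) === 0xf0) n = 4;      // 11110xxx
  else throw new URIError("invalid leading octet");
  if (octets.length !== n) throw new URIError("wrong sequence length");
  if (n === 1) return b0;

  // Payload bits: the low bits of the first octet ...
  let v = b0 & (0xff >> (n + 1));
  // ... followed by six bits from each continuation octet (10xxxxxx).
  for (let i = 1; i < n; i++) {
    if ((octets[i] & 0xc0) !== 0x80) throw new URIError("bad continuation octet");
    v = (v << 6) | (octets[i] & 0x3f);
  }

  // The missing check: an n-octet form is valid only if the code point
  // could not have been written with fewer octets.
  if (v < MIN_FOR_LENGTH[n]) throw new URIError("overlong UTF-8 sequence");
  // RFC 3629 also excludes surrogates and values above U+10FFFF.
  if ((v >= 0xd800 && v <= 0xdfff) || v > 0x10ffff) {
    throw new URIError("invalid code point");
  }
  return v;
}

console.log(decodeUtf8Sequence([0xc3, 0xa9]).toString(16)); // prints e9
try {
  decodeUtf8Sequence([0xc0, 0xaf]); // overlong form of "/"
} catch (e) {
  console.log(e.name + ": " + e.message); // prints URIError: overlong UTF-8 sequence
}
```

Comparing against a per-length minimum is one way to express the rule; checking the forbidden leading octets (C0, C1) and the restricted second-octet ranges after E0 and F0 rejects the same sequences.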
[1] http://www.rfc-editor.org/rfc/rfc3629.txt