Behavior of Decode with overlong UTF-8

# James Graham (17 years ago)

Unless I am misreading the specification (quite likely), the Decode function does not have any logic to protect against decoding overlong, but otherwise valid, UTF-8 sequences. Arguably this fails in step 29 since RFC 3629[1] states:

"It is important to note that the rows of the table are mutually exclusive, i.e., there is only one valid way to encode a given character. [...] Implementations of the decoding algorithm above MUST protect against decoding invalid sequences"
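To make "overlong" concrete: "/" (U+002F) is the single byte 2F, but the two-byte sequence C0 AF carries the same payload bits and is structurally well-formed. A minimal sketch of the check the RFC asks for follows. The helper name `isOverlong` and the `MIN_CODE_POINT` table are illustrative assumptions, not anything from the specification.

```javascript
// Smallest code point that legitimately needs a sequence of each length
// (taken from the ranges in the RFC 3629 encoding table).
const MIN_CODE_POINT = { 2: 0x80, 3: 0x800, 4: 0x10000 };

// Hypothetical helper: returns true if `bytes` is a structurally well-formed
// multi-byte sequence whose code point would have fit in a shorter encoding.
function isOverlong(bytes) {
  const lead = bytes[0];
  let length, codePoint;
  if ((lead & 0xe0) === 0xc0) { length = 2; codePoint = lead & 0x1f; }
  else if ((lead & 0xf0) === 0xe0) { length = 3; codePoint = lead & 0x0f; }
  else if ((lead & 0xf8) === 0xf0) { length = 4; codePoint = lead & 0x07; }
  else return false; // single byte, or not a lead byte at all

  if (bytes.length !== length) return false;
  for (let i = 1; i < length; i++) {
    if ((bytes[i] & 0xc0) !== 0x80) return false; // not a continuation byte
    codePoint = (codePoint << 6) | (bytes[i] & 0x3f);
  }
  return codePoint < MIN_CODE_POINT[length];
}

console.log(isOverlong([0xc0, 0xaf])); // overlong form of "/"
console.log(isOverlong([0xc3, 0xa9])); // shortest form of U+00E9
```

The comparison against the per-length minimum is the whole test: the bit layout of an overlong sequence is otherwise indistinguishable from a valid one.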

but it is not clear how to handle a failure here. Existing implementations seem to disagree on this point; my limited testing showed:

- SpiderMonkey: inserts a U+FFFD replacement character
- Futhark: leaves the original percent-encoded characters
- SquirrelFish: throws URIError
- V8: decodes the overlong sequence
- IE/JScript: decodes the overlong sequence

Since the usual behavior for invalid percent-encoded sequences is to throw URIError, I suggest making that happen in this case too.
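As a rough illustration of the suggested behavior, here is a sketch of a decoder that treats an overlong sequence like any other malformed escape. It is not the specification's Decode algorithm: `decodeStrict` is a hypothetical name, the reserved-set handling that Decode performs is omitted, and the error messages are made up.

```javascript
// Hypothetical strict decoder; every name here is illustrative.
function decodeStrict(input) {
  const MIN = [0, 0, 0x80, 0x800, 0x10000]; // minimum code point per sequence length
  let out = "";
  let i = 0;

  // Read one "%XX" escape starting at `pos` and return its byte value.
  function readByte(pos) {
    const hex = input.slice(pos + 1, pos + 3);
    if (input[pos] !== "%" || !/^[0-9a-fA-F]{2}$/.test(hex)) {
      throw new URIError("malformed percent escape");
    }
    return parseInt(hex, 16);
  }

  while (i < input.length) {
    if (input[i] !== "%") { out += input[i++]; continue; }
    const lead = readByte(i);
    i += 3;
    if (lead < 0x80) { out += String.fromCharCode(lead); continue; }

    let length, codePoint;
    if ((lead & 0xe0) === 0xc0) { length = 2; codePoint = lead & 0x1f; }
    else if ((lead & 0xf0) === 0xe0) { length = 3; codePoint = lead & 0x0f; }
    else if ((lead & 0xf8) === 0xf0) { length = 4; codePoint = lead & 0x07; }
    else throw new URIError("invalid UTF-8 lead byte");

    for (let k = 1; k < length; k++) {
      const b = readByte(i);
      i += 3;
      if ((b & 0xc0) !== 0x80) throw new URIError("invalid continuation byte");
      codePoint = (codePoint << 6) | (b & 0x3f);
    }

    // The proposed rule: an overlong sequence is an error, not a character.
    if (codePoint < MIN[length]) throw new URIError("overlong UTF-8 sequence");
    if (codePoint > 0x10ffff || (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
      throw new URIError("code point out of range");
    }
    out += String.fromCodePoint(codePoint);
  }
  return out;
}

try {
  decodeStrict("%C0%AF"); // overlong encoding of "/"
} catch (e) {
  console.log(e instanceof URIError, e.message);
}
```

Throwing here keeps the overlong case consistent with the other malformed-input paths, so callers only ever have one failure mode to handle.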

[1] www.rfc-editor.org/rfc/rfc3629.txt