Native base64 utility methods
+1 and as generic global utility it would be also nice to make it compatible with all strings.
I have this good old object: devpro.it/code/214.html
that indeed needs to normalize before encoding or decoding or errors might happen quite frequently in user-land:
{
  atob: atob,
  btoa: btoa,
  fromCharCode: fromCharCode,
  encode: function encode(string) {
    return btoa(unescape(encodeURIComponent(string)));
  },
  decode: function decode(string) {
    return decodeURIComponent(escape(atob(string)));
  },
  toDataURL: function toDataURL(string, mime) {
    return "data:" + (mime ? mime + ";" : "") + base + base64.encode(string);
  },
  fromDataURL: function fromDataURL(string) {
    return base64.decode(string.slice(string.indexOf(base) + base.length));
  }
}
Take care
On 5 May 2014, at 00:00, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
as generic global utility it would be also nice to make it compatible with all strings.
For backwards compatibility reasons, atob/btoa should probably continue to work in exactly the same way they work now (i.e. as per whatwg.org/html/webappapis.html#atob). Otherwise, any existing code that uses atob/btoa before UTF-8-decoding or after UTF-8-encoding, including your snippet, would suddenly break.
Like you demonstrated, it’s easy enough to encode or decode the input using UTF-8 or any other character encoding before passing to atob/btoa. (E.g. mothereff.in/base64)
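As a rough illustration of the point above, here is a minimal sketch of the UTF-8 wrapping from Andrea's snippet, run against a string that plain btoa rejects. The helper names utf8ToBase64 and base64ToUtf8 are made up for this example, and it assumes an environment where atob, btoa, escape and unescape are available as globals (browsers, recent Node.js):

```javascript
// Hypothetical helper names; the bodies mirror encode/decode from the snippet above.
function utf8ToBase64(string) {
  // encodeURIComponent percent-encodes the UTF-8 bytes of the string;
  // unescape turns each %XX back into one code unit in the 0x00-0xFF range,
  // which is the only input btoa accepts.
  return btoa(unescape(encodeURIComponent(string)));
}

function base64ToUtf8(encoded) {
  // Reverse path: atob yields one code unit per byte, escape percent-encodes
  // them, and decodeURIComponent interprets the byte sequence as UTF-8.
  return decodeURIComponent(escape(atob(encoded)));
}

// Plain btoa throws on anything above U+00FF, e.g. U+2603 (snowman).
let plainBtoaThrew = false;
try {
  btoa("\u2603");
} catch (error) {
  plainBtoaThrew = true;
}

const encodedSnowman = utf8ToBase64("\u2603"); // base64 of bytes E2 98 83
const roundTripped = base64ToUtf8(encodedSnowman);
```

Existing code that already does this wrapping is exactly what would break if atob/btoa started handling UTF-8 themselves.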
I'd say that it is "hacky enough" rather than "easy enough" to encode or decode the input using UTF-8. In my view, if atob and btoa were to enter ES, they should be in Appendix B (the deprecated legacy features of web browsers), where they would be in good company with the other utility that implicitly confuses binary and ISO-8859-1-encoded strings, namely escape/unescape. We should be able to define a better designed function (and with a less silly name, while we're at it).
On 5 May 2014, at 10:48, Claude Pache <claude.pache at gmail.com> wrote:
In my view, if atob and btoa were to enter in ES, it should be in Appendix B (the deprecated legacy features of web browsers), where it would be in good company with the other utility that does an implicit confusion between binary and ISO-8859-1-encoded strings, namely escape/unescape.
How do atob and btoa do any sort of implicit conversion between binary and any other encoding? Their behavior is well-defined, and they’re explicitly limited to extended ASCII.
I don’t think this is Annex B material regardless — this is not a legacy feature.
We should be able to define a better designed function (and with a less silly name, while we're at it).
That would kind of defeat the purpose IMHO. We’re stuck with atob/btoa anyway in browsers; adding yet another name for the same thing does not really help.
On 5 May 2014, at 12:03, Mathias Bynens <mathias at qiwi.be> wrote:
How do atob and btoa do any sort of implicit conversion between binary and any other encoding? Their behavior is well-defined, and they’re explicitly limited to extended ASCII.
Here, your "extended ASCII" means more precisely "ISO-8859-1".
Base64 is defined as a binary-to-text encoding [1]. In the definition of btoa and atob [2], "binary strings" (which do not exist natively in JS) are replaced by ES strings of code units between 0x0000 and 0x00FF. That is equivalent to interpreting the binary string as an ISO-8859-1-encoded string, because the U+0000 to U+00FF code points correspond exactly to the ISO-8859-1 code points.
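To make the equivalence concrete, here is a small sketch comparing what btoa produces for "é" (U+00E9) with the base64 of the same character encoded as UTF-8. It assumes globals atob, btoa and TextEncoder, as in current browsers and recent Node.js; the variable names are only for illustration:

```javascript
// btoa maps each code unit to one byte, so U+00E9 becomes the single byte 0xE9,
// which is also how ISO-8859-1 encodes "é".
const latin1Base64 = btoa("\u00E9");

// As UTF-8, the same character is the two bytes 0xC3 0xA9.
const utf8Bytes = new TextEncoder().encode("\u00E9");
const utf8Base64 = btoa(String.fromCharCode(...utf8Bytes));

// atob returns the decoded bytes as code units, not as decoded text:
// decoding the UTF-8 variant gives two code units, 0xC3 and 0xA9.
const decodedUnits = Array.from(atob(utf8Base64), (ch) => ch.charCodeAt(0));

console.log(latin1Base64, utf8Base64, decodedUnits);
```

The two base64 strings differ, so a consumer has to know out of band which byte interpretation the producer used.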
As you know, on the web, it is nowadays more fashionable to use UTF-8 rather than ISO-8859-1. Moreover, there are probably applications that want a raw binary string instead of interpreting it via some character encoding. In both cases, atob and btoa are unsatisfactory.
That would kind of defeat the purpose IMHO. We’re stuck with atob/btoa anyway in browsers; adding yet another name for the same thing does not really help.
I meant, defining a better thing, not the same thing (somewhat like encodeURIComponent is not the same thing as escape).
On Sun, May 4, 2014 at 3:00 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote:
+1 and as generic global utility it would be also nice to make it compatible with all strings.
A language with modules does not need to, nor should it, rely on stuffing more favorite features onto the global. We need standard modules for all new features. jjb
@john I don't really care about the namespace/module as long as this matter moves from W3C spec to ES one.
@mathias I didn't mean to change atob and btoa, but rather to add two extra methods such as encode/decode for strings (could land without problems in String.prototype, IMO) with "less silly names", whatever definition of silly we have ^_^
Also interesting is the @claude info on ISO strings ... yes, any UTF-8 compatible support is what I meant; doing unescape(encodeURIComponent(str)) in JS land feels very hacky, and it's slow indeed
take care
I'd like to highlight the fact that binary data handling in JS these days is mainly done via ArrayBuffers and TypedArrayViews. To that end, I've written a base64 to Uint8Array decoder like so: gist.github.com/pyalot/4530137
I don't quite see how atob/btoa without a usable binary type (indexable by byte, get the byte values out) should work.
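For readers who want to see the shape of such a conversion, here is a minimal sketch that goes through atob/btoa and copies code units into a Uint8Array. It is not the implementation from the gist above; base64ToBytes and bytesToBase64 are hypothetical names, and global atob/btoa are assumed:

```javascript
// Hypothetical helper: decode base64 text into an indexable byte array.
function base64ToBytes(encoded) {
  const binary = atob(encoded); // one code unit (0x00-0xFF) per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical helper: encode a byte array as base64 text.
function bytesToBase64(bytes) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

const sampleBytes = base64ToBytes("AAH/"); // bytes 0x00, 0x01, 0xFF
const sampleText = bytesToBase64(sampleBytes);
```

The intermediate "binary string" is pure overhead here, which is part of why a native API that works on typed arrays directly is attractive.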
On 5 May 2014, at 20:22, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
@mathias I didn't mean to change atob and btoa, but rather to add two extra methods such as encode/decode for strings (could land without problems in String.prototype, IMO) with "less silly names", whatever definition of silly we have ^_^
Agreed. Moving TextEncoder/TextDecoder to ES would be nice (but it requires ArrayBuffer/Uint8Array). encoding.spec.whatwg.org/#api
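For context, a short sketch of what the Encoding API looks like in use, assuming TextEncoder and TextDecoder are available as globals (as they are in current browsers and recent Node.js). The variable names are illustrative only:

```javascript
// TextEncoder always produces UTF-8 and returns a Uint8Array,
// which is why the API depends on typed arrays.
const encodedBytes = new TextEncoder().encode("h\u00E9llo");

// TextDecoder turns bytes back into a string; "utf-8" is the default label.
const decodedText = new TextDecoder("utf-8").decode(encodedBytes);

// "é" takes two bytes in UTF-8, so five characters become six bytes.
console.log(encodedBytes.length, decodedText);
```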
Mathias Bynens wrote:
(but it requires ArrayBuffer/Uint8Array).
In ES6, so no problem proposing Text{En,De}coder for ES7.
I give a +1 for native support of base64 encoding/decoding in ES
I actually wrote a basic polyfill-like module to add support for the standard W3C atob() / btoa() API across browsers (they don't all support it) and servers (mainly tested on Wakanda and Node.js)
My main concerns were, as already mentioned in this thread:
- limitation to ISO strings
- explicit exclusion of binary support by the current specification (must throw an error)
I think there are 3 approaches for such an API:
- 2 raw global methods such as atob() and btoa(), potentially base64Encode() and base64Decode() or whatever
- 2 methods of a dedicated namespace, like JSON.stringify() / JSON.parse(), potentially Base64.stringify() / Base64.parse()
- Dedicated prototype methods (base64Decode/base64Encode or fromBase64/toBase64) on each concerned constructor, like:
  -> String.prototype
  -> Typed Array prototypes
I'm not a fan of the first option (2 raw global methods). I'll use the second option in the sample below, but I think the third one may make it easier to extend support for binary formats.
In a perfect world I'd love to see something as simple as this (using the namespace approach in this sample):
Base64.stringify( any ) -> return a string
Base64.parse( string, format ) -> return expected instance object
where format could be either of the following, depending on this working group's choice:
- a string specifying the expected returned format (ex: "Uint8Array", "Uint16Array"...)
- a reference to the constructor of the expected format (ex: Uint8Array)
my personal choice would be for the constructor reference for 2 reasons:
- while coding, the developer would have native autocompletion to fill this parameter value
- at runtime, an exception would be natively thrown if the constructor is not supported by the environment (and then obviously by the parse() method)
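To give the namespace idea a concrete shape, here is a rough sketch of Base64.stringify() / Base64.parse() with the constructor-reference variant of the format parameter. Nothing here is a proposed or existing API: the Base64 object, the choice to encode strings as UTF-8, and the String default for format are all assumptions made for this example, built on global atob, btoa, TextEncoder and TextDecoder:

```javascript
// Hypothetical namespace, loosely following the stringify/parse shape above.
const Base64 = {
  // Accepts a string (assumed here to be encoded as UTF-8), an ArrayBuffer,
  // or any typed array / DataView.
  stringify(value) {
    let bytes;
    if (typeof value === "string") {
      bytes = new TextEncoder().encode(value);
    } else if (ArrayBuffer.isView(value)) {
      bytes = new Uint8Array(value.buffer, value.byteOffset, value.byteLength);
    } else if (value instanceof ArrayBuffer) {
      bytes = new Uint8Array(value);
    } else {
      throw new TypeError("Unsupported input type");
    }
    let binary = "";
    for (const byte of bytes) binary += String.fromCharCode(byte);
    return btoa(binary);
  },

  // format is a constructor reference; String is an assumed default.
  parse(encoded, format = String) {
    const binary = atob(encoded);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
    if (format === String) return new TextDecoder().decode(bytes);
    if (format === ArrayBuffer) return bytes.buffer;
    if (typeof format.BYTES_PER_ELEMENT !== "number") {
      throw new TypeError("Unsupported format constructor");
    }
    // Typed array constructors throw a RangeError if the byte length
    // is not a multiple of their element size.
    return new format(bytes.buffer);
  },
};

const asText = Base64.parse(Base64.stringify("hi"));
const asBytes = Base64.parse(Base64.stringify(new Uint8Array([1, 2, 3])), Uint8Array);
```

Passing a constructor that the environment does not define fails at the call site with a ReferenceError, which is the runtime behaviour the second bullet above describes.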
As said, I care about the extensibility of binary format support. The fact that Typed Arrays will be native in ES6 is awesome, since some formats will be available by default. Still, note that nowadays we also have:
- Blob, File, data URL (formatted in base64), ImageData (canvas) from HTML5 standards
- Buffer in node.js, wakanda
- ByteString & ByteArray supported by some CommonJS platforms and by CommonNode
A registerFormatHandler( encoder, decoder) method could then be a plus
Alexandre Morgaut, Wakanda Community Manager. Email: Alexandre.Morgaut at 4d.com
To convert from base64 to ASCII and vice versa, browsers have had global atob and btoa functions for a while now. At the moment, these are defined in the HTML standard: whatwg.org/html/webappapis.html#atob
However, such utility methods are not only useful in browsers. How about adding these as global functions to ECMAScript so that they’re natively available in all JavaScript engines, not just in browser environments?