Native base64 utility methods

# Mathias Bynens (6 years ago)

To convert from base64 to ASCII and vice versa, browsers have had global atob and btoa functions for a while now. At the moment, these are defined in the HTML standard: whatwg.org/html/webappapis.html#atob
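
As a minimal sketch of what these two functions do today (assuming an environment that exposes them as globals, such as a browser or a recent Node.js; the variable names are illustrative only):

```javascript
// btoa: "binary string" (code units 0x00-0xFF) -> base64 text
const helloEncoded = btoa("Hello, world!");

// atob: base64 text -> "binary string"
const helloDecoded = atob(helloEncoded);

console.log(helloEncoded); // "SGVsbG8sIHdvcmxkIQ=="
console.log(helloDecoded); // "Hello, world!"
```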

However, such utility methods are not only useful in browsers. How about adding these as global functions to ECMAScript so that they’re natively available in all JavaScript engines, not just in browser environments?

# Andrea Giammarchi (6 years ago)

+1 and as generic global utility it would be also nice to make it compatible with all strings.

I have this good old object: devpro.it/code/214.html

that indeed needs to normalize before encoding or decoding or errors might happen quite frequently in user-land:

    {
      atob: atob,
      btoa: btoa,
      fromCharCode: fromCharCode,
      encode: function encode(string) {
        return btoa(unescape(encodeURIComponent(string)));
      },
      decode: function decode(string) {
        return decodeURIComponent(escape(atob(string)));
      },
      toDataURL: function toDataURL(string, mime) {
        return "data:" + (mime ? mime + ";" : "") + base + base64.encode(string);
      },
      fromDataURL: function fromDataURL(string) {
        return base64.decode(string.slice(string.indexOf(base) + base.length));
      }
    }

Take care

# Mathias Bynens (6 years ago)

On 5 May 2014, at 00:00, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

as generic global utility it would be also nice to make it compatible with all strings.

For backwards compatibility reasons, atob/btoa should probably continue to work in exactly the same way they work now (i.e. as per whatwg.org/html/webappapis.html#atob). Otherwise, any existing code that uses atob/btoa before UTF-8-decoding or after UTF-8-encoding, including your snippet, would suddenly break.

Like you demonstrated, it’s easy enough to encode or decode the input using UTF-8 or any other character encoding before passing to atob/btoa. (E.g. mothereff.in/base64)
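
To illustrate the point, here is a rough sketch of the current behavior and of the UTF-8 workaround. The helper names `utf8ToBase64` and `base64ToUtf8` are made up for this example; only `btoa`, `atob`, `escape`, `unescape`, `encodeURIComponent`, and `decodeURIComponent` are real built-ins.

```javascript
// btoa rejects any code unit above 0xFF, so this call throws.
let btoaThrew = false;
try {
  btoa("\u2713"); // U+2713 CHECK MARK
} catch (error) {
  btoaThrew = true;
}

// Hypothetical helpers: UTF-8-encode before btoa, UTF-8-decode after atob.
function utf8ToBase64(text) {
  // encodeURIComponent yields %XX escapes of the UTF-8 bytes;
  // unescape turns each %XX into a single code unit in 0x00-0xFF.
  return btoa(unescape(encodeURIComponent(text)));
}

function base64ToUtf8(base64Text) {
  // Reverse of the above: bytes -> %XX escapes -> decoded UTF-8 text.
  return decodeURIComponent(escape(atob(base64Text)));
}

console.log(btoaThrew);              // true
console.log(utf8ToBase64("\u2713")); // "4pyT"
console.log(base64ToUtf8("4pyT"));   // the check mark character
```

If atob/btoa started UTF-8-encoding on their own, helpers like these would encode twice and produce different output, which is the breakage described above.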

# Claude Pache (6 years ago)

I'd say that it is "hacky enough" rather than "easy enough" to encode or decode the input using UTF-8. In my view, if atob and btoa were to enter in ES, it should be in Appendix B (the deprecated legacy features of web browsers), where it would be in good company with the other utility that does an implicit confusion between binary and ISO-8859-1-encoded strings, namely escape/unescape. We should be able to define a better designed function (and with a less silly name, while we're at it).

# Mathias Bynens (6 years ago)

On 5 May 2014, at 10:48, Claude Pache <claude.pache at gmail.com> wrote:

In my view, if atob and btoa were to enter in ES, it should be in Appendix B (the deprecated legacy features of web browsers), where it would be in good company with the other utility that does an implicit confusion between binary and ISO-8859-1-encoded strings, namely escape/unescape.

How do atob and btoa do any sort of implicit conversion between binary and any other encoding? Their behavior is well-defined, and they’re explicitly limited to extended ASCII.

I don’t think this is Annex B material regardless — this is not a legacy feature.

We should be able to define a better designed function (and with a less silly name, while we're at it).

That would kind of defeat the purpose IMHO. We’re stuck with atob/btoa anyway in browsers — adding yet another name for the same thing does not really help.

# Claude Pache (6 years ago)

On 5 May 2014, at 12:03, Mathias Bynens <mathias at qiwi.be> wrote:

How do atob and btoa do any sort of implicit conversion between binary and any other encoding? Their behavior is well-defined, and they’re explicitly limited to extended ASCII.

Here, your "extended ASCII" means more precisely "ISO-8859-1".

Base64 is defined as a binary-to-text encoding [1]. In the definition of btoa and atob [2], "binary strings" (which do not exist natively in JS) are replaced by ES strings of code units between 0x0000 and 0x00FF. That is equivalent to interpreting the binary string as an ISO-8859-1-encoded string, because the U+0000 to U+00FF code points correspond exactly to the ISO-8859-1 code points.

As you know, on the web, it is nowadays more fashionable to use UTF-8 rather than ISO-8859-1. Moreover, there are probably applications that want a raw binary string instead of interpreting it via some character encoding. In both cases, atob and btoa are unsatisfactory.
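
A small sketch of that equivalence, assuming global `atob` is available (the variable names are illustrative): the same character "é" decodes correctly when the underlying byte is ISO-8859-1, but comes out as two unrelated characters when the underlying bytes are UTF-8.

```javascript
// "6Q==" is the base64 form of the single byte 0xE9.
// atob maps it to code unit U+00E9, which is "é" in ISO-8859-1.
const fromLatin1Byte = atob("6Q==");

// "w6k=" is the base64 form of the bytes 0xC3 0xA9 (UTF-8 for "é").
// atob maps each byte to its own code unit, giving U+00C3 U+00A9.
const fromUtf8Bytes = atob("w6k=");

console.log(fromLatin1Byte === "\u00E9");      // true
console.log(fromUtf8Bytes === "\u00C3\u00A9"); // true, i.e. not "é"
console.log(fromUtf8Bytes.length);             // 2
```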

That would kind of defeat the purpose IMHO. We’re stuck with atob/btoa anyway in browsers — adding yet another name for the same thing does not really help.

I meant, defining a better thing, not the same thing (somewhat like encodeURIComponent is not the same thing as escape).

# John Barton (6 years ago)

On Sun, May 4, 2014 at 3:00 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote:

+1 and as generic global utility it would be also nice to make it compatible with all strings.

A language with modules neither needs nor should rely on stuffing more favorite features onto the global object. We need standard modules for all new features. jjb

# Andrea Giammarchi (6 years ago)

@john I don't really care about the namespace/module as long as this matter moves from W3C spec to ES one.

@mathias I didn't mean to change atob and btoa, but rather to add two extra methods, such as encode/decode, for strings (these could land without problems in String.prototype, IMO) with "less silly names", whatever definition of silly we have ^_^

The info from @claude on ISO strings is also interesting ... yes, any UTF-8 compatible support is what I meant; doing unescape(encodeURIComponent(str)) in JS land feels very hacky, and it is indeed slow

take care

# Florian Bösch (6 years ago)

I'd like to highlight the fact that binary data handling in JS these days is mainly done via ArrayBuffers and TypedArrayViews. To that end, I've written a base64 to Uint8Array decoder like so: gist.github.com/pyalot/4530137

I don't quite see how atob/btoa without a usable binary type (indexable by byte, get the byte values out) should work.
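
For comparison, a minimal sketch of bridging atob/btoa to a Uint8Array is shown here. This is not the code from the gist above (which decodes base64 directly); it simply wraps the existing globals, and the function names `base64ToBytes` and `bytesToBase64` are invented for the example.

```javascript
// Hypothetical helper: decode base64 text into indexable bytes.
function base64ToBytes(base64Text) {
  const binaryString = atob(base64Text);
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    // Every code unit produced by atob is in the range 0x00-0xFF.
    bytes[i] = binaryString.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical helper: encode bytes as base64 text.
function bytesToBase64(bytes) {
  let binaryString = "";
  for (const byte of bytes) {
    binaryString += String.fromCharCode(byte);
  }
  return btoa(binaryString);
}

const sampleBytes = base64ToBytes("AQID");
console.log(Array.from(sampleBytes));    // [1, 2, 3]
console.log(bytesToBase64(sampleBytes)); // "AQID"
```

The intermediate string is exactly the "binary string" detour that makes atob/btoa awkward without a real byte type.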

# Mathias Bynens (6 years ago)

On 5 May 2014, at 20:22, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

@mathias I didn't mean to change atob and btoa, but rather to add two extra methods, such as encode/decode, for strings (these could land without problems in String.prototype, IMO) with "less silly names", whatever definition of silly we have ^_^

Agreed. Moving TextEncoder/TextDecoder to ES would be nice (but it requires ArrayBuffer / Uint8Array). encoding.spec.whatwg.org/#api
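
A rough sketch of how TextEncoder/TextDecoder could stand in for the escape/unescape trick, assuming both are available as globals alongside atob/btoa. The function names `encodeTextAsBase64` and `decodeBase64AsText` are hypothetical.

```javascript
// Hypothetical helper: text -> UTF-8 bytes -> base64.
function encodeTextAsBase64(text) {
  const utf8Bytes = new TextEncoder().encode(text); // Uint8Array
  let binaryString = "";
  for (const byte of utf8Bytes) {
    binaryString += String.fromCharCode(byte);
  }
  return btoa(binaryString);
}

// Hypothetical helper: base64 -> UTF-8 bytes -> text.
function decodeBase64AsText(base64Text) {
  const binaryString = atob(base64Text);
  const utf8Bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    utf8Bytes[i] = binaryString.charCodeAt(i);
  }
  return new TextDecoder().decode(utf8Bytes); // defaults to UTF-8
}

console.log(encodeTextAsBase64("\u2713")); // "4pyT"
console.log(decodeBase64AsText("4pyT"));   // the check mark character
```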

# Brendan Eich (6 years ago)

Mathias Bynens wrote:

(but it requires ArrayBuffer / Uint8Array).

In ES6, so no problem proposing Text{En,De}coder for ES7.

# Alexandre Morgaut (6 years ago)

I give a +1 for native support of base64 encoding/decoding in ES

I actually wrote a basic polyfill-like module to add support for the standard W3C atob() / btoa() API across browsers (they don't all support it) and servers (mainly tested on Wakanda and Node.js)

My main concerns were, as already mentioned in this thread:

  • limitation to ISO strings
  • explicit exclusion of binary support by the current specification (must throw an error)

I think there are 3 approaches for such an API:

  • 2 raw global methods as atob() and btoa(), potentially base64Encode() and base64Decode() or whatever
  • 2 methods of a dedicated namespace as JSON.stringify() / JSON.parse(), potentially Base64.stringify() / Base64.parse()
  • Dedicated prototype methods (base64Decode/base64Encode or fromBase64/toBase64) on each relevant constructor, such as:
      -> String.prototype
      -> Typed Array prototypes

I'm not a fan of the first option (2 raw global methods). I'll use the second option in the sample below, but I think the third one may make it easier to extend support for binary formats.

In a perfect world, I'd love to see something as simple as the following (using the namespace approach in this sample):

Base64.stringify( any ) -> return a string

Base64.parse( string, format ) -> return expected instance object

where format could be either of the following, depending on this working group's choice:

  • a string specifying the expected returned format (ex: "Uint8Array", "Uint16Array"...)
  • a reference to the constructor of the expected format (ex: Uint8Array)

My personal choice would be the constructor reference, for 2 reasons:

  • while coding, the developer would have native autocompletion to fill this parameter value
  • at runtime, an exception would be natively thrown if the constructor is not supported by the environment (and then obviously by the parse() method)

As said, I care about the extensibility of "binary format supports". The fact that Typed Arrays will be native in ES6 is great, since it makes some formats available by default. Still, note that nowadays we also have:

  • Blob, File, data URL (formatted in base64), ImageData (canvas) from HTML5 standards
  • Buffer in Node.js and Wakanda
  • ByteString & ByteArray supported by some CommonJS platforms and by CommonNode

A registerFormatHandler( encoder, decoder) method could then be a plus
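
To make the idea concrete, here is a rough sketch of the namespace approach with constructor-based formats. Everything in it is hypothetical: no `Base64` object exists in any standard, and keying `registerFormatHandler` by constructor (a third argument beyond the encoder and decoder mentioned above) is an assumption made only for this example.

```javascript
// Hypothetical registry: constructor -> { encoder, decoder }.
// encoder: value -> Uint8Array; decoder: Uint8Array -> value.
const formatHandlers = new Map();

const Base64 = {
  // Assumed signature: the constructor argument is an addition for this sketch.
  registerFormatHandler(constructor, encoder, decoder) {
    formatHandlers.set(constructor, { encoder, decoder });
  },

  stringify(value) {
    const handler = formatHandlers.get(value.constructor);
    if (!handler) throw new TypeError("Unsupported format");
    let binaryString = "";
    for (const byte of handler.encoder(value)) {
      binaryString += String.fromCharCode(byte);
    }
    return btoa(binaryString);
  },

  parse(base64Text, constructor) {
    const handler = formatHandlers.get(constructor);
    if (!handler) throw new TypeError("Unsupported format");
    // atob yields one code unit (0x00-0xFF) per decoded byte.
    const bytes = Uint8Array.from(atob(base64Text), (ch) => ch.charCodeAt(0));
    return handler.decoder(bytes);
  },
};

// Two example formats: raw bytes, and strings as UTF-8.
Base64.registerFormatHandler(Uint8Array, (bytes) => bytes, (bytes) => bytes);
Base64.registerFormatHandler(
  String,
  (text) => new TextEncoder().encode(text),
  (bytes) => new TextDecoder().decode(bytes)
);

console.log(Base64.stringify(new Uint8Array([1, 2, 3]))); // "AQID"
console.log(Base64.parse("AQID", Uint8Array));            // bytes 1, 2, 3
console.log(Base64.stringify("\u2713"));                  // "4pyT"
```

Passing a constructor that does not exist in the environment fails before parse() is even called (a ReferenceError), which matches the second reason given above for preferring constructor references.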

