Native base64 utility methods

# Mathias Bynens (12 years ago)

To convert from base64 to ASCII and vice versa, browsers have had global `atob` and `btoa` functions for a while now. At the moment, these are defined in the HTML standard: http://whatwg.org/html/webappapis.html#atob

However, such utility methods are not only useful in browsers. How about adding these as global functions to ECMAScript so that they’re natively available in all JavaScript engines, not just in browser environments?
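To make concrete what every engine would need to provide, here is a minimal sketch in plain JavaScript of the encoding half of btoa() as the HTML standard describes it. In that description, each code unit must be in the range 0x00 to 0xFF and is treated as one byte. The name latin1ToBase64 is hypothetical. Unlike the real btoa(), which throws an "InvalidCharacterError" DOMException, this sketch throws a RangeError.

```javascript
// Hypothetical helper, not a standard API: encodes a string whose code units
// are all in 0x00-0xFF (a "binary string") the way btoa() does.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function latin1ToBase64(input) {
  const str = String(input);
  let out = "";
  for (let i = 0; i < str.length; i += 3) {
    // Collect up to three bytes, rejecting code units that don't fit in a byte
    const bytes = [];
    for (let j = i; j < i + 3 && j < str.length; j++) {
      const unit = str.charCodeAt(j);
      if (unit > 0xff) {
        throw new RangeError("code unit 0x" + unit.toString(16) + " does not fit in a byte");
      }
      bytes.push(unit);
    }
    // Pack into a 24-bit group; missing trailing bytes count as zero bits
    const n = (bytes[0] << 16) | ((bytes[1] || 0) << 8) | (bytes[2] || 0);
    // Emit four 6-bit digits, replacing those past the input with "=" padding
    out += ALPHABET[(n >> 18) & 63] + ALPHABET[(n >> 12) & 63];
    out += bytes.length > 1 ? ALPHABET[(n >> 6) & 63] : "=";
    out += bytes.length > 2 ? ALPHABET[n & 63] : "=";
  }
  return out;
}
```

A decoder would mirror this in reverse. The standard's atob() is also forgiving about ASCII whitespace and missing padding, which a matching sketch would have to replicate.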

# Andrea Giammarchi (12 years ago)

+1, and as a generic global utility it would also be nice to make it compatible with all strings.

I have this good old object: http://devpro.it/code/214.html

which indeed needs to normalize strings before encoding or decoding, or errors might happen quite frequently in user-land:

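{
    // Excerpt: `fromCharCode` and `base` are defined in the surrounding code
    // at the devpro.it link above, and `base64` refers to this object itself.
    atob: atob,
    btoa: btoa,
    fromCharCode: fromCharCode,
    // UTF-8-encode first, so btoa only ever sees code units 0x00-0xFF
    encode: function encode(string) {
        return btoa(unescape(encodeURIComponent(string)));
    },
    // Inverse of encode: base64 -> binary string -> UTF-8-decoded string
    decode: function decode(string) {
        return decodeURIComponent(escape(atob(string)));
    },
    toDataURL: function toDataURL(string, mime) {
        return "data:" + (mime ? mime + ";" : "") + base +
            base64.encode(string);
    },
    fromDataURL: function fromDataURL(string) {
        return base64.decode(string.slice(string.indexOf(base) + base.length));
    }
}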


Take care



On Sun, May 4, 2014 at 2:16 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> To convert from base64 to ASCII and vice versa, browsers have had global
> `atob` and `btoa` functions for a while now. At the moment, these are
> defined in the HTML standard: http://whatwg.org/html/webappapis.html#atob
>
> However, such utility methods are not only useful in browsers. How about
> adding these as global functions to ECMAScript so that they’re natively
> available in all JavaScript engines, not just in browser environments?

# Mathias Bynens (12 years ago)

On 5 May 2014, at 00:00, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

> as generic global utility it would be also nice to make it compatible with all strings.

For backwards compatibility reasons, `atob`/`btoa` should probably continue to work in exactly the same way they work now (i.e., as per http://whatwg.org/html/webappapis.html#atob). Otherwise, any existing code that uses `atob`/`btoa` before UTF-8-decoding or after UTF-8-encoding, including your snippet, would suddenly break.

Like you demonstrated, it’s easy enough to encode or decode the input using UTF-8 or any other character encoding before passing to `atob`/`btoa`. (E.g. http://mothereff.in/base64)

# Claude Pache (12 years ago)

On 5 May 2014, at 09:54, Mathias Bynens <mathias at qiwi.be> wrote:

> On 5 May 2014, at 00:00, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
> 
>> as generic global utility it would be also nice to make it compatible with all strings.
> 
> For backwards compatibility reasons, `atob`/`btoa` should probably continue to work in exactly the same way they work now (i.e as per http://whatwg.org/html/webappapis.html#atob). Otherwise, any existing code that uses `atob`/`btoa` before UTF-8-decoding or after UTF-8-encoding, including your snippet, would suddenly break.
> 
> Like you demonstrated, it’s easy enough to encode or decode the input using UTF-8 or any other character encoding before passing to `atob`/`btoa`. (E.g. http://mothereff.in/base64)
> 

I'd say that it is "hacky enough" rather than "easy enough" to encode or decode the input using UTF-8. In my view, if `atob` and `btoa` were to enter ES, they should go in Appendix B (the deprecated legacy features of web browsers), where they would be in good company with the other utility that implicitly confuses binary data with ISO-8859-1-encoded strings, namely `escape`/`unescape`. We should be able to define a better-designed function (and with a less silly name, while we're at it).

—Claude

# Mathias Bynens (12 years ago)

On 5 May 2014, at 10:48, Claude Pache <claude.pache at gmail.com> wrote:

> In my view, if `atob` and `btoa` were to enter in ES, it should be in Appendix B (the deprecated legacy features of web browsers), where it would be in good company with the other utility that does an implicit confusion between binary and ISO-8859-1-encoded strings, namely `escape/unescape`.

How do `atob` and `btoa` do any sort of implicit conversion between binary and any other encoding? Their behavior is well-defined, and they’re explicitly limited to extended ASCII.

I don’t think this is Annex B material regardless — this is not a legacy feature.

> We should be able to define a better designed function (and with a less silly name, while we're at it).

That would kind of defeat the purpose IMHO. We’re stuck with `atob`/`btoa` anyway in browsers — adding yet another name for the same thing does not really help.

# Claude Pache (12 years ago)

On 5 May 2014, at 12:03, Mathias Bynens <mathias at qiwi.be> wrote:

> On 5 May 2014, at 10:48, Claude Pache <claude.pache at gmail.com> wrote:
> 
>> In my view, if `atob` and `btoa` were to enter in ES, it should be in Appendix B (the deprecated legacy features of web browsers), where it would be in good company with the other utility that does an implicit confusion between binary and ISO-8859-1-encoded strings, namely `escape/unescape`.
> 
> How do `atob` and `btoa` do any sort of implicit conversion between binary and any other encoding? Their behavior is well-defined, and they’re explicitly limited to extended ASCII.

Here, your "extended ASCII" means more precisely "ISO-8859-1".

Base64 is defined as a binary-to-text encoding [1]. In the definition of `btoa` and `atob` [2], "binary strings" (which do not exist natively in JS) are replaced by ES strings of code units between 0x0000 and 0x00FF. That is equivalent to interpreting the binary string as an ISO-8859-1-encoded string, because code points U+0000 to U+00FF correspond exactly to the ISO-8859-1 code points.

As you know, on the web, it is nowadays more fashionable to use UTF-8 rather than ISO-8859-1. Moreover, there are probably applications that want a raw binary string instead of interpreting it via some character encoding. In both cases, `atob` and `btoa` are unsatisfactory.
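The ISO-8859-1 interpretation can be observed directly. This sketch assumes a runtime that provides btoa and TextEncoder as globals (modern browsers, recent Node.js). It compares btoa() applied to "é" with btoa() applied to the UTF-8 bytes of the same character.

```javascript
// Assumes btoa and TextEncoder are available as globals.

// btoa() maps each code unit 0x00-0xFF straight to one byte, i.e. it reads
// the string as ISO-8859-1: "é" (U+00E9) becomes the single byte 0xE9.
const latin1 = btoa("\u00e9"); // "6Q=="

// Encoding the same character as UTF-8 first yields two bytes, 0xC3 0xA9.
const utf8Bytes = new TextEncoder().encode("\u00e9");
const utf8 = btoa(String.fromCharCode(...utf8Bytes)); // "w6k="

// Characters above U+00FF have no single byte to map to, so btoa() rejects them.
let rejected = false;
try {
  btoa("\u20ac"); // "€"
} catch (e) {
  rejected = true; // an "InvalidCharacterError" DOMException
}
```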

> 
> I don’t think this is Annex B material regardless — this is not a legacy feature.
> 
>> We should be able to define a better designed function (and with a less silly name, while we're at it).
> 
> That would kind of defeat the purpose IMHO. We’re stuck with `atob`/`btoa` anyway in browsers — adding yet another name for the same thing does not really help.

I meant, defining a better thing, not the same thing (somewhat like `encodeURIComponent` is not the same thing as `escape`).

—Claude

[1] https://en.wikipedia.org/wiki/Base64
[2] http://whatwg.org/html/webappapis.html#atob

# John Barton (12 years ago)

On Sun, May 4, 2014 at 3:00 PM, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

> +1 and as generic global utility it would be also nice to make it
> compatible with all strings.

A language with modules neither needs, nor should rely on, stuffing more favorite features onto the global object. We need standard modules for all new features.
jjb

# Andrea Giammarchi (12 years ago)

@john I don't really care about the namespace/module, as long as this matter moves from the W3C spec to the ES one.

@mathias I didn't mean to change atob and btoa, but rather to add two extra methods such as encode/decode for strings (they could land without problems on String.prototype, IMO) with "less silly names", whatever definition of silly we have ^_^

Also interesting, @claude's info on ISO strings... yes, any UTF-8 compatible support is what I meant; doing unescape(encodeURIComponent(str)) in JS land feels very hacky, and it's slow, indeed.

take care

On Mon, May 5, 2014 at 8:16 AM, John Barton <johnjbarton at google.com> wrote:

> On Sun, May 4, 2014 at 3:00 PM, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
>
>> +1 and as generic global utility it would be also nice to make it
>> compatible with all strings.
>
> A language with modules does not need nor should it rely on stuff more
> favorite features onto global.  We need standard modules for all new
> features.
> jjb

# Florian Bösch (12 years ago)

I'd like to highlight the fact that binary data handling in JS these days is mainly done via ArrayBuffers and TypedArrayViews. To that end, I've written a base64 to Uint8Array decoder like so:
https://gist.github.com/pyalot/4530137

I don't quite see how atob/btoa should work without a usable binary type (one that is indexable by byte, so you can get the byte values out).
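Here is a minimal sketch of decoding base64 directly into a Uint8Array in plain JavaScript, without going through atob() and its intermediate binary string. This is not the code from the gist linked above, and base64ToBytes is a hypothetical name. The sketch only accepts canonical, padded input (no whitespace, no URL-safe alphabet).

```javascript
// Map each base64 digit's char code to its 6-bit value (-1 = not a digit)
const LOOKUP = new Int16Array(128).fill(-1);
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  .split("")
  .forEach((ch, i) => { LOOKUP[ch.charCodeAt(0)] = i; });

function base64ToBytes(b64) {
  if (b64.length % 4 !== 0) {
    throw new SyntaxError("base64 length must be a multiple of 4");
  }
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  const bytes = new Uint8Array((b64.length / 4) * 3 - padding);
  let out = 0;
  for (let i = 0; i < b64.length; i += 4) {
    let n = 0;
    for (let j = 0; j < 4; j++) {
      const ch = b64.charCodeAt(i + j);
      // "=" is only accepted as trailing padding of the last group (zero bits)
      const isPad = ch === 61 && i + 4 === b64.length && j >= 4 - padding;
      const v = isPad ? 0 : ch < 128 ? LOOKUP[ch] : -1;
      if (v < 0) throw new SyntaxError("invalid base64 character at " + (i + j));
      n = (n << 6) | v;
    }
    // Emit up to three bytes; the output length already excludes padding
    bytes[out++] = (n >> 16) & 0xff;
    if (out < bytes.length) bytes[out++] = (n >> 8) & 0xff;
    if (out < bytes.length) bytes[out++] = n & 0xff;
  }
  return bytes;
}
```

Returning a Uint8Array gives exactly the "indexable by byte" access described above. For example, `base64ToBytes("SGVsbG8=")[0]` is 72.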



# Mathias Bynens (12 years ago)

On 5 May 2014, at 20:22, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

> @mathias didn't mean to change atob and btoa rather add two extra methods such encode/decode for strings (could land without problems in the String.prototype, IMO) with "less silly names" whatever definition of silly we have ^_^

Agreed. Moving `TextEncoder`/`TextDecoder` to ES would be nice (but it requires `ArrayBuffer` / `Uint8Array`). http://encoding.spec.whatwg.org/#api
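For comparison with the unescape(encodeURIComponent(str)) trick, here is a minimal sketch of UTF-8-safe base64 built on TextEncoder/TextDecoder plus btoa/atob. It assumes all four are available as globals (modern browsers, recent Node.js). The helper names utf8ToBase64 and base64ToUtf8 are hypothetical.

```javascript
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  // Build a "binary string" (one code unit per byte) in chunks, so very
  // large inputs don't exceed the engine's argument-count limit for apply().
  let binary = "";
  const CHUNK = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

function base64ToUtf8(b64) {
  const binary = atob(b64); // each code unit is one byte, 0x00-0xFF
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes); // UTF-8 bytes -> string
}
```

For any string, `utf8ToBase64(s)` should produce the same output as `btoa(unescape(encodeURIComponent(s)))`, but without relying on the Annex B `escape`/`unescape` functions.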

# Brendan Eich (12 years ago)

Mathias Bynens wrote:
>   (but it requires `ArrayBuffer` / `Uint8Array`).

In ES6, so no problem proposing Text{En,De}coder for ES7.

/be

# Alexandre Morgaut (12 years ago)

I give a +1 for native support of base64 encoding/decoding in ES.

I actually wrote a basic polyfill-like module to add support for the standard W3C atob() / btoa() API across browsers (not all of them support it) and servers (mainly tested on Wakanda and Node.js).

My main concerns were, as already mentioned in this thread:
- limitation to ISO strings
- explicit exclusion of binary support by the current specification (must throw an error)

I think there are 3 approaches for such an API:
- 2 raw global methods such as atob() and btoa(), potentially base64Encode() and base64Decode() or whatever
- 2 methods on a dedicated namespace, like JSON.stringify() / JSON.parse(), potentially Base64.stringify() / Base64.parse()
- Dedicated prototype methods (base64Decode/base64Encode or fromBase64/toBase64) on each relevant constructor, like:
-> String.prototype
-> Typed Array prototypes

I'm not a fan of the first option (2 raw global methods).
I'll use the second option in the sample below, but I think the third one may make it easier to extend support to more binary formats.

In a perfect world I'd love to see something as simple as this (using the namespace approach in this sample):

Base64.stringify( any ) -> return a string
Base64.parse( string, format ) -> return expected instance object

where format could be either of the following, depending on this working group's choice:
- a string specifying the expected returned format (ex: "Uint8Array", "Uint16Array"...)
- a reference to the constructor of the expected format (ex: Uint8Array)

My personal choice would be the constructor reference, for 2 reasons:
- while coding, the developer would get native autocompletion when filling in this parameter
- at runtime, an exception would be thrown natively if the constructor is not supported by the environment (and then obviously by the parse() method)
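Here is a rough sketch of the namespace-style Base64.stringify / Base64.parse API described above, with the format passed as a constructor reference. Everything in it is hypothetical: the Base64 object is not a standard API, and strings are assumed to be UTF-8-encoded. The sketch also relies on the btoa/atob and TextEncoder/TextDecoder globals of modern browsers and recent Node.js.

```javascript
// Hypothetical namespace; names and behavior are illustrative only.
const Base64 = {
  stringify(value) {
    let bytes;
    if (typeof value === "string") {
      bytes = new TextEncoder().encode(value); // assumption: strings as UTF-8
    } else if (value instanceof ArrayBuffer) {
      bytes = new Uint8Array(value);
    } else if (ArrayBuffer.isView(value)) {
      // Typed arrays and DataView: respect byteOffset/byteLength of the view
      bytes = new Uint8Array(value.buffer, value.byteOffset, value.byteLength);
    } else {
      throw new TypeError("Base64.stringify: unsupported value");
    }
    let binary = "";
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary);
  },

  parse(string, format = Uint8Array) {
    const binary = atob(string);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
    if (format === String) return new TextDecoder().decode(bytes);
    if (format === ArrayBuffer) return bytes.buffer;
    // Any typed-array constructor (or DataView) can view the decoded buffer;
    // a size mismatch, e.g. 5 bytes for a Uint16Array, throws a RangeError.
    return new format(bytes.buffer);
  },
};
```

Referencing a constructor the environment lacks fails with a ReferenceError before parse() even runs, which matches the second reason given above. Multi-byte views such as Uint16Array read the decoded bytes in the platform's byte order, so a real design would likely have to address endianness.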

As I said, I care about the extensibility of "binary format support".
The fact that Typed Arrays will be native in ES6 is awesome, as it makes some formats available by default.
Still, note that nowadays we also have:
- Blob, File, data URL (formatted in base64), ImageData (canvas) from HTML5 standards
- Buffer in node.js, wakanda
- ByteString & ByteArray supported by some CommonJS platforms and by CommonNode

A registerFormatHandler(encoder, decoder) method could then be a plus.
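Here is a minimal sketch of the registerFormatHandler idea, built on several assumptions. It takes the format constructor as an extra first argument (the proposal lists only encoder and decoder) and keeps handlers in a Map. It also exposes hypothetical toBase64/fromBase64 functions. None of these names is a real or proposed standard API. The sketch uses the btoa/atob and TextEncoder/TextDecoder globals.

```javascript
// Hypothetical registry: format constructor -> { encoder, decoder }
const handlers = new Map();

// encoder: value of `format` -> bytes (Uint8Array); decoder: bytes -> value
function registerFormatHandler(format, encoder, decoder) {
  handlers.set(format, { encoder, decoder });
}

function toBase64(value) {
  // Object() boxes primitives, so a string primitive maps to String
  const handler = handlers.get(Object(value).constructor);
  if (!handler) throw new TypeError("no handler registered for this value's type");
  let binary = "";
  for (const byte of handler.encoder(value)) binary += String.fromCharCode(byte);
  return btoa(binary);
}

function fromBase64(string, format) {
  const handler = handlers.get(format);
  if (!handler) throw new TypeError("no handler registered for this format");
  const binary = atob(string);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return handler.decoder(bytes);
}

// Example registrations (also assumptions): UTF-8 strings and plain byte arrays
registerFormatHandler(
  String,
  (s) => new TextEncoder().encode(s),
  (bytes) => new TextDecoder().decode(bytes)
);
registerFormatHandler(
  Array,
  (arr) => Uint8Array.from(arr),
  (bytes) => Array.from(bytes)
);
```

With this shape, adding support for Blob, Buffer or ByteArray would only require registering another pair of functions rather than changing the core API.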


On 5 May 2014, at 21:24, Florian Bösch <pyalot at gmail.com> wrote:

> I'd like highlight the fact that binary data handling in JS these days is mainly done via ArrayBuffers and TypedArrayViews. To that end, I've written a base64 to Uint8Array decoder like so: https://gist.github.com/pyalot/4530137
>
> I don't quite see how atob/btoa without a usable binary type (indexable by byte, get the byte values out) should work.


Alexandre Morgaut
Wakanda Community Manager
Email : Alexandre.Morgaut at 4d.com
Web : www.4D.com

4D SAS
60, rue d'Alsace
92110 Clichy - France
Standard :      +33 1 40 87 92 00


