Overload str.replace to take a Map?

# Alex Vincent (6 years ago)

Reading [1] in the digests, I think there might actually be an API improvement that is doable.

Suppose the String.prototype.replace API allowed passing in a single argument, a Map instance where the keys were strings or regular expressions and the values were replacement strings or functions.

Advantages:

  • Shorthand - instead of writing str.replace(a, b).replace(c, d).replace(e, f)... you get str.replace(regExpMap)
  • Reusable - the same regular expression/string map could be used for several strings (assuming of course the user didn't just abstract the call into a separate function)
  • Modifiable on demand - developers could easily add new regular expression matches to the map object, or remove them
  • It wouldn't necessarily break existing API, since String.prototype.replace currently accepts only RegExp or strings.

Disadvantages / reasons not to do it:

  • Detecting collisions between matching regular expressions or strings. If two regular expressions match the same string, or a regular expression and a search string match, the expected results may vary because a Map's elements might not be consistently ordered. I don't know if the ECMAScript spec mandates preserving a particular order to a Map's elements.
    • if we preserve the same chaining capability (str.replace(map1).replace(map2)...), this might not be a big problem.

The question is, how often do people chain replace calls together?

  • It's not particularly hard to chain several replace calls together. It's just verbose, which might not be a high enough burden to overcome for adding API.

That's my two cents for the day. Thoughts?

[1] esdiscuss.org/topic/adding

# Logan Smyth (6 years ago)

It wouldn't necessarily break existing API, since String.prototype.replace currently accepts only RegExp or strings.

Not quite accurate. It accepts anything with a Symbol.replace property, or a string.

Given that, what you're describing can be implemented as

Map.prototype[Symbol.replace] = function(str) {
  for(const [key, value] of this) {
    str = str.replace(key, value);
  }
  return str;
};
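To illustrate the protocol without patching a built-in prototype, here is a minimal sketch. The class name `SequentialReplacer` is hypothetical and not part of any proposal; it assumes only that `String.prototype.replace` delegates to the `Symbol.replace` method of its first argument, and it applies the pairs one after another like the loop above.

```javascript
// Hypothetical helper: any object with a Symbol.replace method can be
// passed as the first argument of String.prototype.replace.
class SequentialReplacer {
  constructor(pairs) {
    // A Map keeps the [pattern, replacement] pairs in insertion order.
    this.pairs = new Map(pairs);
  }

  // Invoked by str.replace(replacer); the second argument is ignored here.
  [Symbol.replace](str) {
    let result = String(str);
    for (const [pattern, replacement] of this.pairs) {
      result = result.replace(pattern, replacement);
    }
    return result;
  }
}

const replacer = new SequentialReplacer([
  ['cat', 'dog'], // a string pattern replaces only the first occurrence
  [/o/g, '0'],    // a global regular expression replaces every occurrence
]);

console.log('a cat on a cot'.replace(replacer)); // "a d0g 0n a c0t"
```

Note that the second rule also rewrites the "o" introduced by the first rule, which is the behavior questioned further down in this message.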

I don't know if the ECMAScript spec mandates preserving a particular order to a Map's elements.

It does, so you're good there.
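As a quick check of that ordering guarantee, the sketch below shows that a Map iterates in insertion order and that updating an existing key does not move it. The variable names are only illustrative.

```javascript
// A Map iterates its entries in insertion order, so the replacement
// order would be whatever order the entries were added in.
const rules = new Map();
rules.set('b', '2');
rules.set('a', '1');
rules.set('b', '3'); // updating an existing key keeps its original position

const order = [...rules.keys()];
console.log(order);          // [ 'b', 'a' ]
console.log(rules.get('b')); // "3"
```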

Detecting collisions between matching regular expressions or strings.

I think this would be my primary concern, but not so much ordering as expectations. Like if you did

"1".replace(new Map([
  ['1', '2'],
  ['2', '3'],
]));

is the result 2 or 3? 3 seems surprising to me, at least in the general sense, because there was no 2 in the original input, but it's also hard to see how you'd spec the behavior to avoid that if general regex replacement is supported.
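The two possible answers correspond to two different semantics, sketched below for string keys only. Both function names are hypothetical, and the single-pass version uses a plain scan rather than regular expressions, so it sidesteps the general-regex difficulty mentioned above instead of solving it.

```javascript
// Sequential semantics: each rule sees the output of the previous rule.
function replaceSequential(str, rules) {
  for (const [key, value] of rules) {
    str = str.split(key).join(value); // replace every occurrence of key
  }
  return str;
}

// Single-pass semantics: matches are only ever taken from the original
// input. At each position the first rule (in Map order) that matches wins.
function replaceSinglePass(str, rules) {
  let result = '';
  let i = 0;
  while (i < str.length) {
    let matched = false;
    for (const [key, value] of rules) {
      if (key !== '' && str.startsWith(key, i)) {
        result += value;
        i += key.length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      result += str[i];
      i += 1;
    }
  }
  return result;
}

const rules = new Map([['1', '2'], ['2', '3']]);
console.log(replaceSequential('1', rules)); // "3"
console.log(replaceSinglePass('1', rules)); // "2"
```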

# Isiah Meadows (6 years ago)

Here's what I'd prefer instead: overload String.prototype.replace to take non-callable objects, as sugar for this:

const old = Function.call.bind(Function.call, String.prototype.replace)
String.prototype.replace = function (regexp, object) {
    if (object == null && regexp != null && typeof regexp === "object") {
        // Escape each key and join the keys into one global alternation.
        const re = new RegExp(
            Object.keys(regexp)
            .map(key => old(key, /[\\^$*+?.()|[\]{}]/g, '\\$&'))
            .join("|"),
            "g"
        )
        // Look the matched text up in the object passed as first argument.
        return old(this, re, m => regexp[m])
    } else {
        return old(this, regexp, object)
    }
}

This would cover about 99% of my use for something like this, with less runtime overhead (that of not needing to check for and potentially match multiple regular expressions at runtime) and better static analyzability (you only need to check it's an object literal or constant frozen object, not that its argument is the result of the built-in Map call). It's exceptionally difficult to optimize for this unless you know everything's a string, but most cases where I had to pass a callback that wasn't super complex looked a lot like this:

// What I use:
function escapeHTML(str) {
    return str.replace(/["'&<>]/g, m => {
        switch (m) {
        case '"': return "&quot;"
        case "'": return "&#39;"
        case "&": return "&amp;"
        case "<": return "&lt;"
        case ">": return "&gt;"
        default: throw new TypeError("unreachable")
        }
    })
}

// What it could be
function escapeHTML(str) {
    return str.replace({
        '"': "&quot;",
        "'": "&#39;",
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
    })
}

And yes, this enables optimizations engines couldn't easily produce otherwise. In this instance, an engine could find that the object is static with only single-character entries, and it could replace the call to a fast-path one that relies on a cheap lookup table instead (Unicode replacement would be similar, except you'd need an extra layer of indirection with astrals to avoid blowing up memory when generating these tables):

// Original
function escapeHTML(str) {
    return str.replace({
        '"': "&quot;",
        "'": "&#39;",
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
    })
}

// Not real JS, but think of it as how an engine might implement this. The
// implementation of the runtime function `ReplaceWithLookupTable` is omitted
// for brevity, but you could imagine how it could be implemented, given the
// pseudo-TS signature:
//
// ```ts
// declare function %ReplaceWithLookupTable(
//     str: string,
//     table: string[]
// ): string
// ```
function escapeHTML(str) {
    static {
        // A zero-initialized array with 2^16 entries (U+0000-U+FFFF), except
        // for the object's members. This takes up to about 70K per instance,
        // but these are *far* more often called than created.
        const _lookup_escapeHTML = %calloc(65536)

        _lookup_escapeHTML[34] = "&quot;"
        _lookup_escapeHTML[38] = "&amp;"
        _lookup_escapeHTML[39] = "&#39;"
        _lookup_escapeHTML[60] = "&lt;"
        _lookup_escapeHTML[62] = "&gt;"
    }

    return %ReplaceWithLookupTable(str, _lookup_escapeHTML)
}
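For readers who want something runnable, the fast path above can be approximated in ordinary JavaScript. This is only a sketch of the idea, not how any engine implements it: `buildLookupTable` and `replaceWithLookupTable` are hypothetical names, the table is a regular array rather than raw memory, and the entity used for the apostrophe is an assumption.

```javascript
// Build a lookup table indexed by UTF-16 code unit. Only keys that are
// exactly one code unit long are supported in this sketch.
function buildLookupTable(replacements) {
  const table = new Array(0x10000).fill(null);
  for (const [key, value] of Object.entries(replacements)) {
    if (key.length !== 1) {
      throw new RangeError('only single code unit keys are supported');
    }
    table[key.charCodeAt(0)] = value;
  }
  return table;
}

// Scan the string once, copying unchanged runs and substituting table hits.
function replaceWithLookupTable(str, table) {
  let result = '';
  let start = 0; // start of the pending run of unchanged characters
  for (let i = 0; i < str.length; i++) {
    const replacement = table[str.charCodeAt(i)];
    if (replacement !== null) {
      result += str.slice(start, i) + replacement;
      start = i + 1;
    }
  }
  return result + str.slice(start);
}

const htmlTable = buildLookupTable({
  '"': '&quot;',
  "'": '&#39;', // assumed entity for the apostrophe
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
});

console.log(replaceWithLookupTable('<a href="x">Tom & Jerry</a>', htmlTable));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```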

Likewise, similar, but more restrained, optimizations could be performed on objects with multibyte strings, since they can be reduced to a simple search trie. (These can be built in even the general case if the strings are large enough to merit it - small ropes are pretty cheap to create.)
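A search trie over the keys can be sketched as follows. The names `buildTrie` and `replaceWithTrie` are hypothetical, the longest-match rule is an assumption about how overlapping keys would be resolved, and a real engine would use a far more compact representation.

```javascript
// Build a trie from the keys of a plain object. Each node maps one code
// point to a child node; `value` marks the end of a key.
function buildTrie(replacements) {
  const root = { children: new Map(), value: undefined };
  for (const [key, value] of Object.entries(replacements)) {
    let node = root;
    for (const ch of key) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), value: undefined });
      }
      node = node.children.get(ch);
    }
    node.value = value;
  }
  return root;
}

// Scan the input once; at each position take the longest key that matches.
function replaceWithTrie(str, root) {
  const chars = Array.from(str); // split by code point, like the trie keys
  let result = '';
  let i = 0;
  while (i < chars.length) {
    let node = root;
    let matchEnd = -1;
    let matchValue;
    for (let j = i; j < chars.length; j++) {
      node = node.children.get(chars[j]);
      if (node === undefined) break;
      if (node.value !== undefined) {
        matchEnd = j + 1;
        matchValue = node.value;
      }
    }
    if (matchEnd === -1) {
      result += chars[i]; // no key starts here, copy the code point
      i += 1;
    } else {
      result += matchValue;
      i = matchEnd;
    }
  }
  return result;
}

const trie = buildTrie({ cheese: 'cake', ham: 'eggs', hamster: 'gerbil' });
console.log(replaceWithTrie('ham, cheese and a hamster', trie));
// eggs, cake and a gerbil
```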

For what it's worth, there's precedent here in Ruby, which has support for Hashes as String#gsub parameters which work similarly.


Isiah Meadows me at isiahmeadows.com, www.isiahmeadows.com

# Cyril Auburtin (6 years ago)

You can also have a helper like this:

var replacer = replacements => {
  const re = new RegExp(replacements.map(([k,_,escaped=k]) => escaped).join('|'), 'gu');
  const replaceMap = new Map(replacements);
  return s => s.replace(re, w => replaceMap.get(w));
}
var replace = replacer([
  ['$', '^', String.raw`\$`],
  ['1', '2'],
  ['<', '&lt;'], 
  ['🍌', '🍑'],
  ['-', '_'],
  [']', '@', String.raw`\]`]
]);
replace('test🍐🍌-$$[11] <foo>') // "test🍐🍑_^^[22@ &lt;foo>"

but it quickly gets messy to work with escaping
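One way to contain that mess is to escape every key automatically instead of supplying a hand-escaped third element. The sketch below assumes all keys are literal strings; `escapeRegExp` and `makeReplacer` are hypothetical helper names, not existing built-ins.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Build a single-pass replacer from a plain object of literal strings.
function makeReplacer(replacements) {
  const keys = Object.keys(replacements)
    // Longest keys first, so a longer key wins over any key that is its prefix.
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  const pattern = new RegExp(keys.join('|'), 'gu');
  return (str) => str.replace(pattern, (match) => replacements[match]);
}

// Without escaping, "." matches every character:
console.log('a.b'.replace(new RegExp('.', 'g'), '!')); // "!!!"

// With escaping, each key is matched literally:
const replace = makeReplacer({ '$': '^', ']': '@', '.': '!', '🍌': '🍑' });
console.log(replace('a.b $[1] 🍌')); // "a!b ^[1@ 🍑"
```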

On Sat, May 19, 2018 at 08:17, Isiah Meadows <isiahmeadows at gmail.com> wrote:

# kai zhu (6 years ago)

again, you backend-engineers are making something more complicated than needs be, when simple, throwaway glue-code will suffice. agree with jordan, this feature is a needless cross-cut of String.prototype.replace.

/*jslint
    node: true
*/
'use strict';
var dict;
dict = {
    '$': '^',
    '1': '2',
    '<': '&lt;',
    '🍌': '🍑',
    '-': '_',
    ']': '@'
};
// output: "test🍐🍑_^^[22@ &lt;foo>"

console.log('test🍐🍌-$$[11] <foo>'.replace((/[\S\s]/gu), function (character) {
    return dict.hasOwnProperty(character)
        ? dict[character]
        : character;
}));

kai zhu kaizhu256 at gmail.com

# Mathias Bynens (6 years ago)

Hey Kai, you’re oversimplifying. Your solution works for a single Unicode symbol (corresponding to a single code point) but falls apart as soon as you need to match multiple symbols of possibly varying length, like in the escapeHtml example.

# Isiah Meadows (6 years ago)

@Mathias

My particular escapeHTML example could be written like that (and it is somewhat in the prose). But you're right that in the prose, I did bring up the potential for things like str.replace({cheese: "cake", ham: "eggs"}).

@Kai

Have you ever tried writing an HTML template system on the front end? This will almost inevitably come up, and most of my use cases for this are on the front end itself handling various scenarios.

@Cyril

And every single one of those patterns is going to need to be compiled and executed, and compiling and interpreting regular expressions is definitely not quick, especially when you can nest Kleene stars. (See: en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times) That's why I'm against it - we don't need to complicate this proposal with that mess.


Isiah Meadows me at isiahmeadows.com, www.isiahmeadows.com

# kai zhu (6 years ago)

@Kai Have you ever tried writing an HTML template system on the front end? This will almost inevitably come up, and most of my use cases for this are on the front end itself handling various scenarios.

i have. if we want to move from toy-cases to real-world frontend-examples [1] [2] [3], here's a zero-dependency, mustache-based template-system in under 110 sloc, which i've been using for the past 5 years. and the trick to simplify rendering-of-partials, is to recurse them inside string.replace (see the red-highlighted sections of code).

[1] standalone, static-function templateRender: kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.js#L5922
[2] test-cases showing capabilities of templateRender: kaizhu256/node-utility2/blob/2018.1.13/test.js#L1411
[3] live website rendered using templateRender: kaizhu256.github.io/node-swgg-wechat-pay/build..beta..travis-ci.org/app/#!swgg_id__2Fpay_2Fcloseorder_20POST_1

/*
 * example.js
 *
 * this zero-dependency, standalone program will render mustache-based html-templates,
 * with the given dictionary, and print it to stdout
 * code derived from https://github.com/kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.js#L5922



 * example usage:
$ node example.js <template> <json-dictionary>
$ node example.js '<pre>

JSON.stringify("<b>hello world!</b>".toUpperCase()))=
{{hello.world toUpperCase jsonStringify}}
</pre>

<ul>
{{#each myList}}
{{#if href}}
<li id="{{href encodeURIComponent}}">
    <a href="{{href}}">
    {{#if description}}
    {{description notHtmlSafe}}
    {{#unless description}}
    no description
    {{/if description}}
    </a>
</li>
{{/if href}}
{{/each myList}}
</ul>' '{
    "hello": {
        "world": "<b>hello world!</b>"
    },
    "myList": [
        null,
        {
            "href": "https://www.example.com/1",
            "description": "<b>click here!</b>"
        },
        {
            "href": "https://www.example.com/2"
        }
    ]
}'



 * example output:
<pre>

JSON.stringify("<b>hello world!</b>".toUpperCase()))=
"<B>HELLO WORLD!</B>"
</pre>

<ul>

<li id="https%3A%2F%2Fwww.example.com%2F1">
    <a href="https://www.example.com/1">

    <b>click here!</b>

    </a>
</li>

<li id="https%3A%2F%2Fwww.example.com%2F2">
    <a href="https://www.example.com/2">

    no description

    </a>
</li>

</ul>
 */







/*jslint
    node: true,
    regexp: true
*/
'use strict';
var templateRender;
templateRender = function (template, dict, notHtmlSafe) {
/*
 * this function will render the template with the given dict
 */
    var argList, getValue, match, renderPartial, rgx, skip, value;
    dict = dict || {};
    getValue = function (key) {
        argList = key.split(' ');
        value = dict;
        if (argList[0] === '#this/') {
            return;
        }
        // iteratively lookup nested values in the dict
        argList[0].split('.').forEach(function (key) {
            value = value && value[key];
        });
        return value;
    };
    renderPartial = function (match0, helper, key, partial) {
        switch (helper) {
        case 'each':
        case 'eachTrimRightComma':
            value = getValue(key);
            value = Array.isArray(value)
                ? value.map(function (dict) {
                    // recurse with partial
                    return templateRender(partial, dict, notHtmlSafe);
                }).join('')
                : '';
            // remove trailing-comma from last element
            if (helper === 'eachTrimRightComma') {
                value = value.trimRight().replace((/,$/), '');
            }
            return value;
        case 'if':
            partial = partial.split('{{#unless ' + key + '}}');
            partial = getValue(key)
                ? partial[0]
                // handle 'unless' case
                : partial.slice(1).join('{{#unless ' + key + '}}');
            // recurse with partial
            return templateRender(partial, dict, notHtmlSafe);
        case 'unless':
            return getValue(key)
                ? ''
                // recurse with partial
                : templateRender(partial, dict, notHtmlSafe);
        default:
            // recurse with partial
            return match0[0] + templateRender(match0.slice(1), dict, notHtmlSafe);
        }
    };
    // render partials
    rgx = (/\{\{#(\w+) ([^}]+?)\}\}/g);
    template = template || '';
    for (match = rgx.exec(template); match; match = rgx.exec(template)) {
        rgx.lastIndex += 1 - match[0].length;
        template = template.replace(
            new RegExp('\\{\\{#(' + match[1] + ') (' + match[2] +
                ')\\}\\}([\\S\\s]*?)\\{\\{/' + match[1] + ' ' + match[2] +
                '\\}\\}'),
            renderPartial
        );
    }
    // search for keys in the template
    return template.replace((/\{\{[^}]+?\}\}/g), function (match0) {
        getValue(match0.slice(2, -2));
        if (value === undefined) {
            return match0;
        }
        argList.slice(1).forEach(function (arg0, ii, list) {
            switch (arg0) {
            case 'alphanumeric':
                value = value.replace((/\W/g), '_');
                break;
            case 'decodeURIComponent':
                value = decodeURIComponent(value);
                break;
            case 'encodeURIComponent':
                value = encodeURIComponent(value);
                break;
            case 'jsonStringify':
                value = JSON.stringify(value);
                break;
            case 'jsonStringify4':
                value = JSON.stringify(value, null, 4);
                break;
            case 'notHtmlSafe':
                notHtmlSafe = true;
                break;
            case 'truncate':
                skip = ii + 1;
                if (value.length > list[skip]) {
                    value = value.slice(0, list[skip] - 3).trimRight() + '...';
                }
                break;
            // default to String.prototype[arg0]()
            default:
                if (ii === skip) {
                    break;
                }
                value = value[arg0]();
                break;
            }
        });
        value = String(value);
        // default to htmlSafe
        if (!notHtmlSafe) {
            value = value
                .replace((/"/g), '&quot;')
                .replace((/&/g), '&amp;')
                .replace((/'/g), '&apos;')
                .replace((/</g), '&lt;')
                .replace((/>/g), '&gt;')
                .replace((/&amp;(amp;|apos;|gt;|lt;|quot;)/ig), '&$1');
        }
        return value;
    });
};

console.log(templateRender(process.argv[2], JSON.parse(process.argv[3])));

kai zhu kaizhu256 at gmail.com

# kai zhu (6 years ago)

sorry, there was a bug in the standalone-solution i last posted. here’s corrected version ^^;;;

also highlighted in blue, the escapeHTML part of code relevant to this discussion. and honestly, replacing those 6 blue-lines-of-code in this real-world example, with the proposed map-replace doesn’t make much of a difference in terms of overall readability/maintainability.

/*
 * example.js
 *
 * this zero-dependency, standalone program will render mustache-based html-templates,
 * with the given dictionary, and print it to stdout
 * code derived from https://github.com/kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.js#L5922



 * example usage:
$ node example.js <template> <json-dictionary>
$ node example.js '<pre>

JSON.stringify("<b>hello world!</b>".toUpperCase()))=
{{hello.world toUpperCase jsonStringify}}
</pre>

<ul>
{{#each myList}}
{{#if href}}
<li id="{{href encodeURIComponent}}">
    <a href="{{href}}">
    {{#if description}}
    {{description notHtmlSafe}}
    {{#unless description}}
    no description
    {{/if description}}
    </a>
</li>
{{/if href}}
{{/each myList}}
</ul>' '{
    "hello": {
        "world": "<b>hello world!</b>"
    },
    "myList": [
        null,
        {
            "href": "https://www.example.com/1",
            "description": "<b>click here!</b>"
        },
        {
            "href": "https://www.example.com/2"
        }
    ]
}'



 * example output:
<pre>

JSON.stringify("<b>hello world!</b>".toUpperCase()))=
"<B>HELLO WORLD!</B>"
</pre>

<ul>

<li id="https%3A%2F%2Fwww.example.com%2F1">
    <a href="https://www.example.com/1">

    <b>click here!</b>

    </a>
</li>

<li id="https%3A%2F%2Fwww.example.com%2F2">
    <a href="https://www.example.com/2">

    no description

    </a>
</li>

</ul>
 */







/*jslint
    node: true,
    regexp: true
*/
'use strict';
var templateRender;
templateRender = function (template, dict, options) {
/*
 * this function will render the template with the given dict
 */
    var argList, getValue, match, renderPartial, rgx, tryCatch, skip, value;
    dict = dict || {};
    options = options || {};
    getValue = function (key) {
        argList = key.split(' ');
        value = dict;
        if (argList[0] === '#this/') {
            return;
        }
        // iteratively lookup nested values in the dict
        argList[0].split('.').forEach(function (key) {
            value = value && value[key];
        });
        return value;
    };
    renderPartial = function (match0, helper, key, partial) {
        switch (helper) {
        case 'each':
        case 'eachTrimRightComma':
            value = getValue(key);
            value = Array.isArray(value)
                ? value.map(function (dict) {
                    // recurse with partial
                    return templateRender(partial, dict, options);
                }).join('')
                : '';
            // remove trailing-comma from last element
            if (helper === 'eachTrimRightComma') {
                value = value.trimRight().replace((/,$/), '');
            }
            return value;
        case 'if':
            partial = partial.split('{{#unless ' + key + '}}');
            partial = getValue(key)
                ? partial[0]
                // handle 'unless' case
                : partial.slice(1).join('{{#unless ' + key + '}}');
            // recurse with partial
            return templateRender(partial, dict, options);
        case 'unless':
            return getValue(key)
                ? ''
                // recurse with partial
                : templateRender(partial, dict, options);
        default:
            // recurse with partial
            return match0[0] + templateRender(match0.slice(1), dict, options);
        }
    };
    tryCatch = function (fnc, message) {
    /*
     * this function will prepend the message to errorCaught
     */
        try {
            return fnc();
        } catch (errorCaught) {
            errorCaught.message = message + errorCaught.message;
            throw errorCaught;
        }
    };
    // render partials
    rgx = (/\{\{#(\w+) ([^}]+?)\}\}/g);
    template = template || '';
    for (match = rgx.exec(template); match; match = rgx.exec(template)) {
        rgx.lastIndex += 1 - match[0].length;
        template = template.replace(
            new RegExp('\\{\\{#(' + match[1] + ') (' + match[2] +
                ')\\}\\}([\\S\\s]*?)\\{\\{/' + match[1] + ' ' + match[2] +
                '\\}\\}'),
            renderPartial
        );
    }
    // search for keys in the template
    return template.replace((/\{\{[^}]+?\}\}/g), function (match0) {
        var notHtmlSafe;
        notHtmlSafe = options.notHtmlSafe;
        return tryCatch(function () {
            getValue(match0.slice(2, -2));
            if (value === undefined) {
                return match0;
            }
            argList.slice(1).forEach(function (arg0, ii, list) {
                switch (arg0) {
                case 'alphanumeric':
                    value = value.replace((/\W/g), '_');
                    break;
                case 'decodeURIComponent':
                    value = decodeURIComponent(value);
                    break;
                case 'encodeURIComponent':
                    value = encodeURIComponent(value);
                    break;
                case 'jsonStringify':
                    value = JSON.stringify(value);
                    break;
                case 'jsonStringify4':
                    value = JSON.stringify(value, null, 4);
                    break;
                case 'notHtmlSafe':
                    notHtmlSafe = true;
                    break;
                case 'truncate':
                    skip = ii + 1;
                    if (value.length > list[skip]) {
                        value = value.slice(0, list[skip] - 3).trimRight() + '...';
                    }
                    break;
                // default to String.prototype[arg0]()
                default:
                    if (ii === skip) {
                        break;
                    }
                    value = value[arg0]();
                    break;
                }
            });
            value = String(value);
            // default to htmlSafe
            if (!notHtmlSafe) {
                value = value
                    .replace((/"/g), '&quot;')
                    .replace((/&/g), '&amp;')
                    .replace((/'/g), '&apos;')
                    .replace((/</g), '&lt;')
                    .replace((/>/g), '&gt;')
                    .replace((/&amp;(amp;|apos;|gt;|lt;|quot;)/ig), '&$1');
            }
            return value;
        }, 'templateRender could not render expression ' + JSON.stringify(match0) + '\n');
    });
};

console.log(templateRender(process.argv[2], JSON.parse(process.argv[3])));

kai zhu kaizhu256 at gmail.com

# Mathias Bynens (6 years ago)

This particular escapeHtml implementation is limited to replacing single characters, but if you wanted to escape any characters that can be represented using a named character reference, you’re gonna need something more generic, as some named character references expand to multiple characters. That’s what I was referring to earlier.
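A small sketch may make the point concrete. It uses a hand-picked subset of HTML named character references, one of which (`&NotEqualTilde;`) stands for two code points; the function names are hypothetical. A per-code-point lookup can never match the two-code-point key, while an alternation of all keys, longest first, can.

```javascript
// A hand-picked subset of HTML named character references; the full table
// in the HTML specification is far larger.
const namedReferences = new Map([
  ['&', '&amp;'],
  ['<', '&lt;'],
  ['\u2242', '&EqualTilde;'],
  ['\u2242\u0338', '&NotEqualTilde;'], // two code points, one reference
]);

// Per-code-point lookup: the two-code-point key can never match.
function encodePerCodePoint(str) {
  return str.replace(/[\S\s]/gu, (ch) =>
    namedReferences.has(ch) ? namedReferences.get(ch) : ch
  );
}

// Alternation of all keys, longest first, so multi-code-point keys win.
const keyPattern = new RegExp(
  [...namedReferences.keys()]
    .sort((a, b) => b.length - a.length)
    .map((key) => key.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&'))
    .join('|'),
  'gu'
);

function encodeLongestFirst(str) {
  return str.replace(keyPattern, (match) => namedReferences.get(match));
}

const input = 'a \u2242\u0338 b';
console.log(encodePerCodePoint(input) === 'a &EqualTilde;\u0338 b'); // true
console.log(encodeLongestFirst(input) === 'a &NotEqualTilde; b');     // true
```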

# kai zhu (6 years ago)

i see... here's some simple, throwaway glue-code that does what i think you want.

/*jslint
node: true
*/
'use strict';
var text;
text = '<script>evilFunction("🍐🍌🍐🍌")</script>';
[
    ['<', '[lt]'],
    ['&lt;', '[lt]'],
    ['>', '[gt]'],
    ['&gt', '[gt]'],
    ['[lt]script', '[lt]noscript'],
    ['[lt]/script', '[lt]/noscript'],
    ['🍐🍌', '🍐🍑']
].forEach(function (element) {
    text = text.replace(
        // https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
        new RegExp(element[0].replace(/[\-\/\\\^$*+?.()|\[\]{}]/g, '\\$&'), 'gu'),
        element[1]
    );
});
// output: [lt]noscript[gt]evilFunction("🍐🍑🍐🍑")[lt]/noscript[gt]
console.log(text);

kai zhu kaizhu256 at gmail.com

# Isiah Meadows (6 years ago)

Next challenge: how does it compare to these two?

// Simplified version
function simpleEscape(text) {
  return text.replace(/<(?:\/?script)?|&lt;|>|&gt;|🍐🍌/gu, m => {
    switch (m) {
    case '<': return '[lt]';
    case '&lt;': return '[lt]';
    case '>': return '[gt]';
    case '&gt;': return '[gt]';
    case '<script': return '[lt]noscript';
    case '</script': return '[lt]/noscript';
    default: return '🍐🍑';
    }
    }
  });
}

// Direct proposal equivalent
var replacements = {
  '<': '[lt]',
  '&lt;': '[lt]',
  '>': '[gt]',
  '&gt;': '[gt]',
  '<script': '[lt]noscript',
  '</script': '[lt]/noscript',
  '🍐🍌': '🍐🍑'
}
function objectEscape(text) {
  return text.replace(/<(?:\/?script)?|&[lg]t;|>|🍐🍌/gu, m => replacements[m]);
}

Oh, and with my proposal, your glue code could be simplified to this:

var text = '<script>evilFunction("🍐🍌🍐🍌")</script>'

text = text.replace({
  '<': '[lt]',
  '&lt;': '[lt]',
  '>': '[gt]',
  '&gt': '[gt]',
  '<script': '[lt]noscript',
  '</script': '[lt]/noscript',
  '🍐🍌': '🍐🍑'
});
// output: [lt]noscript[gt]evilFunction("🍐🍑🍐🍑")[lt]/noscript[gt]

And BTW, my two main justifications are that 1. I don't want to have to escape simple stuff like this, and 2. I'd like the engine to lower this into a fast, simple replace loop without having to compile a regular expression. (Also, my proposal here is the simplest among them.)


Isiah Meadows me at isiahmeadows.com, www.isiahmeadows.com

# Jordan Harband (6 years ago)

Something that escapes HTML wouldn't belong in the language, it would belong in browsers (the HTML spec, probably). This list is for language-level proposals, so I don't think this is the right list to suggest it.

Are there use cases for things in this thread that aren't browser-specific?

# Isiah Meadows (6 years ago)

I was using HTML primitive escaping as a concrete example, but there are others. Most use cases in my experience are essentially escaping for various reasons, but it's also useful for simple extensible templating where you control the expansion, but not the source. Here's a concrete example of what this could do:

// Old
export function format(message, args, prettify = inspect) {
    return message.replace(/\{(.+?)\}/g, (m, prop) =>
        hasOwn.call(args, prop) ? prettify(args[prop], {depth: 5}) : m
    )
}

// New
export function format(message, args, prettify = inspect) {
    return message.replace(Object.keys(args).reduce((acc, k) => ({
        ...acc, [`{${k}}`]: prettify(args[k], {depth: 5})
    }), {}))
}

(I presume you're aware that this does have some language precedent in Ruby's String#gsub(regex, hash).)
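For comparison, Ruby's gsub with a pattern and a hash can be approximated in today's JavaScript with a callback. This is a rough sketch: `gsubWithTable` is a hypothetical name, and it assumes Ruby's behavior of substituting an empty string when the matched text is not a key of the hash.

```javascript
// Rough JavaScript analogue of Ruby's str.gsub(pattern, hash): the matched
// text is looked up in the table. A missing key yields an empty string,
// which is assumed here to mirror Ruby's behavior.
function gsubWithTable(str, pattern, table) {
  if (!pattern.global) {
    throw new TypeError('pattern must have the g flag');
  }
  return str.replace(pattern, (match) =>
    Object.prototype.hasOwnProperty.call(table, match)
      ? String(table[match])
      : ''
  );
}

console.log(gsubWithTable('hello', /[eo]/g, { e: 3, o: '*' })); // "h3ll*"
console.log(gsubWithTable('hello', /[el]/g, { e: 'E' }));       // "hEo"
```

The difference from the proposal in this thread is that the pattern is still written by hand; the hash only supplies the replacement values.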


Isiah Meadows me at isiahmeadows.com, www.isiahmeadows.com

# Bob Myers (6 years ago)

I'm not a huge fan of this idea, but just as a reference point, here is a routine to convert a string to using smart quotes:

// Change straight quotes to curly and double hyphens to em-dashes etc.
export function smarten(a: string) {
  if (!a) return a;

  a = a.replace(/(^|[-\u2014\s(\["])'/g, "$1\u2018"); // opening singles
  a = a.replace(/'/g, "\u2019"); // closing singles & apostrophes
  a = a.replace(/(^|[-\u2014/\[(\u2018\s])"/g, "$1\u201c"); // opening doubles
  a = a.replace(/"/g, "\u201d"); // closing doubles
  a = a.replace(/\s*--\s*/g, "\u2014"); // em-dashes
  a = a.replace(/\.\.\./g, "\u2026"); // ellipsis
  a = a.replace(/ - /g, "\u2013"); // en-dashes
  a = a.replace(/\s+\?/g, "?"); // Remove Indian-style spaces before question mark
  a = a.replace(/\s+\/\s+/, "/"); // Spaces around slashes.

  return a;
}

# kai zhu (6 years ago)

not a huge fan either, but here are 2 real-world non-frontend examples:

the first is an excerpt from a backend-tool to automagically convert documentation from developers.github.com into swagger-docs [1]. it's used to auto-generate nodejs-clients [2] for github’s ~300 apis, rather than doing it manually. tc39 may-or-may-not have similar use-case in its ongoing effort to make their online-specs machine-readable.

...
htmlToDescription = function (options) {
/*
 * this function will format options.html to swagger markdown-description
 */
    return options.html
        // format \n
        .replace((/\n\n+/g), '\n')
        .replace((/\n<li>/g), '\n\n<li>')
        .replace((/\n<(?:p|pre)>/g), '\n\n')
        .replace((/\n([^\n])/g), ' $1')
        .replace((/\n /g), '\n')
        // format header accept
        .replace((/( header:)<\/p>\n(<code>)/g), '$1 $2')
        // format <a>
        .replace((/<a href="\//g), '<a href="https://developer.github.com/')
        .replace((/<a href="#/g), '<a href="' + options.url + '#')
        .replace((/<a href="(.*?)".*?>(.*?)<\/a>/g), '[$2]($1)')
        // format <xxx>
        .replace((/<code>(.*?)<\/code>/g), '```$1```')
        .replace((/<li>(.*?)<\/li>/g), '  - $1')
        .replace((/<strong>(.*?)<\/strong>/g), '**$1**')
        .replace((/<[^<>]*?>/g), '')
        // format whitespace
        .replace((/ {1,}/g), ' ')
        .split('\n')
        .map(function (element) {
            return element.trim();
        })
        .filter(local.echo)
        .map(function (element) {
            return element + '\n';
        });
};

this second example [3] cannot be solved by the discussed proposal in its current form. it uses a list of regexp's to merge sections of one document into another document. i was basically too lazy to create/maintain a complicated mustache-based template for README.md, so instead used an existing, real-world README.md as a “template”, with a list of regexp-rules to customize it for any given npm-package.

/*
 * this excerpt from
 * https://github.com/kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.js#L3326
 * will inplace-merge pieces of document dataFrom -> document dataTo
 *
 *
 *  document dataFrom  +  document dataTo   ->    inplace-merged document dataTo
 *
 *  # name1               # name2                 # name1
 *  <description1>        <description1>          <description1> 
 *
 *  # live web demo       #live web demo          # live web demo
 *  - none                - [foo.com](foo.com)    - none
 *
 *  ...                   ...                     ...
 */

// search-and-replace - customize dataTo
[
    // customize name and description
    (/.*?\n.*?\n/),
    // customize cdn-download
    (/\n# cdn download\n[\S\s]*?\n\n\n\n/),
    // customize live web demo
    (/\n# live web demo\n[\S\s]*?\n\n\n\n/),
    // customize todo
    (/\n#### todo\n[\S\s]*?\n\n\n\n/),
    // customize quickstart-example-js
    new RegExp('\\n {8}local\\.global\\.local = local;\\n' +
        '[^`]*?\\n {4}\\/\\/ run browser js\\-env code - init-test\\n'),
    new RegExp('\\n {8}local\\.testRunBrowser = function \\(event\\) \\{\\n' +
        '[^`]*?^ {12}if \\(!event \\|\\| \\(event &&\\n', 'm'),
    (/\n {12}\/\/ custom-case\n[^`]*?\n {12}\}\n/),
    // customize quickstart-html-style
    (/\n<\/style>\\n\\\n<style>\\n\\\n[^`]*?\\n\\\n<\/style>\\n\\\n/),
    // customize quickstart-html-body
    (/\nutility2-comment -->(?:\\n\\\n){4}[^`]*?^<!-- utility2-comment\\n\\\n/m),
    // customize build script
    (/\n# internal build script\n[\S\s]*?^- build_ci\.sh\n/m),
    (/\nshBuildCiAfter\(\) \{\(set -e\n[^`]*?\n\)\}\n/),
    (/\nshBuildCiBefore\(\) \{\(set -e\n[^`]*?\n\)\}\n/)
].forEach(function (rgx) {
    // handle large string-replace
    options.dataFrom.replace(rgx, function (match0) {
        options.dataTo.replace(rgx, function (match1) {
            options.dataTo = options.dataTo.split(match1);
            options.dataTo[0] += match0;
            options.dataTo[0] += options.dataTo.splice(1, 1)[0];
            options.dataTo = options.dataTo.join(match1);
        });
    });
});

[1] chained-replace excerpt from tool to transform documentation from developers.github.com: kaizhu256/node-swgg-github-all/blob/2018.2.2/test.js#L86

[2] list of auto-generated nodejs-clients for github’s apis:

  • www.npmjs.com/package/swgg-github-activity
  • www.npmjs.com/package/swgg-github-all
  • www.npmjs.com/package/swgg-github-apps
  • www.npmjs.com/package/swgg-github-gists
  • www.npmjs.com/package/swgg-github-git
  • www.npmjs.com/package/swgg-github-issues
  • www.npmjs.com/package/swgg-github-migration
  • www.npmjs.com/package/swgg-github-misc
  • www.npmjs.com/package/swgg-github-projects
  • www.npmjs.com/package/swgg-github-pulls
  • www.npmjs.com/package/swgg-github-reactions
  • www.npmjs.com/package/swgg-github-scim
  • www.npmjs.com/package/swgg-github-search
  • www.npmjs.com/package/swgg-github-teams
  • www.npmjs.com/package/swgg-github-users

[3] inplace-merge documents using list of regexp's: kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.js#L3326

kai zhu kaizhu256 at gmail.com