RegExp.prototype.count

# kai zhu (7 years ago)

a common use-case i have is counting newlines in largish (> 200kb) embedded-js files, like this real-world example [1]. ultimately meant for line-number-preservation purposes in auto-lint/auto-prettify tasks (which have been getting slower due to complexity).

would a new RegExp count-method like `(/\n/g).count(largeCode)` be significantly more efficient than existing `largeCode.split("\n").length - 1` or `largeCode.replace((/[^\n]+/g), "").length`?

-kai

[1] calculating and reproducing line-number offsets when linting/autofixing files
https://github.com/kaizhu256/node-utility2/blob/2018.12.30/lib.jslint.js#L7377
https://github.com/kaizhu256/node-utility2/blob/2018.12.30/lib.jslint.js#L7586

# Isiah Meadows (7 years ago)

If performance is an issue, regular expressions are likely to be too slow to begin with. But you could always do this to count the number of lines in a particular string:

```js
var count = 0
var re = /\n|\r\n?/g
// with the g flag, each successful test() advances re.lastIndex past the match
while (re.test(str)) count++
console.log(count)
```

Given it's already this easy to iterate something with a regexp, I'm not convinced it's necessary to add this property/method.
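
As a rough sketch of how that loop could be wrapped for reuse: `countMatches` below is a hypothetical helper name, not an existing or proposed API. It uses `exec` instead of `test` so it can detect empty matches, and it copies the regexp with the `g` flag forced on, on the assumption that callers may pass a non-global regexp (which would otherwise never advance `lastIndex` and loop forever on the first match).

```js
// Hypothetical helper (not a built-in): count non-overlapping matches of `re` in `str`.
function countMatches(str, re) {
    // Copy the regexp and force the g flag so lastIndex advances between calls,
    // and so the caller's own regexp state is left untouched.
    var flags = re.flags.includes("g") ? re.flags : re.flags + "g";
    var copy = new RegExp(re.source, flags);
    var count = 0;
    var match;
    while ((match = copy.exec(str)) !== null) {
        count += 1;
        // An empty match leaves lastIndex where it was; step forward manually
        // so the loop terminates.
        if (match[0] === "") {
            copy.lastIndex += 1;
        }
    }
    return count;
}

console.log(countMatches("a\nb\r\nc\rd", /\n|\r\n?/)); // 3
```

Whether empty matches should be counted at all is a design choice; this sketch counts them, which mirrors what iterating a global regexp normally yields.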

# kai zhu (7 years ago)

benchmarked @isiah’s while-loop test-case vs `str.split` vs `str.replace` for regexp counting on jsperf.com [1], and the results were surprising (for me).

benchmarks using a 1mb random ascii-string, from fastest to slowest:

1. (fastest - 1,700 runs/sec) regexp-counting with `largeCode.split(/\n/).length - 1`
2. (40% slower - 1,000 runs/sec) regexp-counting with a while-loop over `/\n/g`
3. (60% slower - 700 runs/sec) regexp-counting with `largeCode.replace((/[^\n]+/g), "").length`

looks like the go-to design-pattern for counting-regexp is `str.split(<regexp>).length - 1`

[1] regexp counting 2
https://jsperf.com/regexp-counting-2
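
For anyone who wants to re-run a comparison like this locally, here is a minimal timing sketch. It is not the jsperf test case: the test string (a newline every 80 characters, about 1 MB), the run count, and the names `candidates` and `results` are all assumptions made for illustration, and the timings will differ by engine and machine.

```js
// Build a roughly 1 MB ASCII string with a newline every 80 characters.
// (Assumption: the original benchmark used a random ascii string instead.)
var chunk = "x".repeat(79) + "\n";
var largeCode = chunk.repeat(Math.ceil(1e6 / chunk.length));

// The three counting strategies discussed in the thread.
var candidates = {
    split: function (s) {
        return s.split(/\n/).length - 1;
    },
    whileTest: function (s) {
        var count = 0;
        var re = /\n/g;
        while (re.test(s)) {
            count += 1;
        }
        return count;
    },
    replace: function (s) {
        return s.replace(/[^\n]+/g, "").length;
    }
};

// Time each candidate over a fixed number of runs and record its count.
var results = {};
Object.keys(candidates).forEach(function (name) {
    var count;
    var start = Date.now();
    for (var i = 0; i < 20; i += 1) {
        count = candidates[name](largeCode);
    }
    results[name] = { count: count, ms: Date.now() - start };
    console.log(name, results[name]);
});
```

All three candidates should report the same count; only the elapsed time is expected to differ.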

# Isiah Meadows (7 years ago)

Nit: you should use `.split(/\n/g)` to get all parts.

I like the benchmarks here. That's much appreciated, and after further investigation, I found a *giant* WTF: https://jsperf.com/regexp-counting-2/8

TL;DR: for string character counting, prefer `indexOf`.

For similar reasons to that JSPerf thing, I'd like it to be on the String prototype rather than the RegExp prototype, as in `str.count(/\n/)`.

Isiah Meadows contact at isiahmeadows.com, www.isiahmeadows.com
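
A minimal sketch of the `indexOf` approach for plain substrings follows. `countSubstring` is a hypothetical name used only for illustration; it is not the `str.count` method suggested above, which does not exist in the language. Returning 0 for an empty needle is an assumption made to keep the loop finite.

```js
// Count non-overlapping occurrences of a plain substring using indexOf only.
function countSubstring(str, needle) {
    if (needle === "") {
        // Assumption: treat an empty needle as zero matches to avoid looping forever.
        return 0;
    }
    var count = 0;
    var index = str.indexOf(needle);
    while (index !== -1) {
        count += 1;
        // Resume the search just past the match that was found.
        index = str.indexOf(needle, index + needle.length);
    }
    return count;
}

console.log(countSubstring("a\nb\nc\n", "\n")); // 3
console.log(countSubstring("aaaa", "aa")); // 2
```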

# kai zhu (7 years ago)

+1 for string.count

i don’t think the g-flag is necessary in `str.split`, so the original performance claims are still valid:

- for counting regexp - use split + length
- for counting substring - use while + indexOf
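
A quick check of the g-flag point, as a sketch: `split` splits at every match whether or not the regexp is global, so both forms give the same count. The variable names are illustrative. The last line shows a caveat worth knowing when using split + length with arbitrary regexps: capture groups are spliced into the result array and inflate the length.

```js
var sample = "line1\nline2\nline3\n";

// split() splits on every match regardless of the g flag.
var withoutG = sample.split(/\n/).length - 1;
var withG = sample.split(/\n/g).length - 1;
console.log(withoutG, withG); // 3 3

// Caveat: a capture group adds each captured separator to the array,
// so the "length - 1" trick overcounts.
var withCapture = sample.split(/(\n)/).length - 1;
console.log(withCapture); // 6
```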
