Should String.prototype.split accept variable arguments optionally?

# Benjamin (Inglor) Gruenbaum (11 years ago)

Splitting by one value or another seems to be a pretty common use case if Stack Overflow questions and personal experience are an indication. For example "-" and " " and "/".

Currently, the solution is to pass a regular expression to String.prototype.split .

However, it would be nice to be able to pass variable arguments to the method.

So if I want to split strings like "0-12-12" and "321+32/14" or "TI-CK ER" instead of having to do:

    myString.split(/ |-|\/|\+/g); // this is no fun to read

    myString.split(" ","-","/","+"); // this is easier
    myString.split([" ","-","/","+"]); // this is also easier.

The advantage of the second one (accepting an Array) is that it does not require additional handling of the second "limit" parameter.

A possible advantage of the first one is that it's simpler.

The algorithm for the second one could be quite simple, in 21.1.3.17 only SplitMatch needs to be updated, and implementation sounds pretty simple in engines and the semantics simple in the spec.
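For reference, this is what the regular-expression form already does with the three sample strings, including the optional second `limit` argument. It is only a sketch of current behavior; the variable names are illustrative.

```javascript
// Alternation of the four single-character delimiters from the example.
// The `g` flag has no effect on split, so it is omitted here.
const delimiters = / |-|\/|\+/;

const a = "0-12-12".split(delimiters);   // ["0", "12", "12"]
const b = "321+32/14".split(delimiters); // ["321", "32", "14"]
const c = "TI-CK ER".split(delimiters);  // ["TI", "CK", "ER"]

// The second argument caps the number of returned pieces.
const limited = "0-12-12".split(delimiters, 2); // ["0", "12"]

console.log(a, b, c, limited);
```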

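Some other languages:

  • C# accepts an array of char or an array of string in its Split: http://msdn.microsoft.com/en-us/library/tabh47cf.aspx
  • Ruby doesn't do this with .split; it behaves like JS: http://ruby-doc.org/core-2.0.0/String.html#method-i-split
  • Java's String.split only accepts a regex
  • Python doesn't do this with .split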

Advantages:

  • Nice api addition that solves a common use case.
  • A way to split by multiple delimiters easily without knowledge of regular expressions.
  • Does not break the existing API, especially if we use array syntax.

Disadvantages

  • Extra overhead
  • Use case needs assertion

(Reminds me of stuff in string_extras: http://wiki.ecmascript.org/doku.php?id=harmony:string_extras )

What do you think?

# Rick Waldron (11 years ago)

On Wed, Oct 16, 2013 at 8:54 AM, Benjamin (Inglor) Gruenbaum < inglor at gmail.com> wrote:

Splitting by one value or another seems to be a pretty common use case if Stack Overflow questions and personal experience are an indication. For example "-" and " " and "/".

Currently, the solution is to pass a regular expression to String.prototype.split .

However, it would be nice to be able to pass variable arguments to the method.

So if I want to split strings like "0-12-12" and "321+32/14" or "TI-CK ER" instead of having to do:

    myString.split(/ |-|\/|\+/g); // this is no fun to read

This is subjective, as I have no trouble reading and understanding what this means and is expected to do (also subjective).

    myString.split(" ","-","/","+"); // this is easier
    myString.split([" ","-","/","+"]); // this is also easier.

The advantage of the second one (accepting an Array) is that it does not
require additional handling of the second "limit" parameter.

A possible advantage of the first one is that it's simpler.

The algorithm for the second one could be quite simple, in 21.1.3.17 only
_SplitMatch_ needs to be updated, and implementation sounds pretty simple
in engines and the semantics simple in the spec.

Some other languages:

 - C# accepts an array of `char` or array of "string" in its Split,
http://msdn.microsoft.com/en-us/library/tabh47cf.aspx
 - Ruby doesn't do this with `.split`, behaves like JS
http://ruby-doc.org/core-2.0.0/String.html#method-i-split
 - Java String.split only accepts a regex
 - Python doesn't do this with `.split`

Advantages:
 - Nice api addition that solves a common use case.
 - A way to split by multiple delimiters easily without knowledge of
regular expressions.
 - Does not break the existing API, especially if we use array syntax.

Disadvantages
 - Extra overhead
 - Use case needs assertion

(Reminds me of stuff in string_extras
http://wiki.ecmascript.org/doku.php?id=harmony:string_extras )



 What do you think?

Since I've never actually found this to be a hardship, I'd be interested in reading through the SO postings you mentioned above.

# Benjamin (Inglor) Gruenbaum (11 years ago)

On Wed, Oct 16, 2013 at 7:15 PM, Rick Waldron <waldron.rick at gmail.com> wrote:

This is subjective, as I have no trouble reading and understanding what this means and is expected to do (also subjective).

Of course, this is another way to do it that does not require knowing regular expressions. I have no doubt that being easier/harder is subjective here.

Since I've never actually found this to be a hardship, I'd be interested in reading through the SO postings you mentioned above.

I don't keep a list :) Here is a bunch I just found:

stackoverflow.com/questions/13867182/how-split-a-string-in-jquery-with-multiple-strings-as-separator?lq=1

stackoverflow.com/questions/19313541/split-a-string-based-on-multiple-delimiters?lq=1

stackoverflow.com/questions/9535203/split-string-by-whitespace-and-dashes

This is a very shallow search; lots of the ones I run into are closed as duplicates referring to the one with the many upvotes above. Personally I have no problem with regular expressions, but many find them confusing in such cases (I admit, a lot of those are language newbies I run into in JS chats like that on Stack Overflow who just want JS to do something - but they're a user base too).

# Andrea Giammarchi (11 years ago)

I stopped here

On Wed, Oct 16, 2013 at 5:54 AM, Benjamin (Inglor) Gruenbaum < inglor at gmail.com> wrote:

    myString.split(/ |-|\/|\+/g); // this is no fun to read

    myString.split(" ","-","/","+"); // this is easier
    myString.split([" ","-","/","+"]); // this is also easier.

easier for who ?

All I was doing is being sure no double quoting was missing or not escaped correctly ... also you don't need the g flag and RegExp is much more powerful than multiple strings

    myString.split(/before(and|or)?after/);

instead of 3 partially repeated nonsense ?

My 2 cents

# Andrea Giammarchi (11 years ago)

also, this is the same:

    myString.split(/[ \-/+]/)

maybe it's better to explain those users that knowing RegExp might bring benefits for their purpose (coding) ?
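One caveat when collapsing the alternation into a character class: a hyphen between two characters inside `[...]` denotes a range, so it is safest to escape it or to place it first or last. A small sketch of the difference, using a made-up sample string:

```javascript
// Hyphen escaped: matches exactly space, "-", "/" or "+".
const exact = /[ \-/+]/;

// Hyphen unescaped between " " and "/": this is the range U+0020..U+002F,
// which also covers characters such as "!", "," and ".".
const range = /[ -/+]/;

const sample = "3.14+2,5";
const viaExact = sample.split(exact); // ["3.14", "2,5"]
const viaRange = sample.split(range); // ["3", "14", "2", "5"]

console.log(viaExact, viaRange);
```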

# Benjamin (Inglor) Gruenbaum (11 years ago)

On Wed, Oct 16, 2013 at 8:18 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote:

also, this is the same: myString.split(/[ \-/+]/)

Yes, that's true, not sure how that works for multi character delimiters.
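Character classes only cover single characters; multi-character delimiters need alternation instead. A minimal sketch, with made-up delimiters and input:

```javascript
// Each alternative may be any length. Longer alternatives are listed
// first so that "->" is not consumed as "-" followed by a stray ">".
const multi = /->|::|-/;

const parts = "a->b::c-d".split(multi); // ["a", "b", "c", "d"]

// With the shorter alternative first, "-" wins at the same position.
const shortFirst = "a->b".split(/-|->/); // ["a", ">b"]

console.log(parts, shortFirst);
```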

maybe it's better to explain those users that knowing RegExp might bring benefits for their purpose (coding) ?

This is not the first or last thing in a language that is intended to make life easier for users who are not highly trained professionals. I agree that the added value is not huge here, but implementation is pretty inexpensive and I can see people use this.

# Andrea Giammarchi (11 years ago)

How could change 12+ years of legacy be considered inexpensive ?

Non "highly trained professionals" should be simple things or try to learn something new that won't hurt, that's why Stack Overflow exists in first place, to ask for help or explanations about things.

This request sounds to me like "I don't fully understand how Math.expm1() works so please make it simpler" ... legitimate to ask but I would not expect a language core change because of my lack of knowledge.

With great powers come great responsibilities ... you wanna code? So please do it right ;-)

(not directed to you and I am sure you got what I mean)

Obviously this is just my personal opinion.

# Andrea Giammarchi (11 years ago)

typo: Non "highly trained professionals" should do simple things

# Brendan Eich (11 years ago)

It's hard to add extra optional arguments to a long-standing built-in. People write code that passes an extra arg that has been ignored till the change; browsers that try shipping the new version then break that content, user blames browser (rightly so) but also the page, sometimes (not right).
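To make the compatibility concern concrete, this is how today's `split` treats additional arguments: the second is always interpreted as `limit`, and anything after that is ignored. A sketch of current behavior only:

```javascript
// The second argument is coerced to an unsigned integer limit.
// Number("-") is NaN, which becomes a limit of 0, so nothing is returned.
const asLimit = "a b-c".split(" ", "-"); // []

// A third argument is silently ignored today.
const ignored = "a b-c".split(" ", 2, "-"); // ["a", "b-c"]

console.log(asLimit, ignored);
```

A varargs overload would have to change the meaning of calls like these, which is the kind of breakage described above.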

Regexps suck away oxygen too, as others note. My suggestion is to focus fire for greater effect. If we need a new variable-split, we want a new API.

# Benjamin (Inglor) Gruenbaum (11 years ago)

On Wed, Oct 16, 2013 at 9:03 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote:

How could change 12+ years of legacy be considered inexpensive ?

This proposal does not break anything, the only thing that will/might work differently is people passing an array to .split right now and expecting something like splitting on firstElem,secondElem.... Current usages of .split with a string or with a regular expression will continue to function exactly the same, the only thing happening here is adding an overload for an array.

Non "highly trained professionals" should be simple things or try to learn something new that won't hurt... so please make it simpler"

If we can make life easier without breaking anything, why not? A lot of people use JavaScript on the web in practice for making websites dynamic; most of those people are not me and certainly not you, but they're still probably the majority. How many so-called "jQuery programmers" are there?

I'm not sure taking a common use case like .split on multiple delimiters and removing a skill (Regular Expressions) they had to know before in order to write code they understand and by that making the language more accessible to them is not a good idea.

There are and have been such API changes in the language. Why have str.contains if you can just do ~str.indexOf. Why have .startsWith, or .indexOf on arrays?

I'm not saying we should add a .split third overload in addition to the already existing two. I do however think it is an interesting idea that in practice a lot of developers, especially ones who do not remember the abstract comparison algorithm by heart can possibly benefit from.

Are you against changes like .contains or .startsWith too? What do you think are the criteria for such additions to the language?

On Wed, Oct 16, 2013 at 9:21 PM, Brendan Eich <brendan at mozilla.com> wrote:

It's hard to add extra optional arguments to a long-standing built-in. People write code that passes an extra arg that has been ignored till the change; browsers that try shipping the new version then break that content, user blames browser (rightly so) but also the page, sometimes (not right).

Agreed. Even without compatibility issues, I get confused questions from students regarding the behavior of methods changing behavior based on number of arguments (Array for example) way more often than I'd expect. (and props on the spec, being able to send students there in such cases is extremely enabling imo and a great exercise on its own).

What about the version with the overload accepting an Array instead? It seems more backwards compatible than the varargs version.

Regexps suck away oxygen too, as others note. My suggestion is to focus fire for greater effect. If we need a new variable-split, we want a new API.

I actually like the existing API for the most part and just wish it made life easier at times. Especially for learners. I'd like to teach my nephew JavaScript as a first language and I don't want to go anywhere near regular expressions.

# Andrea Giammarchi (11 years ago)

It does break in many ways as also Brendan said.

''.split(what, ever)

would break

plus

"some1,2thing".split([1,2])

you have no idea what you could find in legacy code ... so whatever a 12+ year-old legacy method does, it should keep doing, and if you really want to put your array in there, follow Brendan's hint and create your own method.
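The array case already has defined behavior today, which is what makes an overload risky: a separator that is neither a string nor a regular expression is converted to a string, so an array becomes its comma-joined form. A sketch of current behavior, with the proposed semantics emulated by a regex for comparison (that emulation is an assumption about how the overload would work):

```javascript
// String([1, 2]) is "1,2", so the array acts as the single separator "1,2".
const legacy = "some1,2thing".split([1, 2]); // ["some", "thing"]

// Under the proposed array overload the same call would presumably split
// on "1" and on "2" separately. Emulated here with a regex.
const proposed = "some1,2thing".split(/1|2/); // ["some", ",", "thing"]

console.log(legacy, proposed);
```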

Once again, just my 2 cents

# Andrea Giammarchi (11 years ago)

also, to reply to your question "Are you against changes like .contains or .startsWith too?"

Absolutely not, and these are new methods indeed. Perfectly in line with what you wrote:

Why have str.contains if you can just do ~str.indexOf. Why have .startsWith, or .indexOf on arrays?

You are asking to change a method that has been like that forever. Using your examples, it is like asking indexOf to return false because newcomers don't understand that -1 is a truthy value, you know what I mean?
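The `indexOf` comparison can be made concrete. `indexOf` returns -1 for a miss, and -1 is truthy, which is why the `~` idiom exists. (`String.prototype.contains`, as discussed in this thread, was later standardized under the name `includes`.) A short sketch:

```javascript
const str = "split";

// A miss yields -1, which is truthy, so a bare truthiness test is wrong.
const miss = str.indexOf("x");         // -1
const naive = Boolean(miss);           // true, although "x" is absent

// ~(-1) is 0, so bitwise NOT turns "not found" into a falsy value.
const viaTilde = Boolean(~miss);       // false

// The dedicated method states the intent directly.
const viaIncludes = str.includes("x"); // false

console.log(miss, naive, viaTilde, viaIncludes);
```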

String#split does what it does and it should keep doing that, even if I've never used an Array or multiple arguments on it (also never felt the need to)

Hope I've answered your questions.

In summary: every jQuery programmer could write a function that accepts an array and does the split with that; no need to ask to change core legacy (my only point in here, really).
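A rough sketch of such a user-land helper follows. The names `splitAny` and `escapeRegExp` and their signatures are hypothetical, not part of any proposal; the helper simply escapes each delimiter and delegates to the existing regex form of `split`.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split `str` on any of the given delimiter strings.
// Longer delimiters are tried first so they are not shadowed by prefixes.
function splitAny(str, delimiters, limit) {
  const pattern = delimiters
    .map(String)
    .filter((d) => d.length > 0)
    .sort((x, y) => y.length - x.length)
    .map(escapeRegExp)
    .join("|");
  // With no usable delimiters, fall back to split(undefined, limit),
  // which returns the whole string as a single piece.
  const separator = pattern === "" ? undefined : new RegExp(pattern);
  return str.split(separator, limit);
}

console.log(splitAny("321+32/14", [" ", "-", "/", "+"])); // ["321", "32", "14"]
console.log(splitAny("0-12-12", ["-"], 2));               // ["0", "12"]
```

Keeping `limit` as a separate third parameter sidesteps the ambiguity that a varargs form of `split` would have with its existing second argument.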

Feel free to propose a new method ;-)

# Rick Waldron (11 years ago)

On Wed, Oct 16, 2013 at 4:25 PM, Benjamin (Inglor) Gruenbaum < inglor at gmail.com> wrote:

On Wed, Oct 16, 2013 at 9:03 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote:

How could change 12+ years of legacy be considered inexpensive ?

This proposal does not break anything, the only thing that will/might work differently is people passing an array to .split right now and expecting something like splitting on firstElem,secondElem.... Current usages of .split with a string or with a regular expression will continue to function exactly the same, the only thing happening here is adding an overload for an array.

Non "highly trained professionals" should be simple things or try to learn something new that won't hurt... so please make it simpler"

If we can make life easier without breaking anything, why not? A lot of people use JavaScript on the web in practice for making websites dynamic; most of those people are not me and certainly not you, but they're still probably the majority. How many so-called "jQuery programmers" are there?

As one of jQuery's own representatives to Ecma/TC39, I still stand by my original response. After reading through those links, I agree with Andrea: it's a handful of people that just need time and motivation to learn.

I'm not sure taking a common use case like .split on multiple delimiters and removing a skill (Regular Expressions) they had to know before in order to write code they understand and by that making the language more accessible to them is not a good idea.

There are and have been such API changes in the language. Why have str.contains if you can just do ~str.indexOf. Why have .startsWith, or .indexOf on arrays?

I'm not saying we should add a .split third overload in addition to the already existing two. I do however think it is an interesting idea that in practice a lot of developers, especially ones who do not remember the abstract comparison algorithm by heart can possibly benefit from.

Are you against changes like .contains or .startsWith too? What do you think are the criteria for such additions to the language?

I'm all for new additions, but nothing is free and changing an existing thing is potentially even more expensive. Brendan's response is the most compelling.

On Wed, Oct 16, 2013 at 9:21 PM, Brendan Eich <brendan at mozilla.com> wrote:

It's hard to add extra optional arguments to a long-standing built-in. People write code that passes an extra arg that has been ignored till the change; browsers that try shipping the new version then break that content, user blames browser (rightly so) but also the page, sometimes (not right).

Agreed. Even without compatibility issues, I get confused questions from students regarding the behavior of methods changing behavior based on number of arguments (Array for example) way more often than I'd expect. (and props on the spec, being able to send students there in such cases is extremely enabling imo and a great exercise on its own).

Too late to change Array arg handling. In the future, tell students to construct new Arrays of numbers (that may have 1 entry) with Array.of():

Array.of(42).length === 1;

(vs. Array(42).length === 42; )

What about the version with the overload accepting an Array instead? It seems more backwards compatible than the varargs version.

What version is that?

Regexps suck away oxygen too, as others note. My suggestion is to focus fire for greater effect. If we need a new variable-split, we want a new API.

I actually like the existing API for the most part and just wish it made life easier at times. Especially for learners. I'd like to teach my nephew JavaScript as a first language and I don't want to go anywhere near regular expressions.

Again, this is too subjective.

# Benjamin (Inglor) Gruenbaum (11 years ago)

On Wed, Oct 16, 2013 at 11:46 PM, Rick Waldron <waldron.rick at gmail.com> wrote:

What about the version with the overload accepting an Array instead? It seems more backwards compatible than the varargs version.

I meant the version accepting an array .split([a,b,c]) rather than .split(a,b,c) but the 12 years and the two lightly arguments by Andrea and Brendan convinced me that adding something like this to .split() is too risky.

in the future, tell students to construct new Arrays of numbers (that may have 1 entry) with Array.of():

Nice! I've almost never had to do Array(..items), usually I just suggest a literal in the cases they do :) Thanks for the tip though, I should really read the ES6 current spec from start to end.

# Angus Croll (11 years ago)

with the separate arguments solution the 'limit' argument is unusable

with the array solution you have a punctuation nightmare

the required regex seems easier in comparison