Triple quoted strings

# jeff.ecmascript at tanasity.com (16 years ago)

If I understand the motivation for triple quoted strings, it's to allow multiline strings with inverted commas and quotes within them, and to allow quotes and inverted commas within those substrings, all without requiring a continuation marker.

PHP has a structure for this sort of thing, which works quite well when you have a large block of text to be used as a string. A common example is emails that are constructed in code.

The heredoc syntax is:


$var = <<<EndOfStringMarker
This text is part of the string. It can include " and '.
It can include references to one or more $variable, each of which is replaced with the value of the variable.
This string includes the
line breaks without needing backslash syntax like backslash n to introduce a newline.
Consequently the previous sentence includes a break after the fourth word.
To end this string you must have whatever end of string marker you have chosen on a line by itself. Thusly...
EndOfStringMarker;

PHP documentation on the feature is at php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc

IMO this is a useful but fairly infrequently used language facility. Constructing large blocks of text without this facility would be harder, but losing the facility would not be a major blow.

Jeff Veit

# Igor Bukanov (16 years ago)

On 06/03/2008, jeff.ecmascript at tanasity.com <jeff.ecmascript at tanasity.com> wrote:

If I understand the motivation for triple quoted strings, it's to allow multiline strings with inverted commas and quotes within them, and to allow quotes and inverted commas within those substrings, all without requiring a continuation marker.

PHP has a structure for this sort of thing, which works quite well when you have a large block of text to be used as a string. A common example is emails that are constructed in code.

The interesting thing is that the ECMAScript E4X extension already allows multiline strings to be entered, as in:

var i = 10;
var str = <x>arbitrary

multiline text that can embed expression {i * 2} references</x>.toString();

The triple-quote proposal would introduce a significantly less powerful mechanism for embedding strings.

# Peter Hall (16 years ago)

Except that it would have a few unexpected behaviours, especially around the & and < characters.

Peter

# Nathan de Vries (16 years ago)

On Thu, 2008-03-06 at 00:14 +0000, Peter Hall wrote:

Except that it would have a few unexpected behaviours, especially around the & and < characters.

What would those unexpected behaviours be? Here document syntax is available (and relatively standard) in UNIX shell, PHP, Perl & Ruby. Python seems to be the only language that has strayed from the traditional syntax, and I'm not entirely sure why ES4 is planning to follow suit.

Some advantages of traditional here document syntax can be seen below (all Ruby):

Standard here document

foo = <<EOS
bar
EOS

Here document, with stripped leading whitespace

def foo
  return <<-EOS
    bar
  EOS
end

Stacked here documents

foo(<<DOC1, <<DOC2)
this string is passed as the first argument
DOC1
this string is passed as the second argument
DOC2

Triple-quoted string literals seem to cover only the first case.
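For comparison, the indentation handling hinted at in the second Ruby case can be approximated in plain ECMAScript with a small helper. This is only a sketch under stated assumptions: `dedent` is a hypothetical name, not part of any proposal, and it assumes that the indentation shared by all non-blank lines is what should be removed.

```javascript
// Hypothetical helper: remove the indentation shared by all non-blank lines.
function dedent(text) {
  var lines = text.split("\n");
  var indent = null;
  for (var i = 0; i < lines.length; i++) {
    if (/^\s*$/.test(lines[i])) continue; // blank lines do not count
    var width = lines[i].match(/^[ \t]*/)[0].length;
    if (indent === null || width < indent) indent = width;
  }
  if (indent === null) indent = 0; // nothing but blank lines
  for (var j = 0; j < lines.length; j++) {
    // Never slice past the end of a short (blank) line.
    lines[j] = lines[j].slice(Math.min(indent, lines[j].length));
  }
  return lines.join("\n");
}

// The shared four-space indent is dropped; relative indentation is kept.
var body = dedent("    bar\n      baz\n    qux");
```

A triple-quoted literal combined with a helper like this would cover the first two Ruby cases; the stacked form has no obvious counterpart.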

Cheers,

-- Nathan de Vries

# Brendan Eich (16 years ago)

On Mar 5, 2008, at 11:17 PM, Nathan de Vries wrote:

On Thu, 2008-03-06 at 00:14 +0000, Peter Hall wrote:

Except that it would have a few unexpected behaviours, especially around the & and < characters.

What would those unexpected behaviours be? Here document syntax is available (and relatively standard) in UNIX shell, PHP, Perl & Ruby. Python seems to be the only language that has strayed from the traditional syntax, and I'm not entirely sure why ES4 is planning to follow suit.

I'm an old Unix hacker, I remember the Bourne Shell (source too, in C-gol). E4X may have given people the idea that ES4 will add even more (mis-)appropriated syntax, but we aren't doing pipelines or command substitution. I don't see why we need here documents in all their glory, although I like and use them.

Triple-quoted strings are simply for embedding quote and newline
characters, verbatim and freely. If no one (including you
Pythonistas) would find them useful then we should defer the proposal.
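To make the use case concrete, here is a sketch of what such text costs in ES3, where every newline and every matching quote has to be escaped or concatenated by hand. The message text and variable names are made up for illustration, and the triple-quoted form is shown only in a comment because it is proposed syntax.

```javascript
// ES3 style 1: one literal, with \n and \" written out by hand.
var escaped = "She said \"it's fine\",\nthen left.";

// ES3 style 2: one piece per line, switching quote styles as needed.
var concatenated =
  'She said "it' + "'" + 's fine",' + "\n" +
  "then left.";

// Under the proposal the same value would be written roughly as:
//   """She said "it's fine",
//   then left."""
```

Both variables hold the same two-line string; the difference is purely in how much escaping the author has to do.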

# Michael Daumling (16 years ago)

Well, I do find the feature useful...

I have seen a lot of scripts that contain multiline strings. Yes, you can use escaped newlines, but triple-quoting is definitely much more useful. For what it's worth, triple-quoting is syntactic sugar, but the sugar

  1. is easy to describe and implement,
  2. does not add ugly stuff to the language,
  3. makes multiline strings much more readable.

Michael

# Igor Bukanov (16 years ago)

On 06/03/2008, Igor Bukanov <igor at mir2.org> wrote:

On 06/03/2008, Peter Hall <peter.hall at memorphic.com> wrote:

Except that it would have a few unexpected behaviours, especially around the & and < characters.

In normal strings and the triple-quoted strings one has to escape \ . So what exactly is the point?

Note that I do not advocate adopting ES4 syntax. My point is that in many cases where multiline strings are desirable from a readability point of view, E4X already offers a solution, mitigating the need for yet another multiline syntax.

As regards the need to escape & and <, the last time I used the E4X "multiline strings" was to embed JSON literals for testing purposes (which I found rather funny). Those literals typically do not include & and <.
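The & and < concern can be made concrete. In an XML literal those two characters have to be written as entities for arbitrary text to survive, which a quoted string would not require. A minimal sketch, with `escapeXmlText` as a hypothetical helper name and a made-up sample string:

```javascript
// Hypothetical helper: make arbitrary text safe to place inside an XML element.
function escapeXmlText(text) {
  // Replace & first so the entities produced for < are not escaped again.
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

var raw = "a < b && c";
var safe = escapeXmlText(raw);
// safe is "a &lt; b &amp;&amp; c"
```

Text without those characters, such as most JSON test data, passes through unchanged.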

, Igor

# Steven Mascaro (16 years ago)

On 06/03/2008, Michael Daumling <mdaeumli at adobe.com> wrote:

Well, I do find the feature useful...

I have seen a lot of scripts that contain multiline strings. Yes, you can use escaped newlines, but triple-quoting is definitely much more useful. For what it's worth, triple-quoting is syntactic sugar, but the sugar

  1. is easy to describe and implement,
  2. does not add ugly stuff to the language,
  3. makes multiline strings much more readable.

Is there a reason why single-quote strings can't simply be made multiline in ES4? This would be backwards compatible and wouldn't exclude the use of triple-quoted strings (or heredoc).

# P T Withington (16 years ago)

On 2008-03-06, at 02:24 EST, Brendan Eich wrote:

On Mar 5, 2008, at 11:17 PM, Nathan de Vries wrote:

On Thu, 2008-03-06 at 00:14 +0000, Peter Hall wrote:

Except that it would have a few unexpected behaviours, especially around the & and < characters.

What would those unexpected behaviours be? Here document syntax is available (and relatively standard) in UNIX shell, PHP, Perl & Ruby. Python seems to be the only language that has strayed from the traditional syntax, and I'm not entirely sure why ES4 is planning to follow suit.

I'm an old Unix hacker, I remember the Bourne Shell (source too, in C-gol). E4X may have given people the idea that ES4 will add even more (mis-)appropriated syntax, but we aren't doing pipelines or command substitution. I don't see why we need here documents in all their glory, although I like and use them.

Triple-quoted strings are simply for embedding quote and newline characters, verbatim and freely. If no one (including you Pythonistas) would find them useful then we should defer the proposal.

I'm an older Unix hacker (old enough to either remember sh before the
invention of heredocs, or to forget that they were there from the
beginning :P), and Lisp, and Python.

I'd like to see us keep triple-quoted strings. But with a simpler
(tried and true?) syntax, say that of Python 'long strings'. (Which
to my mind are just heredocs without a choice of delimiter.) Seems to
me we just tried to be too clever parsing triple-quoted strings,
leading to all those funny edge cases that started this thread.

# liorean (16 years ago)

On 06/03/2008, P T Withington <ptw at pobox.com> wrote:

I'd like to see us keep triple-quoted strings. But with a simpler (tried and true?) syntax, say that of Python 'long strings'. (Which to my mind are just heredocs without a choice of delimiter.) Seems to me we just tried to be too clever parsing triple-quoted strings, leading to all those funny edge cases that started this thread.

Is the current syntax so bad, then, or complicated? The treatment of the string content itself is basically the same as for any other string with the single exception of newlines being allowed. The external syntax seems like it would match this regex:

/((["'])\2\2)((?:\\[\s\S]|(?!\1(?!\2))[\s\S])*)\2/

(Hmm. Was there any proposal for adding an any-includes-newline flag or an any-including-newlines escape instead of the [\s\S] kludge? If not, is \A)

I don't see it being such a bad idea - it's just syntactic sugar over normal ES3 strings. Just to prove the point, transforming such a string to the equivalent ES3 can be done like so:

var
    reTripleQuotedString = /((["'])\2\2)((?:\\[\s\S]|(?!\1(?!\2))[\s\S])*)\2/,
    es4TripleQuotedString = '"""some\nstring" \'\'\'with "" several \\""" different quirks\\\n\n\'\'\' in it """""',
    match = reTripleQuotedString.exec(es4TripleQuotedString),
    delim = match[2],
    reDelim = new RegExp(delim, 'g'),
    contents = match[3],
    es3String = delim + contents.replace(reDelim, '\\' + delim).replace(/\n/g, '\\n') + delim;

// es4TripleQuotedString => """some
//string" '''with "" several \""" different quirks\
//
//''' in it """""

// es3String => "some\nstring\" '''with \"\" several \\"\"\" different quirks\\n\n''' in it \"\""

Sure, we could have wanted a more advanced string building mechanism that was literal all the way, without ES3 escaping. Or we could have wanted a more dynamic string that expanded expressions in it using some syntax. Or we could have wanted a user selectable delimiter.

But for the use case of not having to make multiline strings instead be long strings containing \n where there is conceptually a newline, and for the use case of not having to escape singly quoted delimiters in the string, what we have is good enough. I think we should keep it as-is.
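The transformation above can also be sketched as a function that walks the contents one token at a time, so that characters which are already escaped are copied through instead of being escaped a second time. This is a variant written for illustration, not the code from this post: `tripleToES3` and `reTriple` are hypothetical names, the regex is anchored and closes on the full delimiter, and a backslash followed by a literal newline is rejected because the thread does not settle what it should mean.

```javascript
// Matches a whole triple-quoted literal. Group 1 is the quote character,
// group 2 is the raw contents. Assumption: the literal closes at the last
// three quotes of a run, as in the regex discussed above.
var reTriple = /^(["'])\1\1((?:\\[\s\S]|(?!\1\1\1(?!\1))[\s\S])*)\1\1\1$/;

function tripleToES3(source) {
  var match = reTriple.exec(source);
  if (!match) throw new SyntaxError("not a triple-quoted literal");
  var delim = match[1];
  var body = match[2].replace(/\\[\s\S]|[\s\S]/g, function (token) {
    if (token === "\\\n") {
      throw new Error("backslash-newline is not handled in this sketch");
    }
    if (token.length === 2) return token;      // existing escape, copy through
    if (token === delim) return "\\" + delim;  // bare delimiter needs escaping
    if (token === "\n") return "\\n";          // raw newline becomes \n
    return token;
  });
  return delim + body + delim;
}

// Source text: """say "hi"<newline>and \"bye\" """
var es3 = tripleToES3('"""say "hi"\nand \\"bye\\" """');
// es3 is the single-line literal "say \"hi\"\nand \"bye\" "
```

Because the result uses only double quotes and standard escapes in this example, it can be checked by feeding it to `JSON.parse`, which yields the original two-line text.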

# liorean (16 years ago)

On 06/03/2008, liorean <liorean at gmail.com> wrote:

(Hmm. Was there any proposal for adding an any-includes-newline flag or an any-including-newlines escape instead of the [\s\S] kludge? If not, is \A)

Never finished that sentence it seems. It should have run: If not, is \A or some similar escape sequence for it possible to add at this stage?
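For readers coming to this thread later: a later edition of the language (ES2018) added an `s` flag, often called dotAll, that lets `.` match line terminators, which covers the same need as the [\s\S] workaround. A small sketch comparing the two, using a made-up sample string and the `RegExp` constructor so that the flag is only evaluated at run time:

```javascript
var text = "first line\nsecond line";

// The workaround from the thread: [\s\S] matches any character, newline included.
var viaClass = /first[\s\S]*second/.test(text);

// Plain . stops at a line terminator, so this does not match across lines.
var viaDot = /first.*second/.test(text);

// With the s flag, . matches line terminators too.
var viaDotAll = new RegExp("first.*second", "s").test(text);
```

Engines that predate the flag will reject the last line when it runs, so the [\s\S] form remains the portable spelling.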

# Jonathan Toland (16 years ago)

i've been reading some past posts and wanted to add some supplemental input on this one and emphasize some key advantages to mirroring e4x templating syntax in triple quoted strings. first currently beginning to work in as3 i highly regard the progress the wg has made. es4 really has grown up into a very structured (yet still almost uniquely flexible, extensible, and dynamic) language in comparison to the tediousness of trying to scale up my js apps.

  1. i have often missed string interpolation working in js although i have never missed php's (and others?) multiline strings. i don't mind chalking that up to inexperience and an unhealthy fondness for perl but i do feel string interpolation is arguably second only to regex (which i also love) in perl's historical success which both php and more effectively ruby (with a syntax like e4x) mimic.

  2. as the poster below alluded to as3 already provides string interpolation de facto via e4x. anywhere a string is expected i can substitute a XML literal like <>Today is {new Date().toDateString()} and the time is {new Date().toTimeString()}.</> verbatim. this does identify a few weaknesses however:

first there is a great deal of overhead (creating the XML including the text node as child XML, translating xml entities, recognizing the type context to call toString(), forwarding toString() to the text node, and finally translating the entities back) making it potentially inefficient for any long running loop. the vm could optimize for this but why have to?

also it seems inconsistent to define the previous content with that facility and have to redefine it normally:

//assuming xml was assigned the previous example now reassign its content
xml='The time is now '+new Date().toTimeString()+'.';

rather than the following, possible when employing string interpolation:

xml="""The time is now {new Date().toTimeString()}.""";

i won't contend that it's not largely syntactic sugar but the utility of string interpolation in other languages especially server side is hard to argue. in original responses to this post it was also alluded to that exactly three characters must be escaped using e4x this way (<, {, and }, while others are handled by the vm) which i will address next.

  3. using triple quoted strings for string interpolation would remove not only the need to escape new lines but the need for any escape sequence syntax at all (including for backslashes). literal delimiters could simply be included between braces as traditional strings e.g.: """this {'example """is""" NOT\xA9 {utilitarian} per'} se""" while i may not see the utility of literal new lines i heartily concur with not having to remember to escape other characters (or recall yet another escape sequence syntax) which gets especially messy instantiating a RegExp without using literal syntax (for instance one that was built dynamically by concatenating strings).

this seems to be a low bar to implement given worst case it could be simply macro processed to its current notation. i love working with es and look forward to it becoming continually more enjoyable even if some features get deferred to es5. i would also like to mention i emailed previously regarding a function/method interceptor/sequence syntax but didn't consider until the following day it would also benefit from proper tail calls which are already being pursued and would alleviate painfully deep stack traces from methods with long inheritance chains.
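as a rough illustration of the "macro processed" idea, brace interpolation can be approximated today with a plain function. this is only a sketch under stated assumptions: `interpolate` and its `scope` argument are hypothetical names, and it substitutes simple names only, not arbitrary expressions as the syntax suggested above would.

```javascript
// Hypothetical helper: replace each {name} with the matching property of scope.
function interpolate(template, scope) {
  return template.replace(/\{(\w+)\}/g, function (whole, name) {
    // Leave unknown names untouched rather than guessing a value.
    return Object.prototype.hasOwnProperty.call(scope, name)
      ? String(scope[name])
      : whole;
  });
}

var message = interpolate("The time is now {time}.", { time: "12:00:00" });
// message is "The time is now 12:00:00."
```

a literal-level syntax would avoid both the function call and the separate scope object, which is the convenience being argued for here.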

~jon
