Greedy triple-quoted string literals
The correct interpretation is that a triple quoted string starts with three quotes of the same kind and ends when the same three quotes are seen in sequence provided that the character following the three is not that same quote character.
(Whether you want to call that greedy or not depends on whether you think it's greedy to take what you can when you can take it, or whether you think it's greedy to avoid taking something now because you think you can take more later. Generally speaking a greedy algorithm is one that makes a locally optimal choice about what is best, so I think the description on the wiki is OK. The above description is more precise.)
On 18/02/2008, Lars T Hansen <lth at acm.org> wrote:
The correct interpretation is that a triple quoted string starts with three quotes of the same kind and ends when the same three quotes are seen in sequence provided that the character following the three is not that same quote character.
(Whether you want to call that greedy or not depends on whether you think it's greedy to take what you can when you can take it, or whether you think it's greedy to avoid taking something now because you think you can take more later. Generally speaking a greedy algorithm is one that makes a locally optimal choice about what is best, so I think the description on the wiki is OK. The above description is more precise.)
I think that's a confusion waiting to happen that I certainly wouldn't want to see in spec text. The concept of "greedy" matching that I have would mean that the last triple quote delimiter in the source would be considered to be the endpoint of the literal that started with the first triple quote delimiter in the source.
-----Original Message-----
From: es4-discuss-bounces at mozilla.org [mailto:es4-discuss-bounces at mozilla.org] On Behalf Of liorean
Sent: 18 February 2008 11:53
To: es4-discuss at mozilla.org
Subject: Re: Greedy triple-quoted string literals
The spec will be more precise than the wiki proposals in all cases, and will necessarily avoid these kinds of terms (except when it defines the terms precisely first). I hope the interpretation that I gave above is sufficiently clear for a spec. It would be possible to provide an operational definition instead, but I hope we can avoid that for the surface syntax.
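Lars mentions that an operational definition would also be possible. A minimal sketch of what one might look like, assuming the literal begins at index `start` with three identical quote characters and ignoring escape sequences and newline handling; the function name `scanTripleQuoted` is hypothetical and not taken from the proposal. Each decision looks only at the single character following a candidate delimiter:

```javascript
// Hypothetical operational reading of the rule: a triple-quoted literal opens
// with three identical quotes and closes at the first run of three of that quote
// whose following character is not the same quote. Escapes are ignored here.
function scanTripleQuoted(src, start) {
  const q = src[start];
  if ((q !== '"' && q !== "'") || src[start + 1] !== q || src[start + 2] !== q) {
    throw new SyntaxError("not a triple-quoted literal at " + start);
  }
  const delim = q + q + q;
  let i = start + 3;
  for (;;) {
    const j = src.indexOf(delim, i);
    if (j === -1) throw new SyntaxError("unterminated triple-quoted literal");
    if (src[j + 3] !== q) {
      // Local decision: these three quotes end the literal.
      return { body: src.slice(start + 3, j), end: j + 3 };
    }
    // The run of quotes continues, so the first quote of this run belongs to the body.
    i = j + 1;
  }
}

console.log(scanTripleQuoted('"""two""""; rest', 0));
```

For `"""two""""` the first candidate closing run is followed by another quote, so that quote is kept in the body and the literal ends one character later.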
It's been pretty quiet around here since we debated tail calls... A thread on nomenclature sounds like it could get good.
Triple-quoted string parsing does not stop at the first possible end delimiter; it stops at the first end delimiter where local inspection of the string around the potential termination point makes it possible to conclude that the string probably was not intended to extend further. That is what Cormen, Leiserson, and Rivest[*] call greedy in their discussion of greedy algorithms: "A //greedy algorithm// always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution."
Knuth does not include the term in his index, sadly, nor do any of my other algorithm books. Can somebody dig up a conflicting definition so that we can get a real discussion going?
--lars
[*] 1st edition
On 18/02/2008, Lars T Hansen <lth at acm.org> wrote:
It's been pretty quiet around here since we debated tail calls...
I never finished my part of that discussion... I've had a long message on it half-written for how long now? Three months? I never quite finished my line of thought, though.
<digression about tail-calls>
Basically the idea was along the lines of investigating the effects of removing GetValue usage from those productions whose semantics just pass values through (such as all the shortcut evaluation operators, parenthesised expressions, plain assignment, function arguments, return statement etc.). The only semantics that would in effect be changed are those for function calls. Function calls would always "remember" the this-object of any member lookup since they are passed around as a Reference object, while still not making that object detectable from outside. Of course if the function is referenced again using member lookup, then a new Reference object would be created with the new base object and the function object itself GetValue'd out of the old Reference object.
This change would eradicate a pet peeve of mine:
var o={ f: function(){ return this; } }, f=o.f;
o.f(); // => o
f(); // => window
(o.f)(); // => o
(f=o.f)(); // => window
With the change, all of those would return o.
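In today's engines the four calls from the pet-peeve example can be reproduced directly. This sketch wraps them in a strict-mode function (an assumption made so the result does not depend on the host), where a call with no base object passes `undefined` as the this-value; sloppy-mode browser code would see the global object (`window`) instead. The helper name `thisBindings` is hypothetical, and the code shows current behaviour, not the proposed change:

```javascript
// Current behaviour (not the proposed change). The function body is strict,
// so a call with no base object passes undefined as the this-value; sloppy-mode
// code would see the global object (window in a browser) instead.
function thisBindings() {
  "use strict";
  const o = { f: function () { return this; } };
  let f = o.f;
  return {
    o,
    member: o.f(),         // property reference: this === o
    plain: f(),            // plain value: this === undefined
    grouped: (o.f)(),      // grouping keeps the reference: this === o
    assigned: (f = o.f)(), // assignment produces a plain value: this === undefined
  };
}

const r = thisBindings();
console.log(r.member === r.o, r.grouped === r.o); // true true
console.log(r.plain, r.assigned);                 // undefined undefined
```

Only the two forms whose callee is still a property reference carry `o` along; the proposal would make the other two behave the same way.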
I never quite finished my analysis of the backwards compatibility and security implications of doing such a change though. For backwards compatibility the issues with doing such a change should be minor in live code. It would only affect code that both expects the this-object to be the global object and which extracts the function from an object using shortcut evaluation or assignment operation.
For security, the implications are somewhat greater. IF the function cooperates (and ONLY in that case), the this-value could be extracted through the return value or by assignment to a scoped variable. This can only happen if the function itself either returns the this-object or assigns it to an external variable.
</digression about tail-calls>
That is what Cormen, Leiserson, and Rivest[*] call greedy in their discussion of greedy algorithms: "A //greedy algorithm// always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution."
Knuth does not include the term in his index, sadly, nor do any of my other algorithm books. Can somebody dig up a conflicting definition so that we can get a real discussion going?
When discussing "greedy" and "lazy" in terms of quantifiers in regex, the usual way to talk about them is that, out of multiple valid matches, "greedy" chooses the match containing as many repetitions as possible, and "lazy" chooses the match containing as few repetitions as possible. I don't know of a text that has a more formal definition than that, really, nor do I know of any definition of greedy/lazy algorithms as opposed to greedy/lazy quantifiers in regex or grammars. I've got no formal CS education, though, so I've not read that much of the literature...
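A minimal illustration of the quantifier sense of the terms, using JavaScript regex syntax. At the same starting position both patterns below could match two, three, or four `a`s; the greedy `{2,4}` takes the most repetitions and the lazy `{2,4}?` the fewest:

```javascript
const text = "aaaaa";
const greedyMatch = text.match(/a{2,4}/)[0]; // as many repetitions as possible
const lazyMatch = text.match(/a{2,4}?/)[0];  // as few repetitions as possible
console.log(greedyMatch, lazyMatch); // aaaa aa
```

Both matches start at index 0; the greediness of a quantifier only decides which of the valid matches at that position is chosen.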
-----Original Message-----
From: es4-discuss-bounces at mozilla.org [mailto:es4-discuss-bounces at mozilla.org] On Behalf Of liorean
Sent: 18 February 2008 22:18
To: es4-discuss at mozilla.org
Subject: Re: Greedy triple-quoted string literals
Good point! Using triple-quoted strings themselves as the example, the "greedy" regex
""".*"""(?!")
allows only one triple-quoted string per file, whereas the "non-greedy" regex
""".*?"""(?!")
is actually the correct one (modulo newlines and escape sequences).
Neither is deterministic, though; something like
"""(?:(?!"""(?!")).)*"""
is probably more like it (haven't tested). Note that that would be a greedy regex (greedy in the regex sense) implementing a greedy algorithm (greedy in the algorithm sense).
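Lars's three patterns can be tried directly, since JavaScript regexes support the `(?!...)` lookahead they use. This sketch keeps to his caveats: only the `"""` form, a single line (`.` does not match newlines), and no escape sequences. The sample source string is made up for illustration:

```javascript
// Lars's three candidate patterns (double-quote form only; "." excludes newlines).
const greedy = /""".*"""(?!")/g;
const lazy = /""".*?"""(?!")/g;
const lookahead = /"""(?:(?!"""(?!")).)*"""/g;

// Two literals on one line; the second literal's body ends in a quote character.
const source = 'x = """one"""; y = """two""""; z = 1;';

// String.prototype.match with a /g pattern returns all matches, or null.
const matchesOf = (re) => source.match(re) || [];

console.log(matchesOf(greedy));    // one match spanning both literals
console.log(matchesOf(lazy));      // two matches: """one""" and """two""""
console.log(matchesOf(lookahead)); // the same two matches
```

The greedy pattern swallows everything from the first opening delimiter to the last valid closing one, which is the "one literal per file" problem; the other two stop at the first delimiter that satisfies the rule.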
On Feb 18, 2008 1:17 PM, liorean <liorean at gmail.com> wrote:
Basically the idea was along the lines of investigating the effects of removing GetValue usage from those productions whose semantics just pass values through (such as all the shortcut evaluation operators, parenthesised expressions, plain assignment, function arguments, return statement etc.). The only semantics that would in effect be changed are those for function calls. Function calls would always "remember" the this-object of any member lookup since they are passed around as a Reference object, while still not making that object detectable from outside. Of course if the function is referenced again using member lookup, then a new Reference object would be created with the new base object and the function object itself GetValue'd out of the old Reference object.
On 19/02/2008, Garrett Smith <dhtmlkitchen at gmail.com> wrote:
Is this like getValue with a hint for a thisArg?
No, not really. And the way I described it is not quite what I want, either. I'd want something similar to but not exactly like the Reference type, such that a tuple with base and value is stored, not base and name-of-property as is the case for Reference. A Reference would allow replacing the value of the property between initialisation and use, while I'd want something that stores the actual value at the time of initialisation. Using a Reference also would require recursive GetValue, which can be eliminated at initialisation time if it stored the value instead of the name of the property.
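To make the distinction concrete, here is a hypothetical model in ordinary JavaScript; neither `makeReference` nor `makeBaseValue` is an engine or spec API. A Reference-like pair stores the base and the property name and looks the value up only when called, while the proposed pair stores the base and the value fetched at lookup time, so replacing the property afterwards does not affect it:

```javascript
// Hypothetical models of two internal pair types; neither is visible to scripts.

// Reference-like: (base, property name). The value is fetched at call time.
function makeReference(base, name) {
  return { call: (...args) => base[name].apply(base, args) };
}

// Proposed: (base, value). The value is captured when the lookup happens.
function makeBaseValue(base, name) {
  const value = base[name];
  return { call: (...args) => value.apply(base, args) };
}

const obj = { f: function () { return "old"; } };
const ref = makeReference(obj, "f");
const pair = makeBaseValue(obj, "f");

obj.f = function () { return "new"; }; // replace the property between lookup and use

console.log(ref.call());  // new  (the Reference sees the replacement)
console.log(pair.call()); // old  (the pair keeps the value it captured)
```

Both models call the function with `base` as the this-value; they differ only in when the function itself is resolved.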
This change would eradicate a pet peeve of mine:
var o={ f: function(){ return this; } }, f=o.f;
o.f(); // => o
f(); // => window
(o.f)(); // => o
(f=o.f)(); // => window
(o.f)(); // =>o
This should be window.
No it shouldn't. The grouping syntax specifically doesn't call GetValue in order to make delete and typeof operators able to use function-call-like syntax. Which makes up for this asymmetry.
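This behaviour of the grouping operator can be observed in current engines: because a parenthesised expression hands back the Reference rather than its value, `typeof` and `delete` behave the same with or without the parentheses. A small check, assuming the identifier `notDeclaredAnywhere` is not declared in any enclosing scope:

```javascript
// typeof on an unresolvable reference yields "undefined" instead of throwing,
// and the parentheses do not change that.
const t = typeof (notDeclaredAnywhere);

// delete on a property reference removes the property, parentheses or not.
const box = { p: 1 };
const deleted = delete (box.p);

console.log(t, deleted, "p" in box); // undefined true false
```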
With the change, all of those would return o.
I never quite finished my analysis of the backwards compatibility and security implications of doing such a change though. For backwards compatibility the issues with doing such a change should be minor in live code. It would only affect code that both expects the this-object to be the global object and which extracts the function from an object using shortcut evaluation or assignment operation.
Any event registry using onload/onunload would seem to have problems.
MyWindowListeners = { onunload : function(){} };
onunload = MyWindowListeners.onunload;
That's not a problem. It's a question of how the [[Call]] is made on the event handler. If the event implementation is specified to extract the function object, and calls fobj.[[Call]](EventTarget, arguments) when the event is triggered, then it's not a problem at all. The specs for DOM events do NOT specify how the this-object should be handled. Nor does any other relevant spec, from what I can tell. At the moment, DOM0 and Saf/Op/Moz send the EventTarget as this-argument. IE doesn't send it for attachEvent et al, nor do some other DOM2Events implementations (e.g. the one in Tasman). Since the function call is not part of ECMAScript, and this change affects both member lookups and storage, any external call into ECMAScript that does a GetValue would get the old behaviour, and any external call that sends a this-value of its own would be unaffected. Only an external call that doesn't do a GetValue would get the remembered this-value.
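A sketch of the dispatch style described above, where the host extracts the function object and supplies the EventTarget itself as the this-value. The `dispatch` function and the plain-object `windowLike` target are hypothetical stand-ins, not DOM APIs; `Function.prototype.call` plays the role of `fobj.[[Call]](EventTarget, arguments)`:

```javascript
// Hypothetical host-side dispatcher: the this-value comes from whoever performs
// [[Call]], so it does not matter how the handler was originally obtained.
function dispatch(target, type, event) {
  const handler = target["on" + type];
  if (typeof handler === "function") {
    return handler.call(target, event);
  }
}

const listeners = {
  onunload: function (event) { return { self: this, event }; },
};
const windowLike = { onunload: listeners.onunload }; // handler copied off another object

const result = dispatch(windowLike, "unload", { type: "unload" });
console.log(result.self === windowLike); // true: this is the target, not listeners
```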
Instance methods in ES4 are bound. One implication of that is that there's a new function for each instance. It's not as cheap as having a prototype method.
Instance methods would ignore any this-value sent, no? So this really wouldn't change anything for them at all.
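ES4 class syntax is not runnable today, but ES5 `Function.prototype.bind` gives a rough analogue for both points: each instance gets its own function object, and a bound function ignores any this-value supplied to it. This is an approximation under that assumption, not ES4 semantics; the `Counter` type is purely illustrative:

```javascript
// Approximating ES4 bound instance methods with ES5 bind (an assumption,
// not how ES4 specified them).
function Counter() {
  this.count = 0;
  // One new bound function per instance, unlike a shared prototype method.
  this.increment = function () { return ++this.count; }.bind(this);
}
Counter.prototype.peek = function () { return this.count; }; // shared by all instances

const a = new Counter();
const b = new Counter();
const other = { count: 100 };

console.log(a.increment === b.increment); // false: per-instance functions
console.log(a.peek === b.peek);           // true: one prototype function
console.log(a.increment.call(other));     // 1: the supplied this-value is ignored
```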
On Feb 19, 2008 10:21 AM, liorean <liorean at gmail.com> wrote:
On Feb 18, 2008 1:17 PM, liorean <liorean at gmail.com> wrote:
On 19/02/2008, Garrett Smith <dhtmlkitchen at gmail.com> wrote:
(o.f)(); // =>o
This should be window.
No it shouldn't. The grouping syntax specifically doesn't call GetValue in order to make delete and typeof operators able to use function-call-like syntax. Which makes up for this asymmetry.
Right. Thank you.
On Feb 18, 2008 1:17 PM, liorean <liorean at gmail.com> wrote:
Basically the idea was along the lines of investigating the effects of removing GetValue usage from those productions whose semantics just pass values through (such as all the shortcut evaluation operators, parenthesised expressions, plain assignment, function arguments, return statement etc.). The only semantics that would in effect be changed are those for function calls. Function calls would always "remember" the this-object of any member lookup since they are passed around as a Reference object, while still not making that object detectable from outside. Of course if the function is referenced again using member lookup, then a new Reference object would be created with the new base object and the function object itself GetValue'd out of the old Reference object.
On 20/02/2008, Garrett Smith <dhtmlkitchen at gmail.com> wrote:
a.f = function(){ return this; }
s.f = a.f
print( f() ) // s
Well, you'd have to do f=s.f first, but you've got the principle right.
a.f ;
Just mentioning it doesn't change anything. The reference is created by mentioning it, but since you're not storing it anywhere it just gets garbage collected. You'd have to do f=a.f, just like you had to do f=s.f above.
print( f() ) // a
Is this right?
With the changes above, then yes. The idea is that member lookups for functions create a Reference (or rather a base:value tuple instead of a base:slotname tuple), which is stored. This would be an intermediary internal type that is never actually detectable from the script itself other than in the form of retained this-values when passing function objects around.
On 19/02/2008, Garrett Smith <dhtmlkitchen at gmail.com> wrote:
Is this like getValue with a hint for a thisArg?
On Feb 19, 2008 10:21 AM, liorean <liorean at gmail.com> wrote:
No, not really. And the way I described it is not quite what I want, either. I'd want something similar to but not exactly like the Reference type, such that a tuple with base and value is stored, not base and name-of-property as is the case for Reference. A Reference would allow replacing the value of the property between initialisation and use, while I'd want something that stores the actual value at the time of initialisation. Using a Reference also would require recursive GetValue, which can be eliminated at initialisation time if it stored the value instead of the name of the property.
If I'm understanding you correctly, you want a Reference type on a function. Is this correct?
I want a base:value tuple returned from member lookups if the value of the lookup is a function object. The Reference type is a base:slotname tuple, meaning the actual value lookup would be delayed until use, which is undesirable.
And that Reference has a value pointing to the object which will be the this arg in Call. Right?
Yes.
If the event implementation is specified to extract the function object, and calls fobj.[[Call]](EventTarget, arguments) when the event is triggered, then it's not a problem at all. The specs for DOM events do NOT specify how the this-object should be handled. Nor does any other relevant spec, from what I can tell. At the moment, DOM0 and Saf/Op/Moz send the EventTarget as this-argument. IE doesn't send it for attachEvent et al, nor do some other DOM2Events implementations (e.g. the one in Tasman).
I can't remember the last time I tried to use Mac IE. I remember that it supported neither attachEvent nor addEventListener.
There are newer versions of Tasman. Tasman 0.9 was used in the MSN/OSX browser, which contained DOM support much closer to that of Gecko, Presto and Webkit than to that of Tasman 0.1 or Trident. Tasman 1.0 is also used in Entourage in Office:Mac.
Instance methods would ignore any this-value sent, no? So this really wouldn't change anything for them at all.
Can you explain more?
The way I understand the ES4 proposals, [[Call]] on instance methods ignores the first argument (the this-value) and instead always sets the instance as the this-value. So whatever first argument is sent to [[Call]], it will have exactly zero impact on the semantics of the program. Or is this a misunderstanding?
There's a note on the triple-quoted string literals proposal*:
"we decided that triple-quoted strings would be greedy when looking for the closing triple-quotes."
Wouldn't this mean, in practice, that it is only possible to have two triple-quoted string literals per source file (one with """ and another with ''')?