Implicitly escaped $ (or not) in quasis?

# Allen Wirfs-Brock (12 years ago)

I'm working on incorporating quasis into the ES6 draft and there is an issue I want to discuss:

In the wiki proposal[1], $ is used as the prefix for substitutions, which may take two forms:

    `xyz$foo 1234`      // $foo substitutes the value of the variable foo
    `xyz${foo} 1234`    // ${expr} generally substitutes the result of evaluating expr, so ${foo} substitutes the value of foo

Both of the above examples will produce the same result.

The wiki grammar has this production for dealing with $:

LiteralCharacter :: `$` [lookahead ∉ { `{`, IdentifierStart }]

In other words, a $ is a literal part of the quasi text if it is not immediately followed by a { or an IdentifierStart character.
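A minimal sketch of the wiki rule, assuming IdentifierStart can be approximated as `$`, `_`, or any code point with the Unicode ID_Start property (the spec also admits `\u` escape sequences, which this ignores). The helper name `classifyDollar` is hypothetical, not part of the proposal. Whether `$ᐭ` is a substitution comes down to whether U+142B has ID_Start, which is exactly the kind of lookup a human reader cannot do at a glance.

```javascript
// Approximation of IdentifierStart: "$", "_", or a Unicode ID_Start code point.
// (Unicode escape sequences, also allowed by the spec, are ignored here.)
const IDENTIFIER_START = /^[$_\p{ID_Start}]$/u;

// Hypothetical helper: under the wiki proposal's LiteralCharacter rule, decide
// whether the "$" at index i of a quasi's text is literal or starts a substitution.
function classifyDollar(text, i) {
  if (text[i] !== "$") throw new Error("no $ at index " + i);
  const cp = text.codePointAt(i + 1);
  if (cp === undefined) return "literal";               // "$" at end of text
  const next = String.fromCodePoint(cp);
  if (next === "{") return "expression";                // ${expr}
  if (IDENTIFIER_START.test(next)) return "identifier"; // $IdentifierName
  return "literal";                                     // plain "$"
}

console.log(classifyDollar("$1234", 0));  // digit is not IdentifierStart
console.log(classifyDollar("$$1234", 0)); // "$" is IdentifierStart: names $1234
```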

IdentifierStart includes lots of different things, including $ itself and many non-Roman characters.

For example, under these rules, can you (our human readers) identify which of the following is intended to be literal text and which contains a variable substitution?

    `I say: $ᐭ`    // ᐭ is U+142B
    `I say: $⏅`    // ⏅ is U+23C5

Or, perhaps more routinely:

    `$1234`     // this is all literal text
    `$$1234`    // this is a substitution for the variable $1234

I can think of two alternative ways to eliminate these potentially confusing situations:

  1. Eliminate the literal use of $ entirely. The valid uses are then either ${expr} or $IdentifierName. Any other occurrence of $ within the literal part of a quasi would have to be escaped. For example, `$99.95` is a syntax error, while `\$99.95` is the same as "$99.95".

  2. Eliminate the $IdentifierName form entirely and use ${IdentifierName} instead. Any occurrence of $ not followed by { is a literal $. For example, `$99.95` is the same as "$99.95", `$$1234` is the same as "$$1234", `$foo` is the same as "$foo", `${foo}` substitutes the value of the variable foo, and `${` is a syntax error.
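To make alternative 2 concrete, here is a hypothetical scanner (`splitQuasi` is not from the proposal) that splits a quasi body into literal portions and substitution source text, treating `$` as literal unless it is followed by `{`. It assumes a substitution contains no `}` of its own; a real parser would hand the text after `${` to the full expression grammar.

```javascript
// Hypothetical scanner for alternative 2: only "${...}" introduces a substitution;
// any other "$" is literal text. Assumes the expression contains no "}" itself.
function splitQuasi(body) {
  const literals = [];
  const expressions = [];
  let current = "";
  let i = 0;
  while (i < body.length) {
    if (body[i] === "$" && body[i + 1] === "{") {
      const close = body.indexOf("}", i + 2);
      if (close === -1) throw new SyntaxError("unterminated ${ in quasi");
      literals.push(current);
      expressions.push(body.slice(i + 2, close));
      current = "";
      i = close + 1;
    } else {
      current += body[i]; // a "$" not followed by "{" lands here, as literal text
      i += 1;
    }
  }
  literals.push(current);
  return { literals, expressions };
}

console.log(splitQuasi("$99.95 and $$1234"));
console.log(splitQuasi("$foo ${foo}"));
```

For what it's worth, ES2015 template literals, as eventually standardized, follow the same rule: a `$` not followed by `{` is ordinary text.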

Of these two alternatives, I favor the second, over both the first and the current wiki specification. I think alternative 2 makes the overall language simpler for users to learn and to read, as it has only one clearly delimited form of substitution. It eliminates the need to learn and recognize two different forms, and the possibility of confusion when non-ASCII characters are being used. It also allows $ to be used literally, which I suspect is quite common in some locales. This alternative is also simpler to specify and requires fewer parsing irregularities.

There are a couple of downsides I see for alternative 2, relative to alternative 1. It means that the most common form of substitution expression (a single identifier) requires two more characters. It also may cause some confusion for people who are used to languages that support $identifier substitution syntax in strings. Personally, I think the advantages of this approach outweigh these disadvantages. Others may disagree.

So, I propose that we go with alternative 2. Thoughts?

Allen

[1] harmony:quasis#literalportion

# Mark S. Miller (12 years ago)

I agree that the current implicit literal semantics is confusing, and that these are the two sensible alternatives. In E I chose alternative #1 with a somewhat different escaping syntax. However, now that you point it out, I see the advantages of #2. I think I now prefer #2 but can live with either.

Btw, as long as we're discussing this, let's re-raise what I consider the more important syntactic issue: in the curly form, we should allow any valid JS expression between the curlies. Last time it seemed we had agreement on everything except how to specify the grammar. From my experiments with trying quasis, I think this is a crucial usability issue.
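For comparison only (this is not the proposal text under discussion), ES2015 template literals as eventually standardized do accept an arbitrary expression between the curlies, including calls, operators, conditionals, and even nested templates. The variable names below are illustrative.

```javascript
const price = 19.5;
const qty = 2;
const fmt = (n) => n.toFixed(2);

// Arbitrary expressions between the curlies: a call, arithmetic, a conditional.
// Note "$$": the first "$" is literal because it is not followed by "{".
const line = `Total: $${fmt(price * qty)} for ${qty} item${qty === 1 ? "" : "s"}`;

// A substitution may itself contain another template literal.
const nested = `outer ${`inner ${qty + 1}`}`;

console.log(line);   // Total: $39.00 for 2 items
console.log(nested); // outer inner 3
```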

# Allen Wirfs-Brock (12 years ago)

On Jun 26, 2012, at 10:45 AM, Mark S. Miller wrote:

Hi Allen, I agree that the current implicit literal semantics is confusing, and that these are the two sensible alternatives. In E I chose alternative #1 with a somewhat different escaping syntax. However, now that you point it out, I see the advantages of #2. I think I now prefer #2 but can live with either.

Btw, as long as we're discussing this, let's re-raise what I consider the more important syntactic issue: in the curly form, we should allow any valid JS expression between the curlies. Last time it seemed we had agreement on everything except how to specify the grammar. From my experiments with trying quasis, I think this is a crucial usability issue.

I'm working

# Erik Arvidsson (12 years ago)

On Tue, Jun 26, 2012 at 9:19 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

So, I propose that we go with alternative 2.  Thoughts?

It makes me sad to not support $foo. Would it be too confusing to only support a subset of the identifiers in this form? FWIW, Dart does not allow identifiers with $ in this form, and requires you to use ${ jqueryLikes$Signs } when you need "strange" identifiers. www.dartlang.org/docs/spec/latest/dart-language-specification.html#h.us5hu2wpthk4
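Erik's middle ground could look like the following hypothetical interpolator: the short `$name` form is accepted only for identifiers made of ASCII letters, digits, and underscore (roughly the "no dollar" identifiers Erik describes for Dart), `${...}` handles everything else, and every other `$` stays literal. It works on a runtime string with a lookup object purely for illustration; in the proposal, substitution is resolved at compile time against lexical scope.

```javascript
// Hypothetical runtime interpolator illustrating a restricted short form:
//   $name   -> only when name matches [A-Za-z_][A-Za-z0-9_]* (no "$" inside)
//   ${name} -> any key, including ones containing "$"
// Any other "$" is left as literal text.
function interpolate(template, vars) {
  const pattern = /\$\{([^}]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;
  return template.replace(pattern, (match, braced, bare) => {
    const name = braced !== undefined ? braced.trim() : bare;
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new ReferenceError(name + " is not defined");
    }
    return String(vars[name]);
  });
}

const vars = { name: "Ann", jqueryLikes$Signs: "ok" };
const out = interpolate("Hi $name, ${ jqueryLikes$Signs }! Total: $99.95", vars);
console.log(out); // Hi Ann, ok! Total: $99.95
```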

On the other hand, CoffeeScript only allows #{ expr } so maybe it is not so bad after all.

# Dio Synodinos (12 years ago)

On Tue, Jun 26, 2012 at 2:48 PM, Erik Arvidsson <erik.arvidsson at gmail.com> wrote:

On Tue, Jun 26, 2012 at 9:19 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

So, I propose that we go with alternative 2. Thoughts?

It makes me sad to not support $foo. Would it be too confusing to only support a subset of the identifiers in this form? FWIW, Dart does not allow identifiers with $ in this form, and requires you to use ${ jqueryLikes$Signs } when you need "strange" identifiers.

www.dartlang.org/docs/spec/latest/dart-language-specification.html#h.us5hu2wpthk4

I'd have to agree with Erik and say that having ECMAScript not support $identifier substitution, as most other languages and frameworks I'm familiar with do, would feel awkward. Especially if it means typing two additional characters just to protect against edge cases.

# gaz Heyes (12 years ago)

On 26 June 2012 17:19, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

I'm working on incorporating quasis into the ES6 draft and there is an issue I want to discuss:

In the wiki proposal[1], $ is used as the prefix for substitutions, which may take two forms:

    `xyz$foo 1234`      // $foo substitutes the value of the variable foo
    `xyz${foo} 1234`    // ${expr} generally substitutes the result of evaluating expr, so ${foo} substitutes the value of foo

I have to say I disagree with the whole feature. It will introduce a new class of DOM-based XSS attacks, since developers in their infinite wisdom will use this feature to place user input inside multi-line strings, e.g. message = `USER_INPUT`, with the attack being ${globalVariable}. A list of variable substitutions would mitigate that risk, like how the printf function works, but allowing any variable reference is a bad idea IMO. I would also like to see how context-aware escaping would work, since in order to provide such a mechanism you would have to render the content at some point, and the context could change and the user input could change when the content is rendered. Given that CSS doesn't provide any way to safely escape user input in property names/values without fully whitelisting the whole specification, I fail to see how context-aware escaping would work in that instance.

# Brendan Eich (12 years ago)

gaz Heyes wrote:

On 26 June 2012 17:19, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

I'm working on incorporating quasis into the ES6 draft and there is an issue I want to discuss:

In the wiki proposal[1], $ is used as the prefix for substitutions, which may take two forms:

    `xyz$foo 1234`      // $foo substitutes the value of the variable foo
    `xyz${foo} 1234`    // ${expr} generally substitutes the result of evaluating expr, so ${foo} substitutes the value of foo

I have to say I disagree with the whole feature. It will introduce a new class of DOM-based XSS attacks, since developers in their infinite wisdom will use this feature to place user input inside multi-line strings, e.g. message = `USER_INPUT`, with the attack being ${globalVariable}.

This isn't much harder today with the fine + operator.

What quasiliterals enable is

    safehtml`... ${no_fear_here} ...`

instead of

    safehtml_but_slow("... ${no_fear_here} ...")

which has to parse out the ${} bits at runtime.
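A minimal sketch of the contrast, assuming a toy escaping policy (HTML-escape every substituted value, with no context awareness). The two function names follow the message, but both implementations are hypothetical; the library linked later in this thread does contextual escaping. With a tag, the engine has already separated literal portions from values; the string-based version must find `${...}` at run time and resolve names through an explicit scope object.

```javascript
// Toy HTML escaping, applied to substituted values only, never to literal portions.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tag function: the engine passes literal portions (split at compile time)
// and the already-evaluated substitution values.
function safehtml(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeHtml(values[i]) + strings[i + 1];
  }
  return out;
}

// String-based equivalent: scans for ${name} at run time and looks names up
// in an explicit scope object.
function safehtml_but_slow(format, scope) {
  return format.replace(/\$\{(\w+)\}/g, (_, name) => escapeHtml(scope[name]));
}

const no_fear_here = '<img src=x onerror="alert(1)">';
const viaTag = safehtml`<p>${no_fear_here}</p>`;
const viaString = safehtml_but_slow("<p>${no_fear_here}</p>", { no_fear_here });
console.log(viaTag === viaString, viaTag);
```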

Any way you program, whether + or an API built on it or ES6 quasis, users have to remember to sanitize.

A list of variable substitutions would mitigate that risk, like how the printf function works, but allowing any variable reference is a bad idea IMO.

What's the difference between

    `lit1 ${exp1} lit2 ${exp2} lit3`

and

    sprintf("lit1 %s lit2 %s lit3", exp1, exp2)

other than, again, the latter's lack of compile-time help in checking that the number of format specifiers matches the number of trailing arguments (a problem quasis eliminate by embedding each expN in place)?
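To illustrate the count mismatch, here is a toy `sprintf` supporting only `%s` (hypothetical, not any particular library's API). Unlike C's printf, where a mismatch is undefined behavior, this one at least checks the count, but only at run time; the template form has no separate count to get wrong, because each expression sits where it is used.

```javascript
// Toy sprintf supporting only %s (hypothetical, not a real library's API).
// The specifier/argument count can only be checked at run time.
function sprintf(format, ...args) {
  const parts = format.split("%s");
  const expected = parts.length - 1;
  if (expected !== args.length) {
    throw new TypeError(`expected ${expected} arguments, got ${args.length}`);
  }
  // Interleave: parts[0] + args[0] + parts[1] + ... + parts[n]
  return parts.reduce((acc, part, i) => acc + String(args[i - 1]) + part);
}

const exp1 = "a";
const exp2 = "b";
const viaSprintf = sprintf("lit1 %s lit2 %s lit3", exp1, exp2);
const viaQuasi = `lit1 ${exp1} lit2 ${exp2} lit3`;
console.log(viaSprintf); // lit1 a lit2 b lit3
console.log(viaQuasi);   // lit1 a lit2 b lit3
```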

I would also like to see how context-aware escaping would work, since in order to provide such a mechanism you would have to render the content at some point, and the context could change and the user input could change when the content is rendered. Given that CSS doesn't provide any way to safely escape user input in property names/values without fully whitelisting the whole specification, I fail to see how context-aware escaping would work in that instance.

code.google.com/p/js-quasis-libraries-and-repl/source/browse/trunk/js/safehtml.js?r=137

# gaz Heyes (12 years ago)

On 27 June 2012 10:06, Brendan Eich <brendan at mozilla.org> wrote:

What's the difference between

    `lit1 ${exp1} lit2 ${exp2} lit3`

and

    sprintf("lit1 %s lit2 %s lit3", exp1, exp2)

A list of variables would have to appear outside the backticks somehow, as in the earlier example using a function call. If not, even context-aware text could be used to expose variables and DOM objects on the page if the developer allows content inside backticks. A developer will assume that a backtick is just another way to declare strings across multiple lines, and will probably (in most cases) account for escaping backticks, but will fail to account for variables being used inside backticks.

Another thing to consider is that in server-side languages such as PHP, backticks are an eval-like construct, and if a dev misplaces the backticks then instead of XSS they will have remote code execution. Also, in IE a backtick is a valid attribute quote, so this would introduce new XSS vectors by reusing the existing backticks with an injection.

# Brendan Eich (12 years ago)

gaz Heyes wrote:

On 27 June 2012 10:06, Brendan Eich <brendan at mozilla.org> wrote:

What's the difference between

 `lit1 ${exp1} lit2 ${exp2} lit3`

and

 sprintf("lit1 %s lit2 %s lit3", exp1, exp2)

A list of variables would have to appear outside the backticks somehow, as in the earlier example using a function call. If not, even context-aware text could be used to expose variables and DOM objects on the page if the developer allows content inside backticks. A developer will assume that a backtick is just another way to declare strings across multiple lines, and will probably (in most cases) account for escaping backticks, but will fail to account for variables being used inside backticks.

You assume a developer will assume something. We need evidence.

Lots of languages, e.g. CoffeeScript after Ruby, or bash after the Bourne shell (sh), use embedded expressions in ${...} or #{...} brackets in distinguished strings (e.g., double-quoted strings).

These languages don't obviously have more injection attacks based on failure to sanitize than languages with printf-style format strings. Indeed the mismatch problem makes the latter actually unsafe (even memory-unsafe) in too many languages.

Another thing to consider is that in server-side languages such as PHP, backticks are an eval-like construct, and if a dev misplaces the backticks then instead of XSS they will have remote code execution.

Yes, that's a drag. We lack good options that anyone can type, though. If I recall correctly, an earlier proposal used

format "..."

with format a contextual keyword. In that case one could even switch from embedded ${...} expressions to printf-style trailing arguments, and still have static checking that format specifier and trailing argument counts agree. But then we don't get multiline strings, and the minimal escape interpretation of quasis would be unexpected in anything double (or single) quoted.

Also: PHP, really? Let's not cross the streams and degrade JS syntax just in case. We would need evidence more than the hypothetical risk you cite (I appreciate that you wrote "Another thing to consider", instead of calling this an actual problem -- if you have evidence, please lay it out here).

Also, in IE a backtick is a valid attribute quote, so this would introduce new XSS vectors by reusing the existing backticks with an injection.

Insane. What version(s) of IE? You mean in HTML? That's not standard (of course it never was), but with HTML5 and new IE releases, is this still supported?

# gaz Heyes (12 years ago)

On 27 June 2012 12:38, Brendan Eich <brendan at mozilla.org> wrote:

You assume a developer will assume something. We need evidence.

Lots of languages, e.g. CoffeeScript after Ruby, or bash after the Bourne shell (sh), use embedded expressions in ${...} or #{...} brackets in distinguished strings (e.g., double-quoted strings).

These languages don't obviously have more injection attacks based on failure to sanitize than languages with printf-style format strings. Indeed the mismatch problem makes the latter actually unsafe (even memory-unsafe) in too many languages.

I don't know how to provide evidence for a feature that doesn't exist yet, but here goes: x10hosting.com/forums/scripts-3rd-party-apps-programming/70485-send-multiline-php-variable-javascript.html#post_401282

This obviously has an XSS hole, but a dev wanted a multiline string passed from PHP to JavaScript. The same thing could happen with this feature: even if the PHP variable in the code sample were correctly escaped for backticks, it would still contain an XSS hole through a ${} variable reference.
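The point can be made concrete. Suppose server-side code splices a string between backticks in generated JavaScript. Escaping only the backtick is not enough: `${` must be escaped too, and backslashes first so the added escapes cannot be subverted. The helpers below are hypothetical, and `new Function` merely stands in for the browser parsing the generated script with a variable `secret` in scope. This covers only the JS-literal layer, not the surrounding HTML context (e.g. a `</script>` in the input).

```javascript
// Hypothetical helper: escape text so that, when spliced between backticks in
// generated JavaScript source, it is reproduced literally.
function escapeForTemplate(text) {
  return text
    .replace(/\\/g, "\\\\")          // backslashes first, so added escapes survive
    .replace(/`/g, "\\`")            // a bare backtick would end the template
    .replace(/\$\{/g, () => "\\${"); // "${" would start a substitution
}

// The incomplete version: escapes backticks only.
function escapeBackticksOnly(text) {
  return text.replace(/`/g, "\\`");
}

// Stand-in for a browser parsing generated script in which `secret` is in scope.
function runGenerated(escapedText) {
  const source = "return `" + escapedText + "`;";
  return new Function("secret", source)("s3cr3t");
}

const userInput = "hello ${secret}";
const leaked = runGenerated(escapeBackticksOnly(userInput));
const safe = runGenerated(escapeForTemplate(userInput));
console.log(leaked); // hello s3cr3t
console.log(safe);   // hello ${secret}
```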

Another thing to consider is that in server-side languages such as PHP, backticks are an eval-like construct, and if a dev misplaces the backticks then instead of XSS they will have remote code execution.

Yes, that's a drag. We lack good options that anyone can type, though. If I recall correctly, an earlier proposal used

format "..."

I would prefer that syntax.

Also, in IE a backtick is a valid attribute quote, so this would introduce new XSS vectors by reusing the existing backticks with an injection.

Insane. What version(s) of IE? You mean in HTML? That's not standard (of course it never was), but with HTML5 and new IE releases, is this still supported?

IE9 and earlier, and IE10 in compat mode. You can also force a web page into compat mode using a parent web page and a child iframe of the target page.

# Allen Wirfs-Brock (12 years ago)

On Jun 27, 2012, at 1:49 AM, gaz Heyes wrote:

On 26 June 2012 17:19, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

I'm working on incorporating quasis into the ES6 draft and there is an issue I want to discuss:

In the wiki proposal[1], $ is used as the prefix for substitutions, which may take two forms:

    `xyz$foo 1234`      // $foo substitutes the value of the variable foo
    `xyz${foo} 1234`    // ${expr} generally substitutes the result of evaluating expr, so ${foo} substitutes the value of foo

I have to say I disagree with the whole feature. It will introduce a new class of DOM-based XSS attacks, since developers in their infinite wisdom will use this feature to place user input inside multi-line strings, e.g. message = `USER_INPUT`, with the attack being ${globalVariable}. A list of variable substitutions would mitigate that risk, like how the printf function works, but allowing any variable reference is a bad idea IMO. I would also like to see how context-aware escaping would work, since in order to provide such a mechanism you would have to render the content at some point, and the context could change and the user input could change when the content is rendered. Given that CSS doesn't provide any way to safely escape user input in property names/values without fully whitelisting the whole specification, I fail to see how context-aware escaping would work in that instance.

I don't see why the above issue would be a problem with this quasi proposal, as quasis do no implicit evals or implicit reevaluation of substitution values.

Consider this code:

    var USER_INPUT = getUserInput(); // assume the value returned is "${globalVariable}"

    var message = `USER_INPUT`; // The value of message is the string "USER_INPUT"; no substitution occurred

    var messageWithSub = `${USER_INPUT}`; // The value of messageWithSub is the string "${globalVariable}", literally. No eval is performed.

The code would have to explicitly say something like:

    eval(`${USER_INPUT}`); // this means the same as eval("${globalVariable}") and will produce a syntax error

for the attack to be executed.
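Allen's example can be checked directly with ES2015 template literals, the form quasis eventually took. In this sketch a constant stands in for the hypothetical `getUserInput()`, and a global `globalVariable` is defined only to show that it is never read.

```javascript
// Stand-in for getUserInput(): attacker-controlled text that looks like a substitution.
const USER_INPUT = "${globalVariable}";
globalThis.globalVariable = "secret"; // exists, but is never read below

const message = `USER_INPUT`;           // literal text, no substitution
const messageWithSub = `${USER_INPUT}`; // inserts the string value once; not re-parsed

let evalError = null;
try {
  eval(`${USER_INPUT}`); // same as eval("${globalVariable}")
} catch (e) {
  evalError = e;
}

console.log(message);        // USER_INPUT
console.log(messageWithSub); // ${globalVariable}
console.log(evalError instanceof SyntaxError); // true
```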

The other situation would be a quasi with an explicit substitution handler:

    var messageWithMsgSub = Msg`${USER_INPUT}`; // The value of messageWithMsgSub is the result of calling Msg.

Here, Msg is an application- or library-provided quasi substitution handler function. Think of it as the DSL compiler. In this case Msg might be designed to do an explicit eval of the substitution text containing the attack code. However, the fact that Msg evals some of its inputs really should be documented as part of its contract. It is essentially just like any other function you might call and pass a string to as an argument. If you don't know, or don't trust, the function not to eval its parameters, then you had better not pass it any strings that originated from untrusted sources.
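In ES2015 terms, where the proposal's substitution handler became a tag function, the handler receives the literal portions and the already-evaluated values; it evaluates text only if it chooses to. The recording tag below is a hypothetical stand-in, not the proposal's Msg.

```javascript
// Hypothetical tag that just records what the engine hands it.
function Msg(strings, ...values) {
  return { strings: [...strings], values };
}

const USER_INPUT = "${globalVariable}";
const received = Msg`Hello ${USER_INPUT}!`;

// There is always one more literal portion than there are values.
console.log(received.strings); // [ 'Hello ', '!' ]
// The value is a plain string; Msg would have to eval it explicitly to run it.
console.log(received.values);  // [ '${globalVariable}' ]
```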

The quasi proposal[1] looks quite extensively at the security implications of this feature, and a number of mitigations are included in the design. Anybody with security concerns involving quasis should probably start with that proposal and then address where it is wrong in its conclusions or otherwise falls short.

Allen

[1] harmony:quasis

# gaz Heyes (12 years ago)

On 27 June 2012 15:59, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

    var messageWithMsgSub = Msg`${USER_INPUT}`; // The value of messageWithMsgSub is the result of calling Msg.

Here, Msg is an application- or library-provided quasi substitution handler function. Think of it as the DSL compiler. In this case Msg might be designed to do an explicit eval of the substitution text containing the attack code. However, the fact that Msg evals some of its inputs really should be documented as part of its contract. It is essentially just like any other function you might call and pass a string to as an argument. If you don't know, or don't trust, the function not to eval its parameters, then you had better not pass it any strings that originated from untrusted sources.

Ah, sorry, I misunderstood the syntax. I've also tried the demo, but it isn't clear to me how placeholders are assigned. Could someone please explain how to pass the correct variable to a placeholder?

# gaz Heyes (12 years ago)

On 27 June 2012 15:59, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

I don't see why the above issue would be a problem with this quasi proposal, as quasis do no implicit evals or implicit reevaluation of substitution values.

Consider this code:

    var USER_INPUT = getUserInput(); // assume the value returned is "${globalVariable}"

    var message = `USER_INPUT`; // The value of message is the string "USER_INPUT"; no substitution occurred

    var messageWithSub = `${USER_INPUT}`; // The value of messageWithSub is the string "${globalVariable}", literally. No eval is performed.

I understand the syntax now and I was correct with my initial assumptions. Although eval isn't performed on the placeholder text, you can access variables from outside the scope intended. For example:

    !function(){ var cookie = document.cookie, x = 1; func`${cookie}`; }();

If an injection occurs within the quasi-literal, then you can use unintended variables, because there is no strict definition of which variables substitution should apply to. I also wonder whether, if the syntax is extended to support accessing object properties, this is a further security risk.

    !function(){
      var x = 1; // intended to use this variable
      func`${arguments.callee.caller()}`;
      func`${arguments[0]}`;
    }();

It seems to me this is similar to having variable access inside string literals and presents a real security risk even when a developer escapes a quasi literal correctly.

# Brendan Eich (12 years ago)

gaz Heyes wrote:

On 27 June 2012 15:59, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

I don't see why the above issue would be a problem with this quasi proposal, as quasis do no implicit evals or implicit reevaluation of substitution values.

Consider this code:

    var USER_INPUT = getUserInput(); // assume the value returned is "${globalVariable}"

    var message = `USER_INPUT`; // The value of message is the string "USER_INPUT"; no substitution occurred

    var messageWithSub = `${USER_INPUT}`; // The value of messageWithSub is the string "${globalVariable}", literally. No eval is performed.

I understand the syntax now and I was correct with my initial assumptions.

Ok, I knew you knew what I thought you knew ;-).

Although eval isn't performed on the placeholder text, you can access variables from outside the scope intended. For example:

    !function(){ var cookie = document.cookie, x = 1; func`${cookie}`; }();

If an injection occurs within the quasi-literal, then you can use unintended variables, because there is no strict definition of which variables substitution should apply to. I also wonder whether, if the syntax is extended to support accessing object properties, this is a further security risk.

    !function(){
      var x = 1; // intended to use this variable
      func`${arguments.callee.caller()}`;
      func`${arguments[0]}`;
    }();

It seems to me this is similar to having variable access inside string literals and presents a real security risk even when a developer escapes a quasi literal correctly.

Let's define risk:

risk = probability * cost

Assume fixed high cost of injection attack (they're bad).

Probability, or observed likelihood, is non-zero in JS today. People write string-concatenation-based formatting APIs, they call them with unsanitized arguments that interpolate, they're pwned.

Does adding quasiliterals change this probability appreciably?

Maybe it does raise probability for the unprefixed case, but the concatenation-based APIs still exist. You could argue that `SELECT * FROM Users WHERE Passwd = '${pwn_me}'` is an attractive nuisance.

On the other hand, safesql`SELECT * FROM Users WHERE Passwd = '${cant_pwn_me}'` support and developer evangelization mean we have an affordance beyond library code to reduce the probability of injection attacks.
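One plausible shape for such a `safesql`, sketched under the assumption of a database driver that accepts `?` placeholders plus a parameter array (the function and its return shape are hypothetical). Rather than quoting values, it turns each substitution into a bind parameter, so the quotes around the placeholder are dropped in the prefixed form.

```javascript
// Hypothetical safesql tag: produces a parameterized query rather than a string,
// assuming a driver that accepts "?" placeholders plus an array of parameters.
function safesql(strings, ...values) {
  return {
    text: strings.reduce((sql, part) => sql + "?" + part),
    params: values,
  };
}

const pwn_me = "' OR '1'='1";
const cant_pwn_me = pwn_me;

// Unprefixed: the value is spliced into the SQL text.
const unsafe = `SELECT * FROM Users WHERE Passwd = '${pwn_me}'`;
console.log(unsafe); // SELECT * FROM Users WHERE Passwd = '' OR '1'='1'

// Prefixed: the value travels separately as a bind parameter.
const query = safesql`SELECT * FROM Users WHERE Passwd = ${cant_pwn_me}`;
console.log(query.text);   // SELECT * FROM Users WHERE Passwd = ?
console.log(query.params); // [ "' OR '1'='1" ]
```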

Hard to say more without empirical studies, but I see plus and minus and can't jump to the conclusion that risk goes up just from the design of quasis.

It's true one could evangelize a sanitizing API like the Caja one on which quasis were based. But developers fail to use such APIs. Several reasons I see:

  1. They have to procure and load a library (true for safesql in the quasis case too, unless we standardize some functions -- which we could).

  2. They have to pay the price of calling the API.

  3. The API has overhead in parsing the format string.

Quasis address 2 in part, and 3 to a large extent if not completely (by parsing literal portions out of the format string in the JS compiler).

Quasis have other virtues as noted: multiline strings and regexps, even without any interpolation via ${...}.
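Both virtues survive in what ES2015 eventually shipped, with no custom handler: a plain template literal may span lines, and the built-in `String.raw` tag keeps backslashes as written, which avoids doubling them when building a RegExp. The pattern below is only an illustration.

```javascript
// A template literal may span lines; each source line break becomes "\n".
const poem = `roses are red
violets are blue`;

// String.raw is the built-in tag that returns the raw text, so regex escapes
// such as \d and \. need no doubling.
const priceRe = new RegExp(String.raw`^\$\d+\.\d{2}$`);

console.log(JSON.stringify(poem));  // "roses are red\nviolets are blue"
console.log(priceRe.test("$99.95")); // true
console.log(priceRe.test("99.95"));  // false
```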

Martin Johns among others has researched injection vulnerabilities. His SAP research page: www.martinjohns.com

His old University of Passau home page: web.sec.uni-passau.de/members/martin/index.php

Technical report: web.sec.uni-passau.de/members/martin/2007-MJ-TechReport_279.pdf

Slide deck I recall from Dagstuhl 09141, and relevant to this thread: web.sec.uni-passau.de/members/martin/talks/081215_MSR.pdf

This deck is about "Language-based prevention of code injection vulnerabilities". It is not all relevant to quasis, but some of the goals and especially the conclusions are:

  • reliably [prevent] string-based code injection,
  • can be used for existing languages/frameworks/servers,
  • [allow] integration of the complete foreign syntax,
  • [preserve] (most of) the string-type conventions,
  • and is applicable for all foreign language types
  • Query, mark-up, general purpose, hybrid, …

There's no free lunch in defending against injection attacks. As Martin notes, you can add cumbersome constructive APIs, but practically no one will use them. You can go hog-wild with language integration, e.g. LINQ, E4X; but these are very costly to standardize. You can support "pre-processing", but that is a build step.

Quasis are trying to be an in-language programmable "pre-processor", so no build step. Instead you need to prefix the `...` with safesql, where that identifier resolves (just like any identifier expression) to a function that takes the pre-processed raw and cooked literal portions, and the expressions to interpolate in between each literal portion.
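What the engine pre-processes can be seen with a recording tag in ES2015 syntax: each literal portion arrives both cooked (escape sequences interpreted) and raw (as written in source), alongside the interpolated values. `inspect` is a hypothetical name used only for this sketch.

```javascript
// Hypothetical tag exposing the pre-processed pieces the engine passes in.
function inspect(strings, ...values) {
  return {
    cooked: [...strings],  // escape sequences interpreted
    raw: [...strings.raw], // source text as written
    values,                // substitution results, in order
  };
}

const user = "bob";
const parts = inspect`line1\nhi ${user}!`;
console.log(JSON.stringify(parts.cooked)); // ["line1\nhi ","!"]  (a real newline)
console.log(JSON.stringify(parts.raw));    // ["line1\\nhi ","!"] (backslash and "n" kept)
console.log(parts.values);                 // [ 'bob' ]
```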

Will programmers use quasis such as safesql and safehtml, where today they do not procure and use the library equivalents? Maybe, because the syntax is better due to in-order literal and expression portions, and the JS implementation does the raw/cooked string parsing and escaping.

It's hard to do more, in-language (again, LINQ, E4X -- just say no). Doing nothing leaves us as vulnerable as today. Unprefixed quasis could add likelihood of vulnerability (your concern).

This is not a topic with a clean, low-cost-lunch solution. Developers have to care, and take extra steps of some kind. On this basis, I think quasis are a net win, even excluding the multiline string benefit.

I don't agree that interpolation is riskier than concatenation, not without some evidence (perhaps there's been a study?).

Research tips welcome, I didn't look for other research than Martin's, with which I was already familiar. Cc'ing Mike Samuel in case he is behind on es-discuss, and Mark Miller.