Sandboxing and parsing jQuery in 100ms

# gaz Heyes (13 years ago)

Check this out: www.thespanner.co.uk/2012/11/07/sandboxing-and-parsing-jquery-in-100ms

I have written a complete parser/sandbox based on tracking opening and closing characters (I will be writing a paper on this in more detail). It works by using the opening and closing characters to store the current state. For example:

o={ };

Here my parser looks at the state before "{" and labels it as an object literal. I track that state, and when a "}" is encountered I can use the balanced nature of JavaScript to my advantage: looking up the current count of open characters lets me identify that "}" is in fact closing an object literal. It isn't that simple, I hear you say, and there are complicated cases such as: var xyz, xyz {}

Here my parser takes an additional step. Using the characters it tracks, such as "{" and "(", it can uniquely identify the var statement context: the indexes of these characters tell it that the comma is in fact part of a var statement. Let's look at another example, and I'll illustrate with comments how the tracking works.

    //C = curly
    //P = Paren

    //C 1
    {
      function x
      //P1
      ()
      //P0
      {//C2
        var x, y // C2 + P0 = Var statement
        //C3
        {
        }//C2
        ; // C2 + P0 = '';
      }//C1
    }//C0

By using these reference points, no matter how many nested expressions you have, you can still know which statement a character is part of, or which state it should be in. Please let me know your comments and I'll post the paper when it's done.
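The balancing idea described in this post can be illustrated with a minimal sketch. This is not MentalJS code: `labelBraces` is a hypothetical helper, and the rule it uses to decide that "{" opens an object literal (the previous non-whitespace character is one of `= ( , : [ ?`) is a simplifying assumption. It ignores strings, comments, regular expressions, labels and keywords such as `return`.

```javascript
// Hypothetical helper (not part of MentalJS): labels every "{" and "}" in
// `source` by remembering on a stack what each open brace was.
function labelBraces(source) {
  const stack = [];   // labels of the braces that are currently open
  const result = [];  // one entry per brace, in source order
  let prev = '';      // last non-whitespace character seen

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (/\s/.test(ch)) continue;

    if (ch === '{') {
      // Assumption: after these characters an expression is expected,
      // so "{" starts an object literal; otherwise it starts a block.
      const expectsExpression = prev !== '' && '=(,:[?'.includes(prev);
      const label = expectsExpression ? 'ObjectLiteral' : 'Block';
      stack.push(label);
      result.push({ index: i, type: label + 'Open' });
    } else if (ch === '}') {
      if (stack.length === 0) {
        throw new SyntaxError('Unbalanced "}" at index ' + i);
      }
      // Because braces are balanced, the top of the stack tells us
      // what kind of brace this "}" closes.
      result.push({ index: i, type: stack.pop() + 'Close' });
    }
    prev = ch;
  }

  if (stack.length > 0) throw new SyntaxError('Unclosed "{"');
  return result;
}

console.log(labelBraces('o={ };'));
```

The closing brace never needs to be classified on its own: its meaning comes from the matching opener, which is the property the post relies on.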

# Peter van der Zee (13 years ago)

On Wed, Nov 7, 2012 at 5:53 PM, gaz Heyes <gazheyes at gmail.com> wrote:

Hi all

Check this out: www.thespanner.co.uk/2012/11/07/sandboxing-and-parsing-jquery-in-100ms

How would you deal with cases like foo(/)/); and foo(5//)/g); ? So how would you deal with combinations of regular expressions and divisions in awkward places? Or are you already using a tokenizer and hardcoded rules on when to parse for regex/div?

# gaz Heyes (13 years ago)

On 7 November 2012 21:37, Peter van der Zee <ecma at qfox.nl> wrote:

How would you deal with cases like foo(/)/); and foo(5//)/g); ? So how would you deal with combinations of regular expressions and divisions in awkward places? Or are you already using a tokenizer and hardcoded rules on when to parse for regex/div?

The first stage I use is to assign a "left" value: if an expression has just occurred I give it a value of true, otherwise false. If the left value is false when a "/" is encountered then it's a regex. Where it isn't as easy is when you have var statements or need to know what a colon is: is it a case colon, a label or a ternary? I decided to hardcode the entire JS syntax into a series of very specific rules, for example:

    var rules = {
      ArrayComma:createRule('NewExpressions,Expression,Postfix'),
      ArrayOpen:createRule('Statements,Operators,NewExpressions,Prefix'),
      ArrayClose:createRule('ArrayComma,ArrayOpen,Expression,Postfix'),
      .....

The property name is the current state and the values are the previous states it may follow. This lets you control what each state is allowed to follow. I tokenize and parse simultaneously.
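As a rough illustration of how such a table could be consulted, here is a sketch using only the three rules quoted above. The body of `createRule` and the `canFollow` helper are assumptions made for this example; the real MentalJS implementation is not shown in this thread.

```javascript
// Assumed implementation: turn a comma-separated list of state names
// into a lookup object so membership checks are a single property read.
function createRule(previousStates) {
  const allowed = Object.create(null);
  previousStates.split(',').forEach(function (name) {
    allowed[name] = true;
  });
  return allowed;
}

// Only the three rules quoted in the post; the full table is elided there.
const rules = {
  ArrayComma: createRule('NewExpressions,Expression,Postfix'),
  ArrayOpen: createRule('Statements,Operators,NewExpressions,Prefix'),
  ArrayClose: createRule('ArrayComma,ArrayOpen,Expression,Postfix')
};

// Hypothetical helper: may `currentState` legally follow `previousState`?
function canFollow(previousState, currentState) {
  const rule = rules[currentState];
  return Boolean(rule && rule[previousState]);
}

console.log(canFollow('ArrayOpen', 'ArrayClose'));   // "[]" is allowed
console.log(canFollow('Statements', 'ArrayClose'));  // "]" cannot start a statement
```

Keying the table by the current state means each new token needs only one lookup against the state that preceded it, which fits tokenizing and parsing in a single pass.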

Both your cases are invalid JavaScript in the browser, so they will never reach the MentalJS parser: I do a syntax check in the browser first before passing the code to MentalJS. However, if I turn off this check, here is how MentalJS parses foo(/)/);

<Identifier>$foo$</Identifier><FunctionCallOpen>(</FunctionCallOpen><RegExp>/)/</RegExp><FunctionCallClose>)</FunctionCallClose><EndStatement>;</EndStatement>

I could fix your invalid syntax automatically in the regex state machine by rewriting /)/ to /\)/ and then it would execute :)

The second example would fail the MentalJS syntax check because the comment removes the paren, so the opening and closing parens are uneven, which results nicely in a syntax error.
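The "left" flag from the start of this post can be sketched as follows. `classifySlashes` is a hypothetical function written for this illustration, not MentalJS code, and it makes simplifying assumptions: identifiers, numbers, ")" and "]" set left to true, everything else sets it to false, and strings, comments, keywords and character classes inside regexes are ignored.

```javascript
// Hypothetical scanner: decides for each "/" whether it is a division
// operator or the start of a regular expression literal.
function classifySlashes(source) {
  const kinds = [];
  let left = false;  // true when the previous token can end an expression
  let i = 0;

  while (i < source.length) {
    const ch = source[i];

    if (/\s/.test(ch)) { i++; continue; }

    // Identifiers and numbers leave an expression on the left.
    if (/[A-Za-z0-9_$]/.test(ch)) {
      while (i < source.length && /[A-Za-z0-9_$]/.test(source[i])) i++;
      left = true;
      continue;
    }

    if (ch === '/') {
      if (left) {
        kinds.push('Divide');
        left = false;
        i++;
      } else {
        i++;  // skip the opening slash
        while (i < source.length && source[i] !== '/') {
          if (source[i] === '\\') i++;  // skip the escaped character
          i++;
        }
        if (i >= source.length) {
          throw new SyntaxError('Unterminated regular expression');
        }
        i++;  // skip the closing slash
        while (i < source.length && /[a-z]/.test(source[i])) i++;  // flags
        kinds.push('RegExp');
        left = true;  // a regex literal is itself an expression
      }
      continue;
    }

    // Closing parens and brackets end an expression; other punctuation does not.
    left = ch === ')' || ch === ']';
    i++;
  }
  return kinds;
}

console.log(classifySlashes('foo(5 / /\\)/g)'));
```

With the space and the escaped parenthesis in place, the first slash follows the number 5 and is a division, while the second follows an operator and starts a regex.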

# Peter van der Zee (13 years ago)

On Thu, Nov 8, 2012 at 12:05 AM, gaz Heyes <gazheyes at gmail.com> wrote:

Both your cases are invalid javascript in the browser. So they will never

D'oh. I meant escaped parentheses, didn't think about capturing groups. For the second example, there was supposed to be a space to prevent the line comment.

But that doesn't really matter. I think I've got my answer. Thanks.

# gaz Heyes (13 years ago)

For entertainment purposes I used MentalJS to parse itself. I found that I had trailing commas in a couple of my object literals. Then I decided to have it execute itself inside the sandbox: eval("js=MentalJS();alert(js.parse('1+1'))");

I added an "inception" button to show how it does this. Parse time moves slower inside dreams... I mean sandboxes.

# gaz Heyes (12 years ago)

I thought I'd share an update on my MentalJS work. I have since reduced the parse time of MentalJS and added a DOM sandbox that uses ES5 to allow safe manipulation of the DOM. This is so cool because it means that MentalJS can take control over your DOM and then we can choose what we allow. Want to restrict images to same origin? No problem. Want to prevent script nodes from calling external resources? No problem :)
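To give a flavour of what an ES5-based restriction like "images only from the same origin" might look like, here is a minimal sketch built on `Object.defineProperty`. It is an assumption about the general technique, not the MentalJS DOM sandbox itself: `sandboxImage` and `pageOrigin` are hypothetical names, and the element is a plain object so the example runs outside a browser. It also uses the standard `URL` class for origin comparison, which is newer than ES5.

```javascript
// Hypothetical wrapper: sandboxed code receives `safe` instead of the real
// element, and can only reach `src` through the guarded accessor.
function sandboxImage(element, pageOrigin) {
  const safe = {};
  Object.defineProperty(safe, 'src', {
    enumerable: true,
    get: function () {
      return element.src;
    },
    set: function (value) {
      // Resolve relative URLs against the page origin before comparing.
      const url = new URL(value, pageOrigin);
      if (url.origin !== pageOrigin) {
        throw new Error('Blocked cross-origin image: ' + url.href);
      }
      element.src = url.href;
    }
  });
  // Freezing stops sandboxed code from redefining the accessor.
  return Object.freeze(safe);
}

// Usage with a stand-in object; in a browser this would be an <img> element
// and pageOrigin would typically be location.origin.
const img = { src: '' };
const safeImg = sandboxImage(img, 'https://example.com');
safeImg.src = '/logo.png';
console.log(img.src);
```

The same accessor pattern could be applied to other properties that trigger network requests, with the policy decision kept in one place in the setter.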

There's a cool demo on ModSecurity where they have an injection hole and inject MentalJS into the response to prevent harmful XSS. www.modsecurity.org/demo/demo-deny-noescape.html?test=<script>alert(location)<%2Fscript>

I managed to get the parse time of jQuery down to a minimum of 24ms on Chrome; on Firefox it can parse and sandbox jQuery in about 90ms, although there are a couple of problems with the selectors which I need to debug. Any comments or suggestions are welcome.