SES Progress (was: Fwd: AST in JSON format)

# Mark S. Miller (16 years ago)

[Resending, in order to change the subject. Please reply to this one.]

On Tue, Dec 8, 2009 at 8:51 PM, Mark S. Miller <erights at google.com> wrote:

On Tue, Dec 8, 2009 at 7:59 PM, Oliver Hunt <oliver at apple.com> wrote:

Providing an AST doesn't get you anything substantial here as the hard part of all this is validation, not parsing.

Given ES5 as a starting point,

  1. validation for many interesting purposes, especially security, is no longer hard,
  2. the subset restrictions need no longer be severe, and
  3. the issue isn't what's hard but what's slow and large. Lexing and parsing JS accurately is slow. Accurate JS lexers and parsers are large. Even if JS is now fast enough to write a parser competitive with the one built into the browsers, this parser would itself need to be downloaded per frame. Even if all downloads of the parser code hit on the browser's cache, the parser would still need to be parsed per frame that needed it (unless browsers cache a frame-independent parsed representation of JS scripts).

I am currently working on just such a validator and safe execution environment -- assuming ES5 and a built-in parser-to-AST API. Going out on a limb, I expect it to have a small download, a simple translation, no appreciable code expansion, and no appreciable runtime overhead. Once I've posted it, we can reexamine my claims above against it.

Work in progress is at <code.google.com/p/es-lab/source/browse/trunk/src/ses>.

This SES implementation is not actually quite complete yet. Even once it seems complete, we can't test it until there is an available ES5 implementation to run it on. However, it is complete enough, and we are confident enough about what it will be once bugs are fixed, that we can try assessing the limb I climbed out on above. Still, until it is tested, all this should be taken with some salt.

  • All of SES can be implemented in any ES5 implementation satisfying a few additional constraints, which we're trying to accumulate at <code.google.com/p/es-lab/wiki/SecureableES5>.

  • The implementation sketch shown there depends on two elements not currently provided by ES5 or the browser:

    • A Parser->AST API, for which Tom wrote an OMeta/JS parser at <code.google.com/p/es-lab/source/browse/trunk/src/parser/es5parser.ojs> that does run in current JavaScript, producing the ASTs described at <code.google.com/p/es-lab/wiki/JsonMLASTFormat> and available at <es-lab.googlecode.com/svn/trunk/site/esparser/index.html>.
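For concreteness, a JsonML AST node is an array of the form [tagName, attributeObject, ...childNodes]. The particular tag and attribute names in the sketch below are illustrative assumptions, not necessarily the exact vocabulary of the wiki page; the point is the uniform shape, which lets tools walk the tree generically and ship it as plain JSON:

```javascript
// A JsonML AST node is an array: [tagName, attributeObject, ...childNodes].
// The tag and attribute names here are illustrative assumptions.
var ast =
  ["BinaryExpr", {"op": "+"},
    ["IdExpr", {"name": "x"}],
    ["LiteralExpr", {"type": "number", "value": 1}]];

// Because the shape is uniform, a generic walk needs no per-node-type
// knowledge: children always start at index 2.
function countNodes(node) {
  var count = 1;
  for (var i = 2; i < node.length; i += 1) {
    count += countNodes(node[i]);
  }
  return count;
}

// JsonML is plain JSON, so ASTs survive serialization unchanged.
var roundTripped = JSON.parse(JSON.stringify(ast));
```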

    • An object-identity-based key/value table, such as the EphemeronTables from the weak-pointer strawman. Assuming such tables made the SES runtime initialization a touch easier to write. But I can (and probably should) refactor the SES runtime initialization so that it does not use such a table, just to see how adequate ES5 itself already is at supporting SES.
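The interface such a table needs is small. As a rough sketch (my own illustration, not the strawman's API), here is an identity-keyed table in plain ES5. Unlike a real EphemeronTable it holds its keys strongly, so it would leak memory; it only demonstrates the identity-based lookup that the runtime initialization assumes:

```javascript
// An identity-keyed table in plain ES5: a sketch of the interface, not the
// EphemeronTable strawman itself. Keys are held strongly, so this leaks.
function makeTable() {
  var keys = [];
  var vals = [];
  return {
    set: function (key, val) {
      var i = keys.indexOf(key);  // indexOf compares objects by identity
      if (i < 0) {
        keys.push(key);
        vals.push(val);
      } else {
        vals[i] = val;
      }
    },
    get: function (key) {
      var i = keys.indexOf(key);
      return i < 0 ? undefined : vals[i];
    }
  };
}
```

(Today this role is filled by the standard WeakMap, which grew out of that strawman.)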

  • Given a parser, the rest of the SES startup is indeed small and fast. For an SES in which the parser needs to be provided in JS, the parser will dominate the size of the SES runtime library. The SES verifier is really trivial by comparison with any accurate JS parser.

  • We are enumerating the subset restrictions imposed by SES at <code.google.com/p/es-lab/wiki/SecureEcmaScript>. For JS code that is already written to widely accepted best practice, such as no monkey patching of primordials, I would guess these restrictions to be quite reasonable. This is the area where we need the most feedback -- is there any less restrictive or more pleasant object-capability subset of ES5 than the one described here?
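As a concrete illustration of the "no monkey patching of primordials" practice: SES freezes the primordials at initialization, after which a patch attempt simply does not stick. The toy example below freezes just Array.prototype to show the effect in ES5:

```javascript
// SES freezes the primordials at initialization. Here we freeze just
// Array.prototype and show that a later monkey patch does not stick:
// non-strict code has the assignment silently ignored, while strict-mode
// code gets a TypeError.
Object.freeze(Array.prototype);

try {
  // A would-be monkey patch, as code outside the subset might attempt:
  Array.prototype.last = function () { return this[this.length - 1]; };
} catch (e) {
  // Strict-mode callers land here with a TypeError.
}

var wasPatched = ("last" in Array.prototype); // false: the patch did not stick
```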

  • Because this SES implementation is verification-based, not translation-based, there is no code expansion.

  • SES does no blacklisting and no runtime whitelisting. It does all its whitelisting only at initialization and verification time.
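To sketch what initialization-time whitelisting means (the whitelist contents below are invented for illustration, not SES's actual whitelist): properties not reachable through the whitelist are deleted once, at startup, on a mock global object here so the real one is untouched. Afterward no per-access runtime checks remain:

```javascript
// Initialization-time whitelisting, sketched with an invented whitelist.
// A `true` permit keeps a leaf property; an object permit keeps the
// property and recurses into it with the sub-whitelist.
var whitelist = {
  JSON: { parse: true, stringify: true }
};

function clean(obj, permits) {
  Object.getOwnPropertyNames(obj).forEach(function (name) {
    var permit = permits[name];
    if (!permit) {
      delete obj[name];           // not whitelisted: removed once, at startup
    } else if (typeof permit === "object") {
      clean(obj[name], permit);   // recurse with the sub-whitelist
    }
  });
}

// A mock global, so we do not destructively clean the real one:
var mockGlobal = {
  JSON: { parse: function () {}, stringify: function () {}, extra: 1 },
  leakyGlobal: 42
};
clean(mockGlobal, whitelist);
// Only whitelisted paths remain; no runtime whitelisting is needed later.
```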

  • Aside from "eval", "Function", "RegExp.prototype.exec", "RegExp.prototype.test", and initialization of the SES runtime (which, given a built-in parser, should be fast), SES has no runtime overhead. This also applies to "eval" and "Function" themselves. All their overhead is in starting up. Given a fast parser, this startup overhead should be small. After startup, code run by either eval or Function has no remaining runtime overhead.
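The shape of that overhead can be sketched as follows (a hypothetical wrapper of my own devising, not the actual SES code): the replacement eval parses and verifies once per call, then hands the verified source to the underlying eval, so the evaluated code itself runs at full speed afterward:

```javascript
// Hypothetical shape of a verifying eval. All names here are stand-ins:
// `parse` and `verify` represent the parser->AST API and the SES verifier.
function makeSafeEval(realEval, parse, verify) {
  return function safeEval(src) {
    verify(parse(src));    // all of the overhead happens here, up front
    return realEval(src);  // the verified code runs with no added overhead
  };
}

// Wiring it up with stubs, just to show the call order:
var calls = [];
var safeEval = makeSafeEval(
  function (src) { calls.push("eval"); return 7; },        // stub eval
  function (src) { calls.push("parse"); return ["Program", {}]; },
  function (ast) { calls.push("verify"); }
);
var result = safeEval("1 + 1");
```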