using SML for the ES4 spec

# Stephen Weeks (19 years ago)

I have some SML code for manipulating Javascript that I wrote a while back which y'all might find useful. The code follows the ES3 spec and includes:

  • An ML lex specification for Javascript tokens.
  • An ML yacc specification for Javascript.
  • A hand-crafted top-down-parser generator.
  • A specification of Javascript's grammar that works with the top-down-parser generator.
  • Datatypes for Javascript tokens and abstract syntax trees.
  • A command-line tool for Javascript tokenization, parsing, and pretty printing.

The ML yacc parser works except that it doesn't handle semicolon insertion -- I couldn't figure out any way to do that, so I wrote my own top-down-parser generator that has an interface for token insertion. That one works fine. I've tested it on a fair bit of Javascript code on the web and there are no bugs that I know of.
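As an illustration of the token-insertion idea, here is a minimal Python sketch. It is not the SML code described above: the toy grammar (identifier statements, simple assignments, and blocks) and the names `Token`, `tokenize`, `Parser`, `expect_semicolon`, and `parse` are all hypothetical. It only shows how a top-down parser can accept a "virtual" semicolon when the offending token follows a line terminator, is `}`, or is the end of input, which is the basic ES3 rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str             # "ident", "punct", or "eof"
    text: str
    newline_before: bool  # True if a line terminator precedes this token


# Toy lexer (hypothetical): identifiers plus the punctuators = ; { }
TOKEN_RE = re.compile(r"(\s*)([A-Za-z_$][\w$]*|[=;{}])")


def tokenize(src):
    tokens, pos = [], 0
    while True:
        m = TOKEN_RE.match(src, pos)
        if m is None:
            break
        ws, text = m.groups()
        kind = "punct" if text in "=;{}" else "ident"
        tokens.append(Token(kind, text, "\n" in ws))
        pos = m.end()
    rest = src[pos:]
    if rest.strip():
        raise SyntaxError(f"unexpected character at offset {pos}")
    tokens.append(Token("eof", "", "\n" in rest))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expect_semicolon(self):
        """Consume ';' or accept a virtual one where insertion is allowed."""
        tok = self.peek()
        if tok.text == ";":
            self.next()
        elif tok.newline_before or tok.text == "}" or tok.kind == "eof":
            pass  # virtual semicolon: nothing is consumed
        else:
            raise SyntaxError(f"expected ';' before {tok.text!r}")

    def statement(self):
        if self.peek().text == "{":
            self.next()
            body = []
            while self.peek().text != "}":
                if self.peek().kind == "eof":
                    raise SyntaxError("unclosed block")
                body.append(self.statement())
            self.next()  # consume '}'
            return ("block", body)
        target = self.next()
        if target.kind != "ident":
            raise SyntaxError(f"unexpected {target.text!r}")
        if self.peek().text == "=":
            self.next()
            value = self.next()
            if value.kind != "ident":
                raise SyntaxError("expected identifier after '='")
            node = ("assign", target.text, value.text)
        else:
            node = ("expr", target.text)
        self.expect_semicolon()
        return node

    def program(self):
        statements = []
        while self.peek().kind != "eof":
            statements.append(self.statement())
        return statements


def parse(src):
    return Parser(tokenize(src)).program()
```

With this sketch, `parse("a = b\nc; { d }\ne")` yields four statements even though only one semicolon appears in the source, while `parse("a b")` raises `SyntaxError` because nothing permits an insertion between `a` and `b`. The restricted productions of ES3 (for example `return` followed by a line terminator) are deliberately left out.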

The code is available under the MLton license

mlton.org/License

which is a BSD-style license, so I hope it would be OK for you.

Let me know if the code sounds useful and I will package it up.

# Graydon Hoare (19 years ago)

Stephen Weeks wrote:

> I have some SML code for manipulating Javascript that I wrote a while back which y'all might find useful. The code follows the ES3 spec and includes:
>
>   • An ML lex specification for Javascript tokens.
>   • An ML yacc specification for Javascript.
>   • A hand-crafted top-down-parser generator.
>   • A specification of Javascript's grammar that works with the top-down-parser generator.
>   • Datatypes for Javascript tokens and abstract syntax trees.
>   • A command-line tool for Javascript tokenization, parsing, and pretty printing.

I think that would be useful, yes. I'm curious what you did for Unicode in particular. We have constructed some subset of this material for ES4 in the past week, but there is still much to do. It would be nice to have something to compare with, at least.

# Stephen Weeks (19 years ago)

> I think that would be useful, yes. I'm curious what you did for Unicode in particular. We have constructed some subset of this material for ES4 in the past week, but there is still much to do. It would be nice to have something to compare with, at least.

I've put the code in the MLton SVN repository at

svn://mlton.org/mltonlib/trunk/com/entain/javascript/unstable

It's browseable on the web at

mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/entain/javascript/unstable

The only thing I did to handle Unicode was make the internal representation of strings a vector of 32-bit words (and have the lexer handle \u and \U escapes of course). See the "String" structure in javascript.fun.
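
To make the representation concrete, here is a minimal Python sketch of the same idea. It is not the "String" structure from javascript.fun: the function name `decode_literal` is hypothetical, a tuple of integers stands in for the vector of 32-bit words, and the escape formats (`\u` followed by four hex digits, `\U` followed by eight) are assumptions, since the message does not spell them out. Other escapes such as `\n` or `\\` are ignored.

```python
import re

# Assumed escape formats: \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits).
ESCAPE_RE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_literal(body):
    """Turn the body of a string literal into a tuple of code points.

    Each element is a non-negative integer that fits in 32 bits, so the
    tuple plays the role of a vector of 32-bit words.
    """
    words, pos = [], 0
    for m in ESCAPE_RE.finditer(body):
        # Copy the ordinary characters that precede this escape.
        words.extend(ord(ch) for ch in body[pos:m.start()])
        # Exactly one of the two groups matched; parse its hex digits.
        words.append(int(m.group(1) or m.group(2), 16))
        pos = m.end()
    words.extend(ord(ch) for ch in body[pos:])
    return tuple(words)
```

For example, `decode_literal(r"caf\u00e9")` gives `(0x63, 0x61, 0x66, 0xE9)`. Storing one word per character keeps indexing and length simple, at the cost of more memory than a byte-oriented encoding.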