SML vs OCaml for the ECMAScript spec.
Also, some tidbits from www.ffconsultancy.com/free/ray_tracer/languages.html, whose authors did an independent comparison of a ray-tracer benchmark. I've snipped the bits I think are relevant to the discussion. This is the most relevant one:
<snip>
Unlike SML, the OCaml language is not standardised and continues to evolve. The evolution of OCaml has allowed it to adopt many new programming constructs not found in SML, including polymorphic variants and objects. However, the evolution of the OCaml language sometimes renders old code uncompilable or, worse, different in terms of performance or even correctness.
Ray tracer language comparison
Update: Lisp much faster! Non-whitespace character counts!
We have ported four progressively optimised versions of our cut-down ray tracer www.ffconsultancy.com/free/ray_tracer/comparison.html to C++ gcc.gnu.org, Java www.java.com, OCaml www.ocaml.org and SML www.standardml.org. Joe Marshall, Nathan Baum, Juho Snellman, Jeffrey Siskind and others have kindly ported the programs to Common Lisp and Scheme. Thanks also to Oleg Trott for pointing out an important error in our code.
<snip>
The OCaml and SML implementations of the ray tracer are by far the most succinct. The shortest OCaml implementation of the ray tracer fits in only 55 lines. The longer OCaml and SML implementations are significantly shorter and faster than the shortest implementations in all of the other languages.
<snip>
In addition to examining the fastest implementations, it is interesting to look at the performance of simple implementations. <snip>
In 32-bit, Stalin-compiled Scheme is fastest, followed closely by the more concise MLton-compiled SML program. OCaml, SML/NJ-compiled SML and g++-compiled C++ all give similar performance. The Lisp is not only 30% more verbose than the OCaml but is also 4.9x slower.
Given their performance, it is remarkable that the Stalin and MLton implementations are devoid of low-level optimisations and the C++ and OCaml implementations only had minor design decisions made based upon performance (the use of pass-by-reference in C++ and the representation of vectors as records rather than tuples in OCaml).
Not coincidentally, two of the best optimising compilers, Stalin and MLton, are both whole-program optimising compilers, meaning they can only compile whole (self-contained) programs. The C++ and OCaml compilers allow partial recompilation, compile this whole program much more quickly and still achieve very competitive performance.
<snip>
Note that OCaml and SML/NJ compile faster than all of the other languages and Stalin compiles the Scheme implementation three orders of magnitude slower than SML/NJ, taking over 10 minutes to compile this tiny program. <snip>
The last nail in the coffin: I simply cannot imagine an organization like ECMA forcing people to use the single GPL-encumbered implementation of OCaml when free alternatives are already available from multiple vendors. Unless you do the reference implementation without referring to any of the OCaml library, my understanding is that you will not be able to redistribute the full spec without encumbering the IP. I'm sure the FUD around the GPL will slow the standards process down.
caml.inria.fr/ocaml/license.en.html The Compiler is distributed under the terms of the Q Public License caml.inria.fr/ocaml/license.en.html#qpl version 1.0 with a change to choice of law (included below). The Library is distributed under the terms of the GNU Library General Public License caml.inria.fr/ocaml/license.en.html#lgpl version 2 (included below).
mlton.org/License Software As of 20050812, MLton software is licensed under the BSD-style license below. By contributing code to the project, you agree to release the code under this license. Contributors can retain copyright to their contributions by asserting copyright in their code. Contributors may also add to the list of copyright holders in doc/license/MLton-LICENSE, which appears below.
www.smlnj.org/license.html Yet another BSD style...
Having the spec in an executable form can greatly improve compliance, because you can use the official spec as a test oracle. Ideally, you want to let people download and run the spec, integrate it into their test process, and perhaps even make modifications to it. The GPL wildcard will just impede this process and reduce the conformance quality of the resulting implementations.
Okay, I've spammed the list enough. I hope I've made a strong case that the right pragmatic decision is to go with SML over OCaml.
P.S. You guys obviously have enough taste to be thinking of using any ML for the language spec. So, I'm sure after you review all the facts you'll do the right thing again and choose SML.
On Oct 21, 2006, at 2:02 PM, Daniel C. Wang wrote:
P.S. You guys obviously have enough taste to be thinking of using
any ML for the language spec. So, I'm sure after you review all the
facts you'll do the right thing again and choose SML.
Flattery will get you everywhere ;-).
We are agreed now on SML, using SMLNJ. Our flirtation with OCaml was
based on convenience and familiarity, but as I noted in another
message, we were committed to using a small core language, plus
continuation support of some kind (for OCaml this meant a patch; with
SMLNJ, no patch).
Dave will post to LtU soon, I'm sure.
Yes, the lack of native support for continuations in OCaml might have been a problem. Even if I prefer OCaml - or even better my own NekoML ;)
- any language in the ML family should be powerful enough to express the specification.
This is all great news. I hope you all know what a groundbreaking step this is in terms of the state of the art in language specification.
I wish you guys all the luck. In one of my pipe dreams someone sits down and uses ML to specify something like the JVM or CLR next. Then of course everyone reads the wonderfully concise specs that they can also execute, and starts wondering... why the hell am I programming in Java, C#, or Javascript! :) (okay, I'll admit these days that C# is looking pretty cool too, if it only had datatypes...)
The history of ML was that it was the "scripting" language for a theorem prover. At some point people realized that it was cool and useful on its own.
I am happy to hear about the decision to use SML to specify ES4 and think it is a good one. I have a few comments.
I don't understand why meta-language continuations are necessary for the spec. If performance really isn't an issue, why not express stacks/continuations as ordinary data structures and do whatever manipulations you want on them directly? It would probably make the spec clearer not to rely on meta-language features like call/cc. Illuminating reading for this kind of thing is Reynolds's "Definitional Interpreters for Higher-Order Programming Languages":
mlton.org/References#Reynolds98_2
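To make that concrete, here is a toy illustration (nothing to do with your spec, and the names are my own) of the style Reynolds describes: the continuation is just a list of frames that the evaluator manipulates directly.

datatype exp = Num of int | Add of exp * exp

(* one defunctionalized continuation frame *)
datatype frame =
   AddR of exp   (* next: evaluate the right operand *)
 | AddL of int   (* holding the already-computed left operand *)

(* eval and apply walk an explicit stack of frames instead of using callcc *)
fun eval (Num n, k) = apply (k, n)
  | eval (Add (a, b), k) = eval (a, AddR b :: k)
and apply ([], v) = v
  | apply (AddR b :: k, v) = eval (b, AddL v :: k)
  | apply (AddL u :: k, v) = apply (k, u + v)

Here eval (Add (Num 1, Add (Num 2, Num 3)), []) evaluates to 6, and the whole continuation is an ordinary value that the spec could inspect, store, or single-step.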
If you do decide to use meta-language continuations, I suspect that exceptions and threads really are enough. You should consider using (a very small subset of) MLton's thread interface:
MLton.Thread provides access to MLton's user-level thread implementation (i.e. not OS-level threads). Threads are lightweight data structures that represent a paused computation. Runnable threads are threads that will begin or continue computing when switched to. MLton.Thread does not include a default scheduling mechanism, but it can be used to implement both preemptive and non-preemptive threads.
In order to specify ES4's yield, all you probably need is Thread.new and Thread.switch. Importantly, MLton.Thread.switch is constant time, unlike MLton.Cont.throw, which takes time proportional to the size of the stack.
Plus, it is trivial to implement Thread.{new,switch} on top of a constant-time implementation of callcc/throw like SML/NJ has (and not vice versa).
Now, for a few comments on the choice of SML implementation.
SML does have the benefit of a number of active implementations (eight or so):
mlton.org/StandardMLImplementations
However, SML/NJ is the least conforming to the Definition.
mlton.org/DefinitionOfStandardML, mlton.org/SMLNJDeviations
If you want to develop code that matches the Definition and is portable among SML implementations, you should use at least one of the other implementations from the start of the project (HaMLet and MLton being the most pedantic about following the Definition). I cannot overstate the number of times I have heard of people who started a project with SML/NJ running into porting problems because they unknowingly used some aspect of SML/NJ that is not SML and is not supported by other SML implementations.
One feature missing from SML/NJ that may be relevant in providing an executable spec is exact binary<->decimal conversions. For example,
if you try
Real.fmt StringCvt.EXACT Real.Math.pi
in SML/NJ you will get
uncaught exception Fail [Fail: RealFormat: fmtReal: EXACT not supported]
On the other hand, if you try this in MLton, you will get
0.3141592653589793E1
MLton uses gdtoa for binary<->decimal conversions. I don't know of
any other SML implementation that does.
Another feature missing from SML/NJ, but present in MLton, the ML Kit, and Poly/ML, is the generation of standalone executables. This could be useful for shipping a binary package containing the executable spec.
I don't understand why meta-language continuations are necessary for the spec. If performance really isn't an issue, why not express stacks/continuations as ordinary data structures and do whatever manipulations you want on them directly? It would probably make the spec clearer not to rely on meta-language features like call/cc.
We are looking for a middle ground between precision and readability. A direct-style interpreter has the benefit of being clearest and requiring no whole-interpreter modifications. Explicit continuations or evaluation contexts risk polluting a lot of the spec with mostly irrelevant detail.
Clearly, as you point out, the dangers in using features of the meta-language to model aspects of the object language are a) bugs due to impedance mismatches between the two and b) misunderstanding due to readers not understanding the implicit features of ML.
I think the benefits outweigh the risk: the features of ML that we are using are few, and pretty standard. Exceptions and handlers are, at their core, simple and widely understood, and ML encodes only their very basic features. There's no notion of "finally", for example. To be sure, call/cc is a monster, but as Brendan has been saying we can at least "keep the monster in a box" by only using it in the one place where it's needed, not throughout the spec.
By contrast, explicit evaluation contexts would be needed nearly everywhere, and would often be superfluous.
Another nice aspect of this decision is that it's not hard to write a translator that takes an ML program with effects and translates effects into a pure ML program written in monadic style. Again, since our effects are few, this requires the definition of only one monad (something like exceptions + continuations + state). This may prove useful if we need to translate the evaluator into a form more amenable to theorem proving. We're not there yet, though. :)
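To give a flavor of what I mean (this is just an illustrative signature I'm making up here, not anything we've actually written), the single combined monad would offer roughly these operations:

(* state + exceptions + first-class control, all in one monad; 'r is the
   final answer type of the whole evaluation *)
signature EVAL_MONAD = sig
  type state
  type err
  type ('a, 'r) m
  val return : 'a -> ('a, 'r) m
  val bind   : ('a, 'r) m * ('a -> ('b, 'r) m) -> ('b, 'r) m
  val get    : (state, 'r) m
  val put    : state -> (unit, 'r) m
  val throw  : err -> ('a, 'r) m
  val catch  : ('a, 'r) m * (err -> ('a, 'r) m) -> ('a, 'r) m
  val callcc : (('a -> ('b, 'r) m) -> ('a, 'r) m) -> ('a, 'r) m
  val run    : state * ('r, 'r) m -> 'r
end

The monadic translation would then turn every effectful expression in the evaluator into uses of return and bind against a structure matching this signature.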
If you do decide to use meta-language continuations, I suspect that exceptions and threads really are enough. You should consider using (a very small subset of) MLton's thread interface:
I've already written an example implementation of yield in terms of Concurrent ML threads using SML/NJ. I guarantee sequentiality via a simple protocol through a single channel shared between the main thread and the generator thread.
Certainly a pre-emptive threading system would make the sequentiality manifest. Either way, it's a relatively small part of the spec -- again, the monster's in a box. :)
You're also quite right that using callcc for yield might be swatting flies with a bazooka. I'm experimenting with the two to see which I find clearer. Ah, the beauty of signatures and functors! It's such a pleasure working with ML.
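Schematically, the channel version looks something like the following. This is a stripped-down sketch rather than the actual code -- it elides the part of the protocol that pins down strict sequentiality and what happens when a finished generator is resumed -- but it shows the single shared channel (running under SML/NJ's CML runtime):

fun 'a makeGen (body: ('a -> unit) -> unit) : unit -> 'a option =
  let
    val ch: 'a option CML.chan = CML.channel ()
    (* the generator blocks here until the main thread asks for a value *)
    fun yield v = CML.send (ch, SOME v)
    val _ = CML.spawn (fn () => (body yield; CML.send (ch, NONE)))
  in
    (* the main thread blocks here until the generator yields or finishes *)
    fn () => CML.recv ch
  end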
Now, for a few comments on the choice of SML implementation.
I really appreciate all your input. In fact, I imagine we will be almost ML-implementation agnostic. Given your warnings, maybe that's naive -- I'm sure you have far more experience with this than I. But right now we need to move forward, so for the moment I think we're going to go with SML/NJ. However, the libraries are remarkably similar, and the vast majority of our code will just be using the simple, core features of SML. So porting to a different implementation at some point isn't out of the question.
Thanks,
We are looking for a middle ground between precision and readability. A direct-style interpreter has the benefit of being clearest and requiring no whole-interpreter modifications. Explicit continuations or evaluation contexts risk polluting a lot of the spec with mostly irrelevant detail.
Quite reasonable.
I've already written an example implementation of yield in terms of Concurrent ML threads using SML/NJ. I guarantee sequentiality via a simple protocol through a single channel shared between the main thread and the generator thread.
If your primary goal is a clean spec, throwing in Concurrent ML adds yet another level of semantic complexity. I appreciate the argument of keeping the complexity localized, but by using Concurrent ML you've now added a meta-language semantic complexity that pervades all of the semantics (the meta-language now has preemptive threads). I suspect all that is necessary for ES4 is non-preemptive threads (one example is MLton.Thread.{new,switch}), which have a much simpler semantics. My point is that it would be nice to keep the monster in the box in terms of both the code defining ES4 and the explanation of the meta language. CML might be too heavyweight.
To be more precise, and to speak in SML code, which is always more fun, I would think a structure matching the following signature is all that's needed.
signature THREAD = sig
  type 'a t
  val new: ('a -> unit) -> 'a t
  val switch: ('a t -> 'b t * 'b) -> 'a
end
One can implement this signature using continuations, but it is also easily implementable using MLton.Thread. The point being that you can give a simple semantics to the meta-language extended with structure Thread: THREAD, and then implement structure Thread using whatever lower-level mechanism the particular SML implementation provides. And readers of the ES4 spec don't have to understand the low-level stuff.
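For example (and this is just my sketch of one possible shape, names and all, not a proposal for the actual spec code), a generator facility in the spirit of ES4's yield can sit entirely on top of THREAD:

functor Generator (structure Thread: THREAD):
  sig
    (* make body returns a next function; each call to next runs body up to
       its next yield and returns SOME v, or NONE once body has returned *)
    val make: (('a -> unit) -> unit) -> unit -> 'a option
  end =
struct
  fun 'a make (body: ('a -> unit) -> unit) : unit -> 'a option =
    let
      (* most recently paused handles of the two parties *)
      val consumer: 'a option Thread.t option ref = ref NONE
      val producer: unit Thread.t option ref = ref NONE
      val finished = ref false
      (* pause the producer, handing v to the waiting consumer *)
      fun toConsumer (v: 'a option) : unit =
        Thread.switch (fn me => (producer := SOME me; (valOf (!consumer), v)))
      fun yield v = toConsumer (SOME v)
      val () =
        producer := SOME (Thread.new (fn () =>
          (body yield; finished := true; toConsumer NONE)))
      fun next () =
        if !finished
          then NONE
          else Thread.switch (fn me =>
                 (consumer := SOME me; (valOf (!producer), ())))
    in
      next
    end
end

The spec-level code only ever mentions Thread.new and Thread.switch, so readers never see callcc.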
Here, e.g., is how I might implement THREAD using a variant of SMLofNJ.Cont.
signature CONT = sig
  type 'a t
  val callcc: ('a t -> 'a) -> 'a
  val throw: 'a t * 'a -> 'b
end

functor ContToThread (structure Cont: CONT val die: string -> 'a): THREAD = struct
  structure Thread = struct
    datatype 'a t =
       Dead
     | New of 'a -> unit
     | Paused of 'a Cont.t
  end
  datatype z = datatype Thread.t
  datatype 'a t = T of 'a Thread.t ref
  (* func holds the body of the next New thread to run; start is a "base"
     continuation, captured once, from which every New thread is started *)
  val func: (unit -> unit) option ref = ref NONE
  val start: unit Cont.t option ref = ref NONE
  val () = Cont.callcc (fn k => start := SOME k)
  val start = valOf (!start)
  val () =
    case !func of
       NONE => ()
     | SOME f =>
         let
           datatype t = Exit | Raise of exn
         in
           case ((f (); Exit) handle e => Raise e) of
              Exit => die "thread exited"
            | Raise _ => die "thread had unhandled exception"
         end
  fun new f = T (ref (New f))
  fun switch f =
    Cont.callcc
    (fn k =>
     let
       (* pause the current thread and ask f where to go next *)
       val (T r, b) = f (T (ref (Paused k)))
     in
       case !r of
          Dead => die "switch to dead thread"
        | New f => (func := SOME (fn () => f b); Cont.throw (start, ()))
        | Paused k => Cont.throw (k, b)
     end)
end
I haven't used callcc in years, so there are likely errors. But my main point is that (I hope) you can improve your spec by containing the semantic extensions to the meta language.
Certainly a pre-emptive threading system would make the sequentiality manifest.
I'm not sure what your point is here.
In fact, I imagine we will be almost ML-implementation agnostic. Given your warnings, maybe that's naive -- I'm sure you have far more experience with this than I. But right now we need to move forward, so for the moment I think we're going to go with SML/NJ.
Going with SML/NJ is a fine choice. It's often what I use. I am simply saying that you should also compile with another compiler regularly, and from the start. It's not that hard. And if you don't, you will shoot yourself in the foot, and not find out until much later, after you have lost a lot of blood.
And if you do settle on MLton for your alternate implementation, you will get the added benefit of its type error messages, which many people find significantly better than SML/NJ's. And since type checking is a big part of an SML programmer's life :-), this can be a big win.
One can implement this signature using continuations, but it is also easily implementable using MLton.Thread. The point being that you can give a simple semantics to the meta-language extended with structure Thread: THREAD, and then implement structure Thread using whatever lower-level mechanism the particular SML implementation provides. And readers of the ES4 spec don't have to understand the low-level stuff.
It's an interesting idea, that we could pretend to be building on top of a native cooperative-threading semantics, which is actually implemented in terms of callcc. (I think it's what you were trying to say in your first email but I didn't pick up on it.) So casual readers have one level of abstraction they can use to think about it, and semanticists can see the lower level definition.
OTOH, I'm not sure the extra level of abstraction is necessary.
Certainly a pre-emptive threading system would make the sequentiality manifest.
I'm not sure what your point is here.
Just basically the same thing you said: using a concurrent meta-language complicates the underlying semantics.
Going with SML/NJ is a fine choice. It's often what I use. I am simply saying that you should also compile with another compiler regularly, and from the start. It's not that hard. And if you don't, you will shoot yourself in the foot, and not find out until much later, after you have lost a lot of blood.
Good advice, thanks. I'll give MLton a try.
Thanks,
we could pretend to be building on top of a native cooperative-threading semantics, which is actually implemented in terms of callcc.
Yes, that's a nice way of summarizing my point.
BTW, is there a way to run MLton in a "check the syntax and types only" mode? That would be a nice feature if you're just checking conformance with the SML spec.
Also, I would hope the SML community at large steps up to support this effort. If the preliminary spec is published via CVS/Subversion, I think it would be good for all the SML implementations to try to add that spec to their regression suites. This would provide the widest possible level of conformance, and a strong guarantee that the spec runs on every SML implementation, or at least many of them.
For that reason the idea of using Threads rather than call-cc might encourage more SML implementors to add the spec to their regressions.
As for the style of the semantics: in an ideal world there would be a high-level "natural" big-step semantics. This big-step semantics is best for your "casual" programmer to understand the spec. A very detailed "small-step" semantics would be useful and would avoid the need for adding things like threads/call-cc.
Such a small-step semantics may be out of the charter of the current standard, but there is nothing preventing some energetic folks from maintaining a small-step semantics in parallel with the "official" spec. If the small-step semantics turns out to be less ambiguous, one might consider it the "official" spec and the big-step semantics the advisory commentary about it, or vice versa. But that decision can be sorted out at the end of the day.
The other great thing about using any programming language as a spec, rather than a paper spec, is that we have great tools and models for collaborating via programming languages. So I hope everyone views this as a great opportunity to collaborate by investing resources and expertise when they can.
BTW, is there a way to run MLton in a "check the syntax and types only" mode? That would be a nice feature if you're just checking conformance with the SML spec.
Yes, "mlton -stop tc foo.sml" will stop the compiler after type checking is done. This is documented at
As for the style of the semantics: in an ideal world there would be a high-level "natural" big-step semantics. This big-step semantics is best for your "casual" programmer to understand the spec. A very detailed "small-step" semantics would be useful and would avoid the need for adding things like threads/call-cc.
There's no reason a big-step operational semantics should be more readable than an ML evaluator. We were heading in the direction of a small-step semantics with explicit evaluation contexts, but decided that a direct-style interpreter would be a bit more abstract and more accessible.
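(For a toy example of what I mean by direct style: with

datatype exp = Num of int | Add of exp * exp

fun eval (Num n) = n
  | eval (Add (a, b)) = eval a + eval b

the meta-language's own call stack plays the role of the continuation, whereas a small-step presentation has to materialize frames like "Add with a hole on the left" and thread them through every rule.)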
Such a small-step semantics may be out of the charter of the current standard, but there is nothing preventing some energetic folks from maintaining a small-step semantics in parallel with the "official" spec. If
Yes, I've already prototyped a small-step semantics implemented in the term rewriting language Stratego. One of the interesting side effects of a small-step interpreter is that you get an automatic stepper (debugger), as opposed to just an evaluator.
the small-step semantics turns out to be less ambiguous, one might consider it the "official" spec and the big-step semantics the advisory commentary about it, or vice versa. But that decision can be sorted out at the end of the day.
Well, no-- the working group has decided on ML as the spec language for very good reasons we've already gone into. We aren't going to develop two semantics in parallel on a tight deadline. :)
I was suggesting that those who were interested in different specification techniques do it on their own time, and leave you guys alone. :) When the next rev comes out people can flog it out, or perhaps have a process to include more detailed specs as clarifying informational addenda... If I remember correctly R5RS is in that style. The prose is the "official" standard and the semantics was "informational".
It might be good to have such parallel efforts, as it might help to debug the spec you guys are developing. It's my experience that you can never believe anything is done right, unless you do it two times in different ways.
In particular, proving type-safety is best done with a small-step semantics. However, I don't think you guys want to tackle that in the spec. :) But it would be good to have something closely in sync with the spec so those in the research community interested in doing such a thing would have a good base to work from.
I was suggesting that those who were interested in different specification techniques do it on their own time, and leave you guys alone. :) When the next rev comes out people can flog it out, or perhaps have a process to include more detailed specs as clarifying informational addenda... If I remember correctly R5RS is in that style. The prose is the "official" standard and the semantics was "informational".
I see.
It might be good to have such parallel efforts, as it might help to debug the spec you guys are developing. It's my experience that you can never believe anything is done right, unless you do it two times in different ways.
Sure.
In particular, proving type-safety is best done with a small-step semantics. However, I don't think you guys want to tackle that in the spec. :)
Indeed-- however, it's hard to prove your type system sound if it isn't. ;) According to the current draft spec, ES4 has a dynamic type system with a "verification phase" that may reject some programs. However, there is no Milner-style guarantee if the static checker blesses your program; any expression may still generate a runtime type error.
Basic language safety (in the sense of Appel's "sermon on safety" in the Tiger book) is ensured via dynamic types. With a spec in ML, this amounts roughly to proving that every case analysis is exhaustive (conveniently proved by the ML compiler!), and perhaps strengthened by a hand proof showing that certain "can't-happen" cases never happen. Type soundness, though, doesn't hold.
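As a trivial illustration of the exhaustiveness point (a made-up fragment, not spec code):

datatype value = Num of real | Str of string | Undef

fun toNumber (Num r) = r
  | toNumber (Str s) = (case Real.fromString s of SOME r => r | NONE => 0.0)
  | toNumber Undef = 0.0

Leave out the Undef case and the compiler immediately flags the match as nonexhaustive; the residual "can't-happen" cases are the ones that need a hand proof.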
(For the very latest on what type soundness means in a language with mixed static and dynamic types, I recommend Sam Tobin-Hochstadt and Matthias Felleisen's upcoming paper at DLS:
http://www.ccs.neu.edu/home/samth/
The basic refinement of Milner's statement is something along the lines of "programs with well-typed modules only go wrong due to untyped modules." However, we don't quite have such a system.)
I'm coming late to the thread. However, as a pragmatic decision, using Standard ML is the much better choice, by far. This is no slur on OCaml; I just think that for this particular application it is a complete no-brainer to use SML. Here are the reasons, ordered by what I consider important for this specification:
There is actually a formal spec for the language that has been published. (call/cc is not part of that spec... but that's another issue.)
SML has actually been used to define itself: www.ps.uni-sb.de/hamlet (HaMLet, a definitional interpreter for SML written in SML).
The language is better documented and understood than OCaml. A book search for "Standard ML" on Amazon.com comes up with 632 hits; a book search for "OCaml" has 154 hits. If you compare the first page of hits, any reasonable person can conclude SML is the better-documented language. You'll see the Standard ML hits are all reference and standards docs, while the OCaml hits quickly turn into books about Linux...
Support for call/cc in SML systems is much better than in any OCaml implementation of call/cc I've ever used. The cost of call/cc in SML/NJ is basically O(1), and MLton comes pretty close in practice too. The one or two times I've used OCaml's call/cc, I could reliably get it to stack overflow and crash my program.
SML compiler technology is far superior to OCaml in terms of the high-level optimizations that matter for an executable spec. If you want to use the spec as a test oracle:
www.smlnj.org (supports call/cc) - the granddaddy of all the SML compilers...
mlton.org (supports call/cc, whole program) - super-aggressive whole-program optimizer.
www.cl.cam.ac.uk/research/tsg/SMLNET - SML.NET translator with Visual Studio integration.
Two of the compilers above are whole-program optimizers, which means you'll probably get very good performance even for a highly abstract interpreter. (OCaml doesn't have a whole-program optimizer.) Throw in the partial evaluators for SML (www.diku.dk/topps/activities/sml-mix.html, www.informatik.uni-freiburg.de/proglang/research/software/mlope) and you can get even better performance and a compiler for free. :)
Some background about me... (i.e. full disclosure as to why I'm an SML bigot). As a CMU undergraduate I started working on the Fox Project's TCP/IP networking stack (written in SML) (www.cs.cmu.edu/~fox/foxnet.html)
Afterwards I spent 5 years of hard time in a PhD program at Princeton, working with Andrew Appel and generally hanging around with the SML/NJ crew... (www.cs.princeton.edu/~danwang) My thesis was built using the MLton backend.
After a detour into networks and proof hacking, I'm now an MS employee. (Ignore the gmail address...) (www.microsoft.com/Windows/CSE/pa/pa_home.mspx) (hacking in C++ and COM... ick.)