Bytecode
That's a good one. It reflects lots of discussions Alon and others, including yours truly, have had over the years.
The important point is not just "bytecode isn't everything" -- also important are the trade-offs in having two syntaxes, something that came up here on es-discuss long ago. Here's a post from Maciej of Apple:
esdiscuss/2009-December/010238
The topic then was AST encoding, not something people think of as "bytecode". Michael Franz at UCI and Ben Livshits at MSR had worked on arithmetic coding of ASTs (avoids need for verification).
The topic is deep and memes die hard. I think Alon's post is the right thought-piece; anyone have others?
It'd be great if there was material on the limits of the JVM and the CLR. AFAICT these are the only virtual machines that are trying to be universal (run both static and dynamic languages well).
On Wed, May 14, 2014 at 9:12 PM, Axel Rauschmayer <axel at rauschma.de> wrote:
> It'd be great if there was material on the limits of the JVM and the CLR. AFAICT these are the only virtual machines that are trying to be universal (run both static and dynamic languages well).
Well, from experience, the JVM is/was handicapped by some incidental decisions in its original standard library[*] that have a large adverse impact on startup time. This has restricted the 'usefulness' of the JVM from its inception. There are projects to re-engineer the standard library around this, but they have been slow (and are not yet complete)[**]. Similarly, the support for dynamic languages is fairly recent (JDK 7, JavaScript implementation using these features in JDK 8), so it's a bit early to know how that will play out in terms of adoption and practical use.
So I'm not sure how much you're going to learn from the JVM, other than "no matter how good/bad your bytecode is, other factors may dominate". That is: I would doubt most conclusions about bytecodes drawn from the example of the JVM, since I don't believe the bytecode design was a first order effect on its trajectory to date.
That said, my favorite bytecode anecdote from the JVM is that the amount of space wasted in class files by renaming the language from 'oak' to 'java' was far greater than the amount of space saved by adding a 'jsr' instruction to bytecode (which was intended to allow finally clauses without code duplication). However, the jsr instruction was a disaster: it was responsible for the first security exploits in the JVM's early days, greatly complicated code verification (inspiring a bunch of new academic research! which is never something you want in a production language design), and slowed down execution by disallowing efficient bytecode verification techniques. It was ultimately deprecated in Java 6.
So: if you want small bytecode files, sometimes it's better just to rename your language! --scott (a recovering Java compiler engineer)
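To make the "finally clauses without code duplication" point concrete, here is a minimal sketch of the two shapes involved. It is TypeScript rather than JVM bytecode, and the names `log`, `cleanup`, `withFinally` and `withDuplication` are made up for illustration. The second function is roughly what a compiler has to produce when there is no subroutine-style instruction such as jsr: the cleanup code is copied onto every exit path.

```typescript
// Hypothetical log used to observe when the cleanup code runs.
const log: string[] = [];

function cleanup(): void {
    log.push("cleanup");
}

// Source-level form: a single copy of the cleanup code.
function withFinally(fail: boolean): string {
    try {
        if (fail) throw new Error("boom");
        return "ok";
    } finally {
        cleanup();
    }
}

// Hand-desugared form: the cleanup code is duplicated, once for the
// exceptional exit and once for the normal exit.
function withDuplication(fail: boolean): string {
    try {
        if (fail) throw new Error("boom");
    } catch (e) {
        cleanup(); // copy #1: exceptional exit
        throw e;
    }
    cleanup();     // copy #2: normal exit
    return "ok";
}
```

Both functions behave the same from the caller's point of view; the difference is only in how many copies of the cleanup code exist, which is the size cost the jsr instruction was meant to avoid.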
[*] My fuzzy recollection of one such: The java.lang.System class included the stdout/stdin/stderr fields System.out, System.in, System.err which as bytestreams needed to deal with the charset of the I/O streams (since Strings were natively UTF-16) and so ended up pulling in a huge list of supported charsets and charset conversion classes, totaling many hundreds of kilobytes of bytecode, none of which could be statically prebuilt because selecting the proper charset depended on the user's environment variable settings at runtime. The amount of ancillary code pulled in by the charset conversion machinery included System.properties (to read that environment variable), which was a Map subclass, so pulled in most of the Collections API, etc, etc.
[**] See openjdk.java.net/projects/jigsaw and the blog entries linked there.
It's my understanding that the vast majority of the CLR's dynamic language support is at the runtime level, not the bytecode level. The bytecode is strongly typed (with lots of instructions/mechanisms for boxing, unboxing and type casts), and dynamic support is done through something called the 'DLR' that sits on top of the CLR. The DLR provides machinery for things like late binding and inline caches.
For this C# snippet:
using System;

public static class Program {
    public static void Main (string[] args) {
        dynamic one = (Func<int>)(
            () => 1
        );
        dynamic doubleInt = (Func<int, int>)(
            (int x) => x * 2
        );
        Console.WriteLine("{0} {1}", one(), doubleInt(1));
    }
}
The desugared (well, decompiled from IL - the arg_XXX_X variables are from the decompiler, not actually in the IL) C# looks like this:
public static void Main(string[] args)
{
    object one = () => 1;
    object doubleInt = (int x) => x * 2;
    if (Program.<Main>o__SiteContainer0.<>p__Site1 == null)
    {
        Program.<Main>o__SiteContainer0.<>p__Site1 =
            CallSite<Action<CallSite, Type, string, object, object>>.Create(
                Binder.InvokeMember(CSharpBinderFlags.ResultDiscarded,
                    "WriteLine", null, typeof(Program), new CSharpArgumentInfo[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType
                            | CSharpArgumentInfoFlags.IsStaticType, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType
                            | CSharpArgumentInfoFlags.Constant, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                    }));
    }
    Action<CallSite, Type, string, object, object> arg_15C_0 =
        Program.<Main>o__SiteContainer0.<>p__Site1.Target;
    CallSite arg_15C_1 = Program.<Main>o__SiteContainer0.<>p__Site1;
    Type arg_15C_2 = typeof(Console);
    string arg_15C_3 = "{0} {1}";
    if (Program.<Main>o__SiteContainer0.<>p__Site2 == null)
    {
        Program.<Main>o__SiteContainer0.<>p__Site2 =
            CallSite<Func<CallSite, object, object>>.Create(
                Binder.Invoke(CSharpBinderFlags.None, typeof(Program),
                    new CSharpArgumentInfo[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                    }));
    }
    object arg_15C_4 =
        Program.<Main>o__SiteContainer0.<>p__Site2.Target(
            Program.<Main>o__SiteContainer0.<>p__Site2, one);
    if (Program.<Main>o__SiteContainer0.<>p__Site3 == null)
    {
        Program.<Main>o__SiteContainer0.<>p__Site3 =
            CallSite<Func<CallSite, object, int, object>>.Create(
                Binder.Invoke(CSharpBinderFlags.None, typeof(Program),
                    new CSharpArgumentInfo[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType
                            | CSharpArgumentInfoFlags.Constant, null)
                    }));
    }
    arg_15C_0(arg_15C_1, arg_15C_2, arg_15C_3, arg_15C_4,
        Program.<Main>o__SiteContainer0.<>p__Site3.Target(
            Program.<Main>o__SiteContainer0.<>p__Site3, doubleInt, 1));
}
So you can see all the inline cache and binding machinery at work there. As far as I know there were 0 bytecode changes to introduce this feature; I certainly didn't have to implement any special bytecodes to support 'dynamic' in JSIL.
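For readers unfamiliar with the idea, the per-call-site caching visible in the decompiled output above can be sketched roughly as follows. This is a TypeScript illustration only, not the DLR's actual API: the `CallSite` class, its `bind` method and the `typeof`-based guard are hypothetical stand-ins for what a call site's cached `Target` delegate and the runtime binder do.

```typescript
// A monomorphic inline cache: remember the handler chosen for the last
// observed argument type and reuse it while the type keeps matching.
type Handler = (arg: unknown) => string;

class CallSite {
    private cachedType: string | null = null;
    private cachedTarget: Handler | null = null;
    bindCount = 0; // number of times the slow path (late binding) ran

    // Hypothetical binder: picks an implementation from the runtime type.
    private bind(typeTag: string): Handler {
        this.bindCount++;
        switch (typeTag) {
            case "number": return (a) => `number:${(a as number) * 2}`;
            case "string": return (a) => `string:${(a as string).toUpperCase()}`;
            default:       return (a) => `other:${String(a)}`;
        }
    }

    invoke(arg: unknown): string {
        const typeTag: string = typeof arg;
        let target = this.cachedTarget;
        if (target === null || this.cachedType !== typeTag) {
            // Cache miss: bind late, then remember the result for next time.
            target = this.bind(typeTag);
            this.cachedTarget = target;
            this.cachedType = typeTag;
        }
        return target(arg); // cache hit: plain call, no binding work
    }
}
```

Calling `invoke` twice with numbers binds only once; switching to a string argument triggers a rebind. None of this needs a new instruction in the underlying VM, which matches the observation that the feature required no bytecode changes.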
There are certainly some aspects of the CLR bytecode that make dynamic languages easier/harder to build on top of it, though. I just don't know what they are. I know a lot of the pain was reduced with the addition of 'lightweight code generation' or LCG, which allows jitting a single method on the fly and attaching it to a given context (like a method) so that it can access private members. This is used heavily in dynamic languages on the CLR now.
maybe relevant to this topic: www.webkit.org/blog/3362/introducing-the-webkit-ftl-jit
WebKit chaps basically transform JS to LLVM compatible instructions
Too bad they are testing with 2007 libraries such as Prototype and inheritance.js :P
On May 15, 2014, at 9:05 PM, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
> maybe relevant to this topic: www.webkit.org/blog/3362/introducing-the-webkit-ftl-jit
> WebKit chaps basically transform JS to LLVM compatible instructions
Thanks for the shout-out!
But I don't think it's quite relevant, since we transform to LLVM IR only after dynamic type inference has already happened - so the first N (for large N) executions of any code do not involve LLVM IR. Typical bytecode systems will use the bytecode as the basic underlying truth.
> Too bad they are testing with 2007 libraries such as Prototype and inheritance.js :P
We test many things.
but you mentioned very old ones that I think nobody cares much about anymore ;-)
still, it's very interesting how you "reach" that LLVM IR!
On May 15, 2014, at 10:24 PM, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
> but you mentioned very old ones that I think nobody cares much about anymore ;-)
People may not “care” about them today, but that doesn’t mean that no one uses them - there are many millions of webpages that use these libraries still, and that means performance of those libraries is extremely important to end users.
Regarding the original topic of this thread: I think there have been many many prior discussions of a standardised bytecode on es-discuss, and people should really be reading those before bringing this up again. It’s not going to happen as no one has ever demonstrated an actual benefit over simply using JS.
—Oliver
> Regarding the original topic of this thread: I think there have been many many prior discussions of a standardised bytecode on es-discuss, and people should really be reading those before bringing this up again. It’s not going to happen as no one has ever demonstrated an actual benefit over simply using JS.
I don't think anybody on this thread was trying to debate it. Axel was just hoping to find some definitive explanations for those bringing up the topic. It comes up a lot from various people on the internet, so it's nice to have something to point to for those people.
I think this is the definitive post: mozakai.blogspot.com/2013/05/the-elusive-universal-web-bytecode.html
Sure Oliver, it's just funny to read very old unmaintained libraries as the base code to test LLVM IR on top ... what you say makes sense but then I'd expect some new library maybe based on some new ES5 feature too, that's all I was trying to say and there was nothing about criticizing the awesome job WebKit did there.
Take Care
I kind of feel that even if such a bytecode existed, it should be immaterial to the design of ES. What I'm trying to say is that probably a better place for this discussion is at the web standards level. This decision can be completely outside of the design of any individual language, provided a generic enough bytecode. Now obviously being able to have web apps written in different languages while maintaining performance and debuggability would be nice. But for me, the main benefit of the bytecode is having my engineering team be able to adopt newer versions of the language at our convenience (instead of waiting 10 years until some ancient client updates their script engine)...
On Fri, May 16, 2014 at 12:49 PM, Mameri, Fred (HBO) <Fred.Mameri at hbo.com> wrote:
> maintaining performance and debuggability would be nice. But for me, the main benefit of the bytecode is having my engineering team be able to adopt newer versions of the language at our convenience (instead of waiting 10 years until some ancient client updates their script engine)…
You may want to look into the following: coffeescript, traceur, dart, es6-shim (etc).
Also TypeScript
On Fri, May 16, 2014 at 10:18 PM, C. Scott Ananian <ecmascript at cscott.net> wrote:
> On Fri, May 16, 2014 at 12:49 PM, Mameri, Fred (HBO) <Fred.Mameri at hbo.com> wrote:
>> maintaining performance and debuggability would be nice. But for me, the main benefit of the bytecode is having my engineering team be able to adopt newer versions of the language at our convenience (instead of waiting 10 years until some ancient client updates their script engine)…
> You may want to look into the following: coffeescript, traceur, dart, es6-shim (etc). --scott
Far as I see it, the discussion isn't really about bytecode. It's about the fact that you can't quickly or easily tack onto JS everything that's required to make it a good virtual machine you can target from another language. asm.js is certainly trying, but so far it's also unsupported everywhere but Firefox. asm.js does have the problem that it can't express the available native types (byte, short, float, long etc.) because it's running in JS, which only knows doubles, or ints (by appending a bitwise OR, as in x|0). And that ain't gonna change, because if asm.js starts to rely on functionality (such as type annotations for asm.js) that other JS engines don't have, the asm.js code won't run anywhere else anymore.
So the discussion really is about a Web-VM that's got all the trimmings of being a good compile target. What intermediary format you deliver to it is quite a secondary question.
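As a rough illustration of the "doubles, or ints (appending bit or)" remark, asm.js-style code marks a value as a 32-bit integer with `x|0` and as a double with unary `+x`. The sketch below is ordinary TypeScript that follows that convention; it is not a validated asm.js module (there is no "use asm" directive), and the function names are invented for the example.

```typescript
// asm.js-style integer add: `|0` coerces the operands and the result to int32.
function addInt(a: number, b: number): number {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
}

// asm.js-style double add: unary `+` marks the values as doubles.
function addDouble(a: number, b: number): number {
    a = +a;
    b = +b;
    return +(a + b);
}

// int32 semantics in action: fractions are dropped and overflow wraps around.
const truncated = addInt(1.9, 2.9);        // 1 + 2
const wrapped = addInt(2147483647, 1);     // wraps past the int32 maximum
const asDouble = addDouble(1.5, 2.25);
```

Here `truncated` is 3, `wrapped` is -2147483648 and `asDouble` is 3.75. Neither coercion yields a 64-bit integer, which is one of the gaps the paragraph above points at.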
On Mon, May 19, 2014 at 3:32 PM, Florian Bösch <pyalot at gmail.com> wrote:
> Far as I see it, the discussion isn't really about bytecode. It's about that you can't quickly/easily tack onto JS everything that's required to make it a good virtual machine you can target from another language. asm.js is certainly trying, but it's also so far unsupported everywhere but Firefox. asm.js does have this problem that it can't express available native types (byte, short, float, long etc.) because it's running in JS, which only knows doubles, or ints (appending bit or). And that ain't gonna change, because if asm.js starts to rely on functionality (such as type annotations for asm.js) that other JS engines don't have, the asm.js code won't run anywhere else anymore.
> So the discussion really is about a Web-VM that's got all the trimmings of being a good compile target. What intermediary format you deliver to it is quite a secondary question.
This discussion is about nothing of the sort: it's purely about where one can find good arguments against the need for a bytecode for the web. Please please please keep it that way: the discussion you and Fred want to engage in has been had too many times and really isn't a good topic for this mailing list in any case.
Well, if you're simply going to come up with a bytecode to match JS, then you're gonna have the same kinds of issues that typescript, asm.js, dart, etc. have to target it as a compile target. So if you want to make a VM that's a good compile target, ye're gonna have to eventually discuss what that actually means.
On Mon, May 19, 2014 at 3:46 PM, Florian Bösch <pyalot at gmail.com> wrote:
> Well, if you're simply going to come up with a bytecode to match JS, then you're gonna have the same kinds of issues that typescript, asm.js, dart, etc. have to target it as a compile target. So if you want to make a VM that's a good compile target, ye're gonna have to eventually discuss what that actually means.
Yes. But then this list would still not be the right venue, as that bytecode wouldn't be ECMAScript. So this is off-topic regardless of what you think of the merits of such a discussion.
So just so I get this straight. You're talking about a bytecode format, which implies some kind of revamped features/VM to run it, but you won't be discussing anything other than ECMAScript as the targeting semantic. Sorry to say, but then that's a pretty useless discussion entirely.
On Mon, May 19, 2014 at 3:55 PM, Florian Bösch <pyalot at gmail.com> wrote:
> So just so I get this straight. You're talking about a bytecode format, which implies some kind of revamped features/VM to run it, but you won't be discussing anything other than ECMAScript as the targeting semantic. Sorry to say, but then that's a pretty useless discussion entirely.
No, I don't want to talk about a bytecode format at all. At least not on this list, as this list is about ECMAScript, and nothing else. If you want to make the case for a bytecode format for the web, take it to some other forum.
Well, it is a thread on bytecode, that had a discussion on bytecode, but sure, whatever.
I take a more expansive view, because of evolution. JS and languages that currently target it, and also languages that might in the future target it, are co-evolving. They influence one another.
JS is growing SIMD and other lower-level APIs (perhaps even ARB_compute_shader in a future WebGL iteration). Value objects for more numeric types are coming.
Also, the Harmony era has making JS a better target for compilers as an explicit goal.
So it seems to me worthwhile to talk about certain "multi-language VM" design issues. Bytecode in general, perhaps a standard, fast, zero-verification AST codec format, seems fair game for es-discuss.
But I agree that putting the bytecode syntax cart ahead of the horses (language designs and their semantic requirements) is a mistake. As McCarthy suggested, there may be several concrete syntaxes. What's the abstract syntax, and ahead of that, what does it mean?
On 14/05/2014 19:13, Axel Rauschmayer wrote:
> What is the best “bytecode isn’t everything” article that exists? The “the web needs bytecode” meme comes up incredibly often, I’d like to have something good to point to, as an answer.
> This one looks good: mozakai.blogspot.de/2013/05/the-elusive-universal-web-bytecode.html
I want to suggest www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript as well. I know it's not a direct answer to your question, and I know the talk is not 100% serious, but it builds on a trend which suggests that JavaScript can be good enough as it is and that a bytecode isn't needed. The talk also contains bits and pieces of knowledge that help in understanding this trend.
After writing an ES6->ES5 compiler, I've come to the conclusion that ES5 is an intermediary language. For dynamic, duck-typed languages it's not so bad.
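As a small illustration of treating ES5 as the intermediate form, here is an ES6-style function next to an ES5-style equivalent. The second version is hand-written to show the general shape of such a translation; it is not the output of any particular compiler, and the names `greetEs6` and `greetEs5` are invented for the example.

```typescript
// ES6-style source: arrow function, default parameter, template literal.
const greetEs6 = (name: string = "world"): string => `Hello, ${name}!`;

// ES5-style target: function expression, explicit default check,
// string concatenation.
var greetEs5 = function (name?: string): string {
    if (name === undefined) {
        name = "world";
    }
    return "Hello, " + name + "!";
};
```

Both functions return the same strings for the same inputs; only the surface syntax differs, which is why a duck-typed source language maps onto ES5 fairly directly.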
I always found the Dart people's arguments the most persuasive:
www.dartlang.org/articles/why-not-bytecode
Basically, any language bytecode will inevitably end up targeted at the language it was written for. From that point of view, a web-standard bytecode VM simply isn't feasible.
Myself, I think the only bytecode suitable for the web is raw machine code. Once you start abstracting away from the machine level, you inevitably end up designing your bytecode for your specific language use case.
Of course, that would be tantamount to standardizing one CPU architecture for web applications, which is a terrible idea. Much better to stick with highly optimized scripting languages like JavaScript and have everyone else stick to browser plugin SDKs and NativeClient.
P.S.: There is a flaw in the Dart community's arguments. It is faster to compile to and execute on a bytecode VM, even if it's a purely internal one. Or at least, it's faster to use jump tables for your virtual machine state code instead of function pointers. That's why simply executing the AST (which is, after all, much easier to code than coming up with an instruction set) is so rarely done (which is how Dart does it).
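The two execution strategies contrasted in the P.S. can be sketched side by side. This is a toy in TypeScript with an invented expression language and invented opcodes (`PUSH`, `ADD`, `MUL`); it only shows the structural difference between walking an AST and running a switch-dispatched bytecode loop, and makes no performance claim of its own.

```typescript
// Tiny expression language: numbers, addition, multiplication.
type Expr =
    | { kind: "num"; value: number }
    | { kind: "add"; left: Expr; right: Expr }
    | { kind: "mul"; left: Expr; right: Expr };

// Strategy 1: execute the AST directly by recursive walking.
function evalAst(e: Expr): number {
    switch (e.kind) {
        case "num": return e.value;
        case "add": return evalAst(e.left) + evalAst(e.right);
        case "mul": return evalAst(e.left) * evalAst(e.right);
    }
}

// Strategy 2: compile to a flat stack bytecode first.
const PUSH = 0, ADD = 1, MUL = 2;

function compile(e: Expr, out: number[] = []): number[] {
    switch (e.kind) {
        case "num": out.push(PUSH, e.value); break;
        case "add": compile(e.left, out); compile(e.right, out); out.push(ADD); break;
        case "mul": compile(e.left, out); compile(e.right, out); out.push(MUL); break;
    }
    return out;
}

// The dispatch loop: one switch over the opcode, the kind of construct
// that can be lowered to a jump table.
function run(code: number[]): number {
    const stack: number[] = [];
    let pc = 0;
    while (pc < code.length) {
        switch (code[pc++]) {
            case PUSH: stack.push(code[pc++]!); break;
            case ADD: { const b = stack.pop()!; const a = stack.pop()!; stack.push(a + b); break; }
            case MUL: { const b = stack.pop()!; const a = stack.pop()!; stack.push(a * b); break; }
            default: throw new Error("bad opcode");
        }
    }
    return stack.pop()!;
}

// (2 + 3) * 4, evaluated both ways.
const expr: Expr = {
    kind: "mul",
    left: { kind: "add", left: { kind: "num", value: 2 }, right: { kind: "num", value: 3 } },
    right: { kind: "num", value: 4 },
};
const viaAst = evalAst(expr);
const viaBytecode = run(compile(expr));
```

Both `viaAst` and `viaBytecode` come out as 20. The AST walker is shorter to write, as the P.S. says; the bytecode version trades that simplicity for a flat instruction stream and a single dispatch loop.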