C. Scott Ananian (2014-05-15T05:54:19.000Z)
domenic at domenicdenicola.com (2014-05-19T21:18:23.807Z)
On Wed, May 14, 2014 at 9:12 PM, Axel Rauschmayer <axel at rauschma.de> wrote:

> It'd be great if there was material on the limits of the JVM and the CLR.
> AFAICT these are the only virtual machines that are trying to be universal
> (run both static and dynamic languages well).

Well, from experience, the JVM is/was handicapped by some incidental decisions in its original standard library\[\*\] that have a large adverse impact on startup time. This has restricted the 'usefulness' of the JVM from its inception. There are projects to re-engineer the standard library around this, but they have been slow (and are not yet complete)\[\*\*\].

Similarly, the support for dynamic languages is fairly recent (added in JDK 7, with a JavaScript implementation using these features in JDK 8), so it's a bit early to know how that will play out in terms of adoption and practical use.

So I'm not sure how much you're going to learn from the JVM, other than "no matter how good/bad your bytecode is, other factors may dominate". That is: I would doubt most conclusions about bytecodes drawn from the example of the JVM, since I don't believe the bytecode design was a first-order effect on its trajectory to date.

That said, my favorite bytecode anecdote from the JVM is that the amount of space wasted in class files by renaming the language from 'oak' to 'java' was far greater than the amount of space saved by adding a 'jsr' instruction to bytecode (which was intended to allow finally clauses without code duplication). However, the jsr instruction was a disaster: it was responsible for the first security exploits in the JVM's early days, greatly complicated code verification (inspiring a bunch of new academic research! which is never something you want in a production language design), and slowed down execution by disallowing efficient bytecode verification techniques. It was ultimately deprecated in Java 6.

So: if you want small bytecode files, sometimes it's better just to rename your language!
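For anyone who hasn't looked at what 'jsr' was trying to save, here is a minimal sketch of the source-level situation. The class and method names (`FinallyDemo`, `run`) are made up for illustration and are not from any JVM source. The `finally` block has to run on both the normal and the exceptional exit path, so a compiler either emits it once as a subroutine reached via jsr/ret, or copies it into each path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows a finally block that is reachable from
// two different exit paths of the same try body.
class FinallyDemo {
    // Returns a log of events so the order of execution is visible.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            try {
                log.add("body");
                if (fail) {
                    throw new IllegalStateException("boom");
                }
                log.add("normal exit");
            } finally {
                // Must execute on the normal path and on the exceptional path.
                // With jsr/ret this code could exist once in the bytecode;
                // without it, the compiler duplicates it per exit path.
                log.add("cleanup");
            }
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Compiling this with a current javac and disassembling it with `javap -c FinallyDemo` should show the cleanup code appearing more than once in `run`, i.e. the duplication that jsr was meant to avoid.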
--scott (a recovering Java compiler engineer)

\[\*\] My fuzzy recollection of one such: The `java.lang.System` class included the stdout/stdin/stderr fields `System.out`, `System.in`, `System.err` which as bytestreams needed to deal with the charset of the I/O streams (since Strings were natively UTF-16) and so ended up pulling in a huge list of supported charsets and charset conversion classes, totaling many hundreds of kilobytes of bytecode, none of which could be statically prebuilt because selecting the proper charset depended on the user's environment variable settings at runtime. The amount of ancillary code pulled in by the charset conversion machinery included `System.properties` (to read that environment variable), which was a `Map` subclass, so pulled in most of the Collections API, etc, etc.

\[\*\*\] See http://openjdk.java.net/projects/jigsaw/ and the blog entries linked there.
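To illustrate the first footnote's point that the charset cannot be chosen ahead of time, here is a small sketch. The class name `CharsetDemo` and the helper `encodedLength` are assumptions for illustration only. It shows that the same UTF-16 `String` becomes a different byte sequence depending on the charset, and that the default charset is something the runtime has to look up when the program starts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same String encodes to different byte counts
// depending on which charset the output stream ends up using.
class CharsetDemo {
    // Number of bytes needed to write `text` in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", four UTF-16 code units in memory

        // Resolved at runtime from the user's environment, not at build time.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println("UTF-8:      " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + encodedLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-16BE:   " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```

The first printed line varies from machine to machine, which is the reason the conversion tables could not simply be prebuilt for one known encoding.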