ES Discuss - Message History

C. Scott Ananian (2014-05-15T05:54:19.000Z)


On Wed, May 14, 2014 at 9:12 PM, Axel Rauschmayer <axel at rauschma.de> wrote:

> It'd be great if there was material on the limits of the JVM and the CLR.
> AFAICT these are the only virtual machines that are trying to be universal
> (run both static and dynamic languages well).
>

Well, from experience, the JVM is/was handicapped by some incidental
decisions in its original standard library[*] that have a large adverse
impact on startup time. This has restricted the 'usefulness' of the JVM
from its inception. There are projects to re-engineer the standard library
around this, but they have been slow (and are not yet complete)[**].
Similarly, the support for dynamic languages is fairly recent (JDK 7,
JavaScript implementation using these features in JDK 8), so it's a bit
early to know how that will play out in terms of adoption and practical use.

So I'm not sure how much you're going to learn from the JVM, other than "no
matter how good/bad your bytecode is, other factors may dominate". That
is: I would doubt most conclusions about bytecodes drawn from the example
of the JVM, since I don't believe the bytecode design was a first-order
effect on its trajectory to date.

That said, my favorite bytecode anecdote from the JVM is that the amount of
space wasted in class files by renaming the language from 'oak' to 'java'
was far greater than the amount of space saved by adding a 'jsr'
instruction to bytecode (which was intended to allow finally clauses
without code duplication). However, the jsr instruction was a disaster: it
was responsible for the first security exploits in the JVM's early days,
greatly complicated code verification (inspiring a bunch of new academic
research! which is never something you want in a production language
design), and slowed down execution by disallowing efficient bytecode
verification techniques. It was ultimately deprecated in Java 6.
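
To make the trade-off concrete, here is a minimal sketch of the kind of code 'jsr' was meant to help with. The class and method names are made up for illustration. As far as I know, a compiler that does not emit jsr/ret simply copies the body of the finally clause onto every exit path of the try block, normal and exceptional; disassembling the compiled class with `javap -c` should show whether your compiler does this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not taken from any JDK source.
public class FinallySketch {
    // Records which blocks ran, so the control flow is observable.
    static List<String> trace(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            try {
                log.add("try");
                if (fail) {
                    throw new IllegalStateException("boom");
                }
                log.add("normal exit");
            } finally {
                // Without jsr/ret, the compiler has to place a copy of this
                // block on each exit path of the try (normal and exceptional).
                log.add("finally");
            }
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // [try, normal exit, finally]
        System.out.println(trace(true));  // [try, finally, caught]
    }
}
```

The finally body runs on both paths, which is exactly why it has to appear twice in the bytecode when there is no subroutine instruction to share it.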

So: if you want small bytecode files, sometimes it's better just to rename
your language!
--scott (a recovering Java compiler engineer)

[*] My fuzzy recollection of one such: The `java.lang.System` class
included the stdout/stdin/stderr fields `System.out`, `System.in`,
`System.err` which as bytestreams needed to deal with the charset of the
I/O streams (since Strings were natively UTF-16) and so ended up pulling in
a huge list of supported charsets and charset conversion classes, totaling
many hundreds of kilobytes of bytecode, none of which could be statically
prebuilt because selecting the proper charset depended on the user's
environment variable settings at runtime. The amount of ancillary code
pulled in by the charset conversion machinery included `System.properties`
(to read that environment variable), which was a `Map` subclass, so pulled
in most of the Collections API, etc, etc.
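
A small sketch of the runtime dependency described above, using only `java.nio.charset.Charset` and `System.getProperty`. The class name is made up, and the output depends entirely on the environment the JVM is started in, which is the point: the charset cannot be known when the library is built.

```java
import java.nio.charset.Charset;

// Hypothetical example class; it only reports what the running JVM decided.
public class CharsetSketch {
    // Summarizes the charset chosen at startup and how many are installed.
    static String describe() {
        Charset cs = Charset.defaultCharset();
        int available = Charset.availableCharsets().size();
        return "default charset: " + cs.name() + ", available: " + available;
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // The system property that commonly reflects the platform encoding;
        // it may be null or differ from the default on some JVM versions.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
    }
}
```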

[**] See http://openjdk.java.net/projects/jigsaw/ and the blog entries
linked there.
