Various Loader-related e-mails

# Ian Hickson (11 years ago)

To avoid overly spamming the list I've coalesced my responses to various 
threads over the weekend into this one e-mail.

On Fri, 15 Aug 2014, John Barton wrote:
> > > > >
> > > > > But since the only way the client can know that it needs a.js 
> > > > > and jquery.js is if the server tells it [...]
> > > >
> > > > There's at least four ways this can happen [...]
> > > >
> > > > 2: the server sends the browser all three files at once, 
> > > >    preemptively.
> > > > [...]
> > > >
> > > > The second is too slow, because it implies sending resources that 
> > > > aren't needed, possibly using up bandwidth that could be used for 
> > > > other purposes. It's something that can be supported by packaging 
> > > > and other solutions already being developed, if people want it.
> > >
> > > The second method is faster than your method. It results in the same 
> > > three file transfer in one less round trip.
> > >
> > > The second method does not imply sending any more or less resources 
> > > than any other method listed.
> >
> > The second is too slow because there's lots of other files involved in 
> > practice, for example the default style sheet, the scripts that are 
> > needed straight away, the Web components that are needed straight 
> > away, all the default images, etc. The whole point of the discussion 
> > in this thread is that we're talking about files that are not needed 
> > straight away, and how to obtain them promptly once they are needed.
> 
> Loading all of the files in a web page in the order they are needed is 
> a great goal and one that I think would make a great improvement in the 
> Web.
> 
> It is certainly true that data for pages is "chunky": we need a 
> mechanism for loading a group of related files at the right time.
> 
> Within each chunk we need all of the files if we need the root of the 
> chunk. But that is exactly the ES6 case: if we need the root of the 
> dependency tree, then we need all of the tree, that is the declarative 
> design.
> 
> Having a design where the browser gets all the names, sends all the 
> names back to the server, and gets the tree is just wasting a trip. 
> Simply send the tree when the browser asks for the tree.
> 
> That's why in my opinion 'bundles' or 'packages' make sense: they 
> combine the related dependencies and allow them to be loaded in one 
> trip.
> 
> Divide this problem into small pieces: ES6 bundles, HTML Imports, and 
> some bundle/package loading solution. Don't use the same fine-grained 
> solution for all layers.

This just doesn't work.

Suppose the dependency graph looks like this:

     Feature A --> Dependency A1 \__\ Dependency    \
     Feature B --> Dependency B1 /  /    AB          >--> Dependency D
     Feature C --> Dependency C1 ---> Dependency C2 /

All of A, B, and C are to be fetched on-demand-only, to avoid using up 
too much bandwidth. All the files here are non-trivial in size.

How do you package this?

If you make a package for A, a package for B, and a package for C, then 
you'll have redundant content in the packages, and when the client asks 
for B after already having asked for A, the amount of content sent back 
will be greater than necessary and therefore it'll be slower than 
necessary. If you create multiple packages such that you group as much as 
possible into each package without overlap, then you still end up with 
multiple resources to download when you need any of A, B, or C. 
Basically, it boils down to:

     Package A \__\ Package    \
     Package B /  /    AB       >--> Package D
     Package C ------------->  /

...and then you're back to the problem I asked about. If you don't have 
server-side support, then to avoid round-trips the client needs to know 
about the dependencies before it makes the first request. It can't wait 
until it receives the packages to discover the dependencies, because if you 
do that then you're serialising your RTTs instead of pipelining them.
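A minimal sketch of the two costs described above, assuming the graph in the diagram is written out as a plain object (the encoding and the function names are invented for illustration). It shows that a client which already holds A and knows the graph only needs B and B1, whereas a self-contained "package B" would resend AB and D; and that discovering dependencies only on receipt costs one round trip per level of the chain.

```javascript
// Hypothetical encoding of the graph above: each entry lists the
// resources it depends on directly. Names mirror the diagram.
const graph = {
  A: ["A1"], B: ["B1"], C: ["C1"],
  A1: ["AB"], B1: ["AB"], C1: ["C2"],
  AB: ["D"], C2: ["D"], D: [],
};

// Every resource reachable from `root`, including `root` itself.
function closure(root, deps) {
  const seen = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (seen.has(node)) continue;
    seen.add(node);
    for (const dep of deps[node]) stack.push(dep);
  }
  return seen;
}

// What the client still has to fetch for `root`, given what it already holds.
function stillNeeded(root, deps, loaded) {
  return [...closure(root, deps)].filter((r) => !loaded.has(r)).sort();
}

// Round trips if each dependency is discovered only after its parent
// arrives: one per level of the longest chain below `root`.
function serialRoundTrips(root, deps) {
  const children = deps[root];
  if (children.length === 0) return 1;
  return 1 + Math.max(...children.map((d) => serialRoundTrips(d, deps)));
}

const loadedAfterA = closure("A", graph);           // client already has feature A
console.log(stillNeeded("B", graph, loadedAfterA)); // [ 'B', 'B1' ]
console.log([...closure("B", graph)].sort());       // "package B": [ 'AB', 'B', 'B1', 'D' ]
console.log(serialRoundTrips("C", graph));          // 4 (vs. 1 if the graph is known up front)
```

With the whole graph available before the first request, all of B's missing resources can be requested in a single round trip.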

> > > > The fourth is what I'm looking at.
> > > >
> > > > The fourth consists of the server having no built-in knowledge 
> > > > except what is in-band in the HTML file, namely, predeclared 
> > > > dependency trees.
> > >
> > > By humans writing markup? That's not happening, at least not for 
> > > more than trivial programs.
> >
> > It turns out that on the Web, there are lots of "trivial" programs. 
> > When you have trillions of Web pages, even the smallest of fractions 
> > ends up being significant numbers of pages.
> 
> Such programs don't need the kind of features we are discussing.

There are applications that cover the entire spectrum from trivial 
one-file apps to gigantic monsters with thousands of packages, each 
containing dozens of resources. Within this spectrum, you find applications 
that are satisfied by today's features, and you find applications that will 
need HTTP/2 and be able to use all the fancy server-side support. But you 
also find, near the middle of the spectrum, applications that are 
complicated enough to need dependency management, and yet not high-profile 
enough to get server-side support.

> > > No one is going to write hundreds of lines of dependency declaration 
> > > already specified in their JS files.
> >
> > Maybe not hundreds, but maybe dozens. Also, not all these dependencies 
> > are written in their JS files. Also, it turns out that people will do 
> > lots of things to make their pages quicker, and it's a lot easier, in 
> > many cases, to adjust their markup than to adjust their server 
> > configuration.
> 
> But the JS ones are written in JS files. And the non-JS ones are written 
> somewhere. Inventing a mechanism that causes devs to duplicate all of 
> this info is crazy.

The mechanism I'm working on here can be used for just the non-JS 
dependencies if you don't want to use it to do the JS ones in your 
projects. That's fine.

> Just to be clear:  I think dependency loading web pages is an awesome 
> direction.  Using a duplicate declarative solution? Not so much.

If we want to avoid duplicating the dependencies, we need to hoist them 
out of the scripts. Declaring the dependencies in the scripts is too late.

I'd be happy to make it possible for dependencies declared in HTML to 
cause linking to happen. In fact, I'm not really sure how to avoid it: 
there isn't really a way, as far as I can tell, for me to add a dependency 
to the ES6 module system without it affecting the linking.

> > > And if you allow a tool running on the server (e.g. a build) then it's 
> > > the same as bundling with an extra unnecessary round trip.
> >
> > The tool is likely to not be running on the server, but on the 
> > development machine. It's unfortunate, and I don't really understand 
> > it, but it's a fact of life on the Web that many authors have only 
> > minimal control over their servers.
> 
> Many is now most. This is what success looks like: browser page devs no 
> longer work on servers. We have to design for this case.

I'm glad you agree with the goal. How do we get there?
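One way the "tool on the development machine" route could look, as a rough sketch: a build step turns a dependency map into markup, so the server needs no knowledge beyond the HTML it serves. The `needs` and `load-policy=when-needed` attributes are the proposed ones from the example at the end of this message, not shipping HTML; treating `needs` as a space-separated list of ids, and the names `toMarkup`, `escapeAttr`, and `app.js`, are assumptions for illustration.

```javascript
// Hypothetical build step: dependency map (id -> { src, needs, onDemand })
// in, script markup out. Attribute names follow this thread's proposal;
// space-separated ids in `needs` is an assumption.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function toMarkup(manifest) {
  const lines = [];
  for (const [id, entry] of Object.entries(manifest)) {
    const attrs = [`id="${escapeAttr(id)}"`, `src="${escapeAttr(entry.src)}"`];
    if (entry.onDemand) attrs.push('load-policy="when-needed"');
    if (entry.needs.length > 0) attrs.push(`needs="${escapeAttr(entry.needs.join(" "))}"`);
    lines.push(`<script ${attrs.join(" ")}></script>`);
  }
  return lines.join("\n");
}

const manifest = {
  a: { src: "a.js", needs: [], onDemand: true },
  jquery: { src: "jquery.js", needs: [], onDemand: true },
  app: { src: "app.js", needs: ["a", "jquery"], onDemand: false },
};
console.log(toMarkup(manifest));
```

The same map could in principle be produced by scanning the project's sources at build time, which is what keeps the markup from being hand-maintained duplication.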

> > > You can specify dependencies in `instantiate()` returns.
> >
> > That is not "before the load"; q.v. the subject line of this e-mail.
> 
> The list is known before the load and it is placed into the ES6 
> dependency system by returning an object from `instantiate()`; this is 
> the way that bundling works as far as I understand it.

The "instantiate" hook is run after the load. The load happens in the 
"fetch" hook. Ideally, we'd have all the dependencies figured out by the 
"locate" hook; in some cases I really would like to be able to list the 
dependencies even before the "normalize" hook, e.g. in a _previous_ load's 
"instantiate" hook.

Is there some way we can adjust the spec to do that?
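To make the timing point concrete, here is a toy model, assuming each fetch costs one round trip. It is not the ES6 Loader API: `MockLoader`, `hints`, and `app` are invented names. A dependency either becomes known in "instantiate", after its parent's fetch has returned, or is registered ahead of time (for example by a previous load's instantiate hook) so it is already known by "locate" and can be requested alongside its parent.

```javascript
// Hypothetical mock, not the ES6 Loader API. Models only *when* each
// dependency becomes known; every fetch costs one round trip.
class MockLoader {
  constructor(sources) {
    this.sources = sources;      // name -> { deps } as if parsed from source
    this.hints = new Map();      // name -> dependencies declared in advance
    this.fetchRound = new Map(); // name -> earliest round its fetch can start
  }
  load(name, round = 0) {
    const known = this.fetchRound.get(name);
    if (known !== undefined && known <= round) return; // already as early
    this.fetchRound.set(name, round);
    // Hinted dependencies are known by "locate", so they go out in parallel.
    for (const dep of this.hints.get(name) ?? []) this.load(dep, round);
    // Source-declared ones surface in "instantiate", after this fetch returns.
    for (const dep of this.sources[name].deps) this.load(dep, round + 1);
  }
  roundTrips() {
    return 1 + Math.max(...this.fetchRound.values());
  }
}

const sources = { app: { deps: ["a", "jquery"] }, a: { deps: [] }, jquery: { deps: [] } };

const late = new MockLoader(sources);
late.load("app");
console.log(late.roundTrips());  // 2: a.js and jquery.js wait for app to arrive

const early = new MockLoader(sources);
early.hints.set("app", ["a", "jquery"]); // e.g. recorded by an earlier instantiate
early.load("app");
console.log(early.roundTrips()); // 1: all three requests go out together
```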

On Fri, 15 Aug 2014, Kevin Smith wrote:
> >
> > I think it would be reasonable for us to say that all the dependency 
> > declaration mechanisms are equivalent in that they all cause the 
> > target dependency to be "executed" (whatever that means in the 
> > context).
> 
> OK.  The "instantiate" hook allows you to specify arbitrary 
> dependencies, but only if the module in question is not coming from ES6 
> source code. There's currently no way to say:  "instantiate this module 
> from source as a normal ES6 module, but add A, B, and C to the 
> dependency list."  AFAIK, anyway.

Right. Is there some way we can adjust the spec to allow this?
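A sketch of the extension being asked for, under stated assumptions. As we read the 2014 draft, `instantiate()` either returns undefined (parse the fetched source as an ES6 module and take its dependencies from its import declarations) or returns `{ deps, execute }` for a non-ES6 module. The `extraDeps` field below is a hypothetical third option that keeps ES6 source semantics but appends dependencies; no draft defines it.

```javascript
// Hypothetical: which dependency list the loader would use for one module.
// `sourceImports` stands in for the imports parsed from ES6 source.
function effectiveDeps(sourceImports, result) {
  if (result === undefined) return [...sourceImports];                   // plain ES6 module
  if (typeof result.execute === "function") return [...result.deps];     // non-ES6 path
  // Proposed (invented) option: ES6 source plus extra dependencies.
  const merged = [...sourceImports];
  for (const dep of result.extraDeps ?? []) {
    if (!merged.includes(dep)) merged.push(dep); // avoid listing a dependency twice
  }
  return merged;
}

console.log(effectiveDeps(["./util.js"], undefined));
// [ './util.js' ]
console.log(effectiveDeps(["./util.js"], { extraDeps: ["A", "B", "C", "./util.js"] }));
// [ './util.js', 'A', 'B', 'C' ]
```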

> > This might need careful management around some resource types, though. 
> > In particular, CSS imports don't de-dupe, so we'd have to say what it 
> > means if you do:
> >
> >    <style depends-on="my-other-stylesheet">
> >      @import url(my-other-stylesheet);
> >    </style>
> >
> > Does this cause "my-other-stylesheet" to be loaded twice?
> 
> This is from your other post, right?

There are a number of issues around @import.
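One possible answer to the double-load question, sketched under assumptions: if both the proposed `depends-on` attribute and an `@import` were routed through a single registry keyed by absolute URL, the second request would reuse the first entry. `SheetRegistry` and the example.com base URL are invented for illustration; today's CSS `@import` does not share sheets this way, which is exactly the issue being raised.

```javascript
// Hypothetical registry: one entry per absolute URL, however it was requested.
class SheetRegistry {
  constructor(baseURL) {
    this.baseURL = baseURL;
    this.entries = new Map(); // absolute URL -> { url, requestedVia }
    this.fetchCount = 0;      // stand-in for real network fetches
  }
  request(ref, via) {
    const url = new URL(ref, this.baseURL).href; // normalise before lookup
    let entry = this.entries.get(url);
    if (!entry) {
      this.fetchCount += 1;
      entry = { url, requestedVia: [] };
      this.entries.set(url, entry);
    }
    entry.requestedVia.push(via);
    return entry;
  }
}

const registry = new SheetRegistry("https://example.com/page.html");
registry.request("my-other-stylesheet", "depends-on");
registry.request("./my-other-stylesheet", "@import");
console.log(registry.fetchCount); // 1
```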

> If you're going to use the ES module loader as a general dependency 
> management framework, then each resource will need to have a 
> corresponding ES module "image".

Can you elaborate on "image"?

> What if the corresponding ES module for a stylesheet, instead of 
> initializing and exposing a stylesheet object, exposed a factory 
> function for creating stylesheet objects?

Could you elaborate on this? It seems reasonable, except that I really 
need the StyleSheet object to be created earlier than the "instantiate" 
hook since that's the likely place that the Fetch request initialisation 
options would be exposed for CSS.
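A rough sketch of the factory idea, assuming the stylesheet's module "image" exports a function that creates sheet objects instead of one ready-made sheet. `FakeStyleSheet` stands in for `CSSStyleSheet` so this runs outside a browser, and `options` stands in for Fetch request initialisation options; both are assumptions.

```javascript
// Stand-in for CSSStyleSheet so the sketch runs outside a browser.
class FakeStyleSheet {
  constructor(cssText, options) {
    this.cssText = cssText;
    this.options = options;
  }
}

// Hypothetical module "image" for a stylesheet: by the time this exists,
// the CSS text has already been fetched; only a factory is exposed.
function makeStyleSheetModule(cssText) {
  return Object.freeze({
    createStyleSheet(options = {}) {
      return new FakeStyleSheet(cssText, { ...options });
    },
  });
}

const sheetModule = makeStyleSheetModule("p { color: green }");
const s1 = sheetModule.createStyleSheet({ credentials: "omit" });
const s2 = sheetModule.createStyleSheet();
console.log(s1 !== s2, s1.cssText === s2.cssText); // true true
```

Note that in this model the options reach the sheet only after the CSS text has been fetched, which illustrates the timing concern raised in the reply above.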

On Fri, 15 Aug 2014, John Barton wrote:
> On Fri, Aug 15, 2014 at 3:06 PM, Ian Hickson <ian at hixie.ch> wrote:
> >
> > Suppose you have an HTML import foo.html that declares two modules:
> >
> >    <script type=module id=a> ... </script>
> >    <script type=module id=b> ... </script>
> 
> As we noted in another thread, Web devs no longer control servers. And 
> servers no longer allow inline script (for the most part going forward). 
> So I don't see this feature as worth investing effort in. (I don't like 
> it either, but it is what it is).

While this might be true for top-level browsing contexts (though I doubt 
it), it's certainly not going to be true for HTML imports. One of the main 
goals of HTML imports is to make it possible to mash style sheets and ES6 
modules together into a single resource using <style> and <script> blocks. 
(The security issues that led to CSP wanting to block internal scripts 
don't apply as much to modules, in theory.)

> > How should they refer to each other? For example, if module id=b wants 
> > to import module id=a? I suppose the logical way is like this:
> >
> >    import "#a";
> 
> import './a';

I don't understand. How would that work? Can you elaborate?
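One reason the two specifiers differ, shown under an assumption made only for illustration: that a module specifier inside foo.html resolves like a URL against that document's address (the example.com URL is a placeholder). With the WHATWG URL parser, `"#a"` stays inside foo.html, while `'./a'` names a different resource next to it.

```javascript
// Assumption: specifiers resolve like URLs against the importing document.
const importDocumentURL = "https://example.com/foo.html";
const resolveSpecifier = (specifier, base) => new URL(specifier, base).href;

console.log(resolveSpecifier("#a", importDocumentURL));  // https://example.com/foo.html#a
console.log(resolveSpecifier("./a", importDocumentURL)); // https://example.com/a
```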

> > Now, in the main page, you reference the HTML import:
> >
> >    <link rel=import href="foo.html">
> >
> > Now how would you refer to the modules? We can't have #b refer to it, 
> > since the scope of IDs is per-document, and the import has a separate 
> > document.
> 
> Separate document implies separate JS global: each needs its own Loader. 
> So the rest of the questions aren't needed.

HTML imports definitely need to expose modules across documents. Are you 
saying this requires changes to ES6 to support? What changes would we need?
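As we understand the HTML Imports draft, scripts in an imported document run against the importing page's global object, so one loader table could in principle serve both documents. A minimal sketch, assuming inline modules are keyed by absolute URL plus fragment (`moduleTable`, `registerInline`, and `lookup` are invented names): from the main page, `"#b"` resolves into the wrong document, while `"foo.html#b"` finds the module.

```javascript
// Hypothetical shared table: absolute URL (with #id) -> module value.
const moduleTable = new Map();

function registerInline(documentURL, id, moduleValue) {
  moduleTable.set(new URL("#" + id, documentURL).href, moduleValue);
}

function lookup(specifier, referrerURL) {
  return moduleTable.get(new URL(specifier, referrerURL).href);
}

registerInline("https://example.com/foo.html", "a", { name: "module a" });
registerInline("https://example.com/foo.html", "b", { name: "module b" });

const mainPage = "https://example.com/index.html";
console.log(lookup("#b", mainPage));         // undefined: resolves to index.html#b
console.log(lookup("foo.html#b", mainPage)); // { name: 'module b' }
```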

On Fri, 15 Aug 2014, John Barton wrote:
> On Fri, Aug 15, 2014 at 3:41 PM, Ian Hickson <ian at hixie.ch> wrote:
> > On Fri, 15 Aug 2014, John Barton wrote:
> > >
> > > The ES Loader does not maintain a dependency tree. It maintains a 
> > > table of names->modules.
> >
> > Maybe I'm misunderstanding the ES6 loader spec. What's the Load Record 
> > [[Dependencies]] list?
> 
> The dependencies for the Load. Once the load is complete the record is 
> not needed.

How about if the dependencies are changed during the load? For example:

   <script id=a src="a.js" load-policy=when-needed></script>
   <script id=b src="b.js" load-policy=when-needed></script>
   <script id=c needs="a"> ... </script>
   <script>
    // at this point, the script with id=c is blocked waiting for a.js to 
    // load. Let's change its dependencies:
    document.scripts.c.needs = 'b';
    // now the script needs to trigger b.js to load
    // a.js' load can be deprioritised (or canceled, if network is at a 
    // premium), and no longer blocks the script from loading
   </script>
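A toy model of the example above, assuming a blocked script tracks its `needs` as a set and the page tracks finished loads in a shared set (`PendingScript`, `setNeeds`, and `isReady` are invented names, not a proposed API). Replacing the list reports which loads to start and which earlier requests no longer block the script.

```javascript
// Hypothetical model of a script blocked on its `needs` list.
class PendingScript {
  constructor(needs, loaded) {
    this.needs = new Set(needs);
    this.loaded = loaded; // shared Set of ids that have finished loading
  }
  // Replace the dependency list; report loads to start and loads that
  // no longer block this script (candidates for deprioritising).
  setNeeds(newNeeds) {
    const next = new Set(newNeeds);
    const start = [...next].filter((id) => !this.needs.has(id) && !this.loaded.has(id));
    const deprioritise = [...this.needs].filter((id) => !next.has(id) && !this.loaded.has(id));
    this.needs = next;
    return { start, deprioritise };
  }
  isReady() {
    return [...this.needs].every((id) => this.loaded.has(id));
  }
}

const loadedIds = new Set();
const scriptC = new PendingScript(["a"], loadedIds);
console.log(scriptC.setNeeds(["b"])); // { start: [ 'b' ], deprioritise: [ 'a' ] }
loadedIds.add("b");
console.log(scriptC.isReady());       // true
```

A real implementation would only cancel a.js if no other script still needed it; this model tracks a single script.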

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'