Towards a JavaScript Binary AST

Yoric | 310 points

To clarify how this is not related to WebAssembly, this is for code written in JavaScript, while WASM is for code written in other languages.

It's a fairly simple optimization - it's still JavaScript, just compressed and somewhat pre-parsed.

WASM doesn't currently have built-in garbage collection, so to use it to compress/speed up/whatever JavaScript, you would have to compile an entire JavaScript Virtual Machine into WASM, which is almost certainly going to be slower than just running regular JavaScript in the browser's built-in JS engine.

(This is true for the time being, anyway. WASM should eventually support GC at which point it might make sense to compile JS to WASM in some cases.)

nfriedly | 7 years ago

So, compiled Javascript then? "We meet again, at last. The circle is now complete."

The more I see interpreted languages being compiled for speed purposes, and compiled languages being interpreted for ease-of-use purposes, desktop applications becoming subscription web applications (remember mainframe programs? ), and then web applications becoming desktop applications (electron) the more I realize that computing is closer to clothing fads than anything else. Can't wait to pickup some bellbottoms at my local target.

cabaalis | 7 years ago

From an alternate "not the web" viewpoint, I am interested in this because we have a desktop application that bootstraps a lot of JS for each view inside the application. There is a non-insignificant chunk of this time spent in parsing and the existing methods that engines expose (V8 in this case) for snapshotting / caching are not ideal. Given the initial reported gains, this could significantly ratchet down the parsing portion of perceived load time and provide a nice boost for such desktop apps. When presented at TC39, many wanted to see a bit more robust / scientific benchmarks to show that the gains were really there.

apaprocki | 7 years ago

Here's some perspective for where this project is coming from:

> So, a joint team from Mozilla and Facebook decided to get started working on a novel mechanism that we believe can dramatically improve the speed at which an application can start executing its JavaScript: the Binary AST.

I really like the organization of the present article, the author really answered all the questions I had, in an orderly manner. I'll use this format as a template for my own writing. Thanks!

Personally, I don't see the appeal for such a thing, and seems unlikely all browsers would implement it. It will be interesting to see how it works out.

le-mark | 7 years ago

This is reminiscent of the technique used by some versions of ETH Oberon to generate native code on module loading from a compressed encoding of the parse tree. Michael Franz described the technique as "Semantic-Dictionary Encoding":

«SDE is a dense representation. It encodes syntactically correct source program by a succession of indices into a semantic dictionary, which in turn contains the information necessary for generating native code. The dictionary itself is not part of the SDE representation, but is constructed dynamically during the translation of a source program to SDE form, and reconstructed before (or during) the decoding process. This method bears some resemblance to commonly used data compression schemes.»

See also "Code-Generation On-the-Fly: A Key to Portable Software" https://pdfs.semanticscholar.org/6acf/85e7e8eab7c9089ca1ff24...

This same technique also was used by JUICE, a short-lived browser plugin for running software written in Oberon in a browser. It was presented as an alternative to Java byte code that was both more compact and easier to generate reasonable native code for.

https://github.com/Spirit-of-Oberon/Juice/blob/master/JUICE....

I seem to recall that the particular implementation was quite tied to the intermediate representation of the OP2 family of Oberon compilers making backward compatibility in the face of changes to the compiler challenging and I recall a conversation with someone hacking on Oberon that indicated that he'd chosen to address (trans)portable code by the simple expedient of just compressing the source and shipping that across the wire as the Oberon compiler was very fast even when just compiling from source.

I'm guessing the hard parts are: (0) Support in enough browsers to make it worth using this format. (1) Coming up with a binary format that's actually significantly faster to parse than plain text. (SDE managed this.) (2) Designing the format to not be brittle in the face of change.

mannschott | 7 years ago

This is a really interesting project from a browser technology point of view. It makes me wonder how much code you'd need to be deploying to for this to be useful in a production environment. Admittedly I don't make particularly big applications but I've yet to see parsing the JS code as a problem, even when there's 20MB of libraries included.

onion2k | 7 years ago

This is what BASIC interpreters on 8-bit systems did from the very beginning. Some BASIC interpreters did not even allow you to type the keywords. Storing a trivially serialized binary form of the source code is a painfully obvious way to reduce RAM usage and improve execution speed. You can also trivially produce the human-readable source back.

It's of course not compilation (though parsing is the first thing a compiler would do, too). It's not generation of machine code, or VM bytecode. it's mere compression.

This is great news because you got to see the source if you want, likely nicely formatted. You can also get rid of the minifiers, and thus likely see reasonable variable names in the debugger.

nine_k | 7 years ago

This is some amazing progress, but reading this and hearing how difficult JavaScript is as a language to design around makes me wonder how many hours have we spent optimizing a language designed in 2 weeks and living with those consequences. I wish we could version our JavaScript within a tag somehow so we could slowly deprecate code. I guess that would mean though browsers would have to support two languages that would suck..... this really is unfortunately the path of least resistance.

(I understand I could use elm, cjs, emscriptem or any other transpirer but I was thinking of ours spent around improving the js vm.

ryanong | 7 years ago

This article says "Wouldn’t it be nice if we could just make the parser faster? Unfortunately, while JS parsers have improved considerably, we are long past the point of diminishing returns."

I'm gobsmacked that parsing is such a major part of the JS startup time, compared to compiling and optimizing the code. Parsing isn't slow! Or at least it shouldn't be. How many MBs of Javascript is Facebook shipping?

Does anyone have a link to some measurements? Time spent parsing versus compilation?

iainmerrick | 7 years ago

In this thread: people not understanding the difference between byte code (representing code in the form of instructions) and AST.

nikita2206 | 7 years ago

Lua has something very similar(bytecode vs AST) via luac for a long while now. We've used to to speed up parse times in the past and it helps a ton in that area.

vvanders | 7 years ago

i'm very skeptical about the benefits of a binary JavaScript AST. The claim is that a binary AST would save on JS parsing costs. however, JS parse time is not just tokenization. For many large apps, the bottleneck in parsing is instead in actually validating that the JS code is well-formed and does not contain early errors. The binary AST format proposes to skip this step [0] which is equivalent to wrapping function bodies with eval… This would be a major semantic change to the language that should be decoupled from anything related to a binary format. So IMO proposal conflates tokenization with changing early error semantics. I’m skeptical the former has any benefits and the later should be considered on its own terms.

Also, there’s immense value in text formats over binary formats in general, especially for open, extendable web standards. Text formats are more easily extendable as the language evolves because they typically have some amount of redundancy built in. The W3C outlines the value here (https://www.w3.org/People/Bos/DesignGuide/implementability.h...). JS text format in general also means engines/interpreters/browsers are simpler to implement and therefore that JS code has better longevity.

Finally, although WebAssembly is a different beast and a different language, it provides an escape hatch for large apps (e.g. Facebook) to go to extreme lengths in the name of speed. We don’t need complicate JavaScript with such a powerful mechanism already tuned to perfectly complement it.

[0]: https://github.com/syg/ecmascript-binary-ast/#-2-early-error...

s3th | 7 years ago

I am puzzled by how an binary AST makes the code significantly smaller than a minified+gziped version.

A JavaScript expression such as:

var mystuff = blah + 45

Gets minified as var a=b+45

And then what is costly in there is the "var " and character overhead which you'd hope would be much reduced by compression.

The AST would replace the keywords by binary tokens, but then would still contain function names and so on.

I mean I appreciate the effort that shipping an AST will cut an awful lot of parsing, but I don't understand why it would make such a difference in size.

Can someone comment?

d--b | 7 years ago

The linked article somehow avoids ever stating the meaning of the acronym, and I had to Google it myself, so I imagine some other people might not know: AST stands for "abstract syntax tree".

https://en.wikipedia.org/wiki/Abstract_syntax_tree

kyle-rb | 7 years ago

For those curious about how this would deal with Function.prototype.toSource, via https://github.com/syg/ecmascript-binary-ast#functionprototy...:

> This method would return something like "[sourceless code]".

mnarayan01 | 7 years ago

However this technology pans out, thank for a really well-written post. It is a model of clarity.

(And yet many people seem to have misunderstood: perhaps an example or a caricature of the binary representation might have helped make it concrete, though then there is the danger that people will start commenting about the quality of the example.)

svat | 7 years ago

These are random thought I just wrote on twitter in the morning(UTC+7):

"I kinda think that there were no front-end languages actually. It's kinda all about web platform & browsers can't do things out of the box."

"Graphic interface shouldn't execute program on its own rather than rendering string on _platform_ which won't bother more."

"This is partly why people delegate js rendering to server. At the end of the day all script should be just WebAssembly bytecodes sent down."

"Browser should act as physical rendering object like pure monitor screen. User shouldn't have to inspect photon or write photon generators."

"SPA or PWA is just that about network request reduction, and how much string wanted to send at a time & today http/2 can help that a lot."

"Project like Drab https://github.com/grych/drab 's been doing quite well to move computation back to "server" (opposite to self-service client)"

"WebAssembly compromise (to complement js) to implement the platform. JS api and WebAssembly should be merged or united."

"VirtualDom as if it is a right way should be built-in just like DOM get constructed from html _string_ from server. All JS works must die."

"That's how WebComponent went almost a half way of fulfilling web platform. It is unfortunate js has gone far, tools are actively building on"

"I'd end this now before some thought of individualism-ruining-the-platform take over. That's not gonna be something i'd like to write (now)"

-----

Not a complete version though. Kind of general speaking but I've been thinking in detail a bit. Then hours later I found this thread.

Existenceblinks | 7 years ago

It's really exciting that this would mean smaller files that parse faster, but also more readable!

TazeTSchnitzel | 7 years ago

To be honest I (as an author of the Sciter [1]) do not expect too much gain from that.

Sciter contains source code to bytecode compiler. Those bytecodes can be stored to files and loaded bypassing compilation phase. There is not too much gain as JS alike grammar is pretty simple.

In principle original ECMA-262 grammar was so simple that you can parse it without need of AST - direct parser with one symbol lookahead that produces bytecodes is quite adequate.

JavaScript use cases require fast compilation anyway. As for source files as for eval() and alike cases like onclick="..." in markup.

[1] https://sciter.com And JS parsers used to be damn fast indeed, until introduction of arrow functions. Their syntax is what requires AST.

c-smile | 7 years ago

I'd like to see some real-world performance numbers when compared with gzip. The article is a little overzealous in its claims that simply don't add up.

My suspicion is it's going to be marginal and not worth the added complexity for what essential is a compression technique.

This project is a prime example of incorrect optimization. Developers should be focused on loading the correct amount of JavaScript that's needed by their application, not on trying to optimize their fat JavaScript bundles. It's so lazy engineering.

iamleppert | 7 years ago

Yea! A whole new attack surface. A hacked AST file could cause memory corruption and other faults in the browser-side binary expander.

mnemotronic | 7 years ago

Does anyone know the actual spec for this binary AST can be found? In particular I'm curious about the format of each node type.

kevinb7 | 7 years ago

I wish for something like evalUrl() to run code that has already been parsed "in the background" so a module loader can be implemented in userland. It would be great if scripts that are prefetched or http2 pushed could be parsed in parallel and not have to be reparsed when running eval.

z3t4 | 7 years ago

Yoric - the Binary AST size comparisons in the blog - was the original javascript already minified?

malts | 7 years ago

Trying to catch up with webassembly, huh?

bigato | 7 years ago

Could the AST be made an extension of the language similar to how it works in Mathematica?

limeblack | 7 years ago

One of my main concerns with this proposal, is the increasing complexity of what was once a very accessible web platform. You have this ever increasing tooling knowledge you need to develop, and with something like this it would certainly increase as "fast JS" would require you to know what a compiler is. Sure, a good counterpoint is that it may be incremental knowledge you can pick up, but I still think a no-work make everything faster solution would be better.

I believe there exists such a no-work alternative to the first-run problem, which I attempted to explain on Twitter, but its not really the greatest platform to do so, so I'll attempt to do so again here. Basically, given a script tag:

    <script src = "abc.com/script.js" integrity="sha256-123"></script>
A browser, such as Chrome, would kick off two requests, one to abc.com/script.js, and another to cdn.chrome.com/sha256-123/abc.com/script.js. The second request is for a pre-compiled and cached version of the script (the binary ast). If it doesn't exist yet, the cdn itself will download it, compile it, and cache it. For everyone except the first person to ever load this script, the second request returns before the time it takes for the first to finish + parse. Basically, the FIRST person to ever see this script online, takes the hit for everyone, since it alerts the "compile server" of its existence, afterwards its cached forever and fast for every other visitor on the web (that uses chrome). (I have later expanded on this to have interesting security additions as well -- there's a way this can be done such that the browser does the first compile and saves an encrypted version on the chrome cdn, such that google never sees the initial script and only people with access to the initial script can decrypt it). To clarify, this solution addresses the exact same concerns as the binary AST issue. The pros to this approach in my opinion are:

1. No extra work on the side of the developer. All the benefits described in the above article are just free without any new tooling.

2. It might actually be FASTER than the above example, since cdn.chrome.com may be way faster than wherever the user is hosting their binary AST.

3. The cdn can initially use the same sort of binary AST as the "compile result", but this gives the browser flexibility to do a full compile to JIT code instead, allowing different browsers to test different levels of compiles to cache globally.

4. This would be an excellent way to generate lots of data before deciding to create another public facing technology people have to learn - real world results have proven to be hard to predict in JS performance.

5. Much less complex to do things like dynamically assembling scripts (like for dynamic loading of SPA pages) - since the user doesn't also have to put a binary ast compiler in their pipeline: you get binary-ification for free.

The main con is that it makes browser development even harder to break into, since if this is done right it would be a large competitive advantage and requires a browser vendor to now host a cdn essentially. I don't think this is that big a deal given how hard it already is to get a new browser out there, and the advantages from getting browsers to compete on compile targets makes up for it in my opinion.

tolmasky | 7 years ago
[deleted]
| 7 years ago

hehe, reminds me of emacs byte-compilation..

agumonkey | 7 years ago

Why does this guy use bits instead of bytes everywhere?

Laaas | 7 years ago

with an AST you can visualise code in ways other than text, and also reformat code like in go-fmt.

jlebrech | 7 years ago

Can you work on webpack instead?

megamindbrian | 7 years ago

> By design, however, wasm is limited to native code, so it doesn’t work with JavaScript out of the box.

By design of the first MVP iteration.

http://webassembly.org/roadmap/ and http://webassembly.org/docs/future-features/

GC (which is the main major feature required for JavaScript) is in progress, https://github.com/WebAssembly/design/issues/1079

So, your article is FUD in it's purest undistilled form.

dmitriid | 7 years ago

I feel like this may become some kind of reimplementation of Java's byte code. We already have a "write once, run anywhere" system. Good luck!

FrancoisBosun | 7 years ago