While Ben and I were talking about JavaScript performance (and other things) at Web 2.0 Expo NYC, Maciej Stachowiak announced SquirrelFish Extreme, a new and much improved version of WebKit's SquirrelFish JavaScript engine that appears to do very well on SunSpider:
- SquirrelFish Extreme: 943.3 ms
- V8: 1280.6 ms
- TraceMonkey: 1464.6 ms
What makes it so fast?
SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.
1. Bytecode Optimizations
When we first announced SquirrelFish, we mentioned that we thought that the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.
One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic - they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.
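To make "checking for the most common and fastest cases first" concrete, here is a minimal sketch in Python rather than the engine's own code. The names op_add and to_js_string are hypothetical, and only numbers and strings are handled; real JavaScript addition has many more cases.

```python
def to_js_string(value):
    # Hypothetical helper: a rough stand-in for JavaScript's number-to-string rules.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def op_add(a, b):
    """Sketch of an 'add' opcode that tests the cheapest, most common case first."""
    # Fast path: two plain integers. type() is used so that bool is excluded.
    if type(a) is int and type(b) is int:
        return a + b
    # Next most common case: string concatenation.
    if isinstance(a, str) or isinstance(b, str):
        return to_js_string(a) + to_js_string(b)
    # General fallback: treat both operands as floating-point numbers.
    return float(a) + float(b)
```

The ordering is the point: numeric code pays for a single type check, while the rarer cases fall through to the slower branches.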
In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.
2. Polymorphic Inline Cache
One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.
Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure - as people in the dynamic language community say, “you can cheat as long as you don’t get caught”.
So how exactly do we cheat? We detect when objects actually have the same underlying structure — the same properties in the same order — and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID - usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
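The caching scheme described above can be sketched roughly as follows. This is an illustration in Python, not the WebKit implementation: Structure, JSObject and GetSite are hypothetical names, and the cache holds a single entry per access site, whereas a polymorphic cache would remember several structures.

```python
class Structure:
    """Hypothetical stand-in for a StructureID: property names in insertion order."""

    def __init__(self, names=()):
        self.names = tuple(names)
        self.offsets = {name: i for i, name in enumerate(self.names)}
        self.transitions = {}

    def with_property(self, name):
        # Objects that add the same properties in the same order share a Structure.
        if name not in self.transitions:
            self.transitions[name] = Structure(self.names + (name,))
        return self.transitions[name]


ROOT = Structure()


class JSObject:
    def __init__(self):
        self.structure = ROOT
        self.slots = []

    def put(self, name, value):
        offset = self.structure.offsets.get(name)
        if offset is None:
            self.structure = self.structure.with_property(name)
            self.slots.append(value)
        else:
            self.slots[offset] = value


class GetSite:
    """One property-access site in the program, with a single-entry cache."""

    def __init__(self, name):
        self.name = name
        self.cached_structure = None
        self.cached_offset = None
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        if obj.structure is self.cached_structure:
            # Hit: one identity check, then a direct slot read.
            self.hits += 1
            return obj.slots[self.cached_offset]
        # Miss: do the usual hash lookup and remember where the property was found.
        # (A missing property raises KeyError in this sketch.)
        self.misses += 1
        offset = obj.structure.offsets[self.name]
        self.cached_structure = obj.structure
        self.cached_offset = offset
        return obj.slots[offset]


def make_point(x, y):
    point = JSObject()
    point.put("x", x)
    point.put("y", y)
    return point
```

Reading "x" from many points through one GetSite misses only on the first object. An object built in a different property order gets a different Structure and therefore misses again.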
Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.
We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.
3. Context Threaded JIT
Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode interpretation short of generating native code. Context threading takes the next step and introduces some native code generation.
The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight line code, so much dispatch overhead is removed. Second, many branches that were formerly between opcodes are now inline, and made highly predictable to the CPU’s branch predictor.
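As a loose analogy only, the per-opcode translation can be imitated in Python by emitting Python source instead of machine code. Everything here is an assumption made for illustration: the bytecode format, the names generate_source, compile_bytecode and runtime_add, and the restriction to straight-line code without jumps.

```python
def runtime_add(a, b):
    # Hypothetical runtime helper standing in for a complex opcode.
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


def generate_source(bytecode):
    """Translate straight-line register bytecode into Python source, one opcode at a time."""
    lines = ["def compiled(r):"]
    for op, *args in bytecode:
        if op == "load_const":
            # Simple opcode: inlined directly into the generated code.
            dst, value = args
            lines.append(f"    r[{dst}] = {value!r}")
        elif op == "move":
            dst, src = args
            lines.append(f"    r[{dst}] = r[{src}]")
        elif op == "add":
            # Complex opcode: emitted as a call into the runtime.
            dst, left, right = args
            lines.append(f"    r[{dst}] = runtime_add(r[{left}], r[{right}])")
        elif op == "ret":
            (src,) = args
            lines.append(f"    return r[{src}]")
        else:
            raise ValueError(f"unknown opcode: {op}")
    lines.append("    return None")
    return "\n".join(lines)


def compile_bytecode(bytecode):
    # exec() turns the generated source into a callable; no dispatch loop remains.
    namespace = {"runtime_add": runtime_add}
    exec(generate_source(bytecode), namespace)
    return namespace["compiled"]


program = [
    ("load_const", 0, 2),
    ("load_const", 1, "px"),
    ("add", 2, 0, 1),
    ("ret", 2),
]
```

Calling compile_bytecode(program) yields a function whose body is a straight run of statements, which mirrors how the opcode-to-opcode control flow becomes straight-line code in the generated output.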
Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.
One of the great things about our lightweight JIT is that there are only about 4,000 lines of code involved in native code generation. All the other code remains cross platform. It’s also surprisingly hackable. If you thought compiling to native code was rocket science, think again. Besides Gavin, most of us have little prior experience with native codegen, but we were able to jump right in.
Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.
4. Regular Expression JIT
As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.
Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore — in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.
We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies — like John Resig’s processing.js library — extend that dependency ever further.
Major kudos to the entire SFX team for pulling this off. Now, to grab a new nightly...
"Google will release a browser" that was the news of the day. But what does this mean for qooxdoo? Judging just by reading the available information (before the release) it seems to be the browser we as JavaScript developers have been waiting for.
But wait a minute. A state-of-the-art browser is not something one can build overnight. Not even Google can do this. Remember how long it took the Mozilla team to create a somewhat decent browser after the code was open sourced? The same is true for Safari. The first Safari basically sucked. Safari 2 was usable for surfing the web, but it had too many bugs and too poor JavaScript performance to run qooxdoo. Now Google tells us they have a brand new web browser that is meant to raise the bar for all other web browsers. Normally I wouldn't expect much from a completely new web browser still in its beta stage, but hey, this is Google and they promise a lot. So of course I had to download and test it.
To keep things short: "I'm impressed!" They really deliver what they promise. Chrome is really fast, the rendering is correct and I even like the UI. But best of all, it runs qooxdoo without any modifications.
Granted, they use the WebKit rendering engine, which we support, but most other parts of the browser (including the JavaScript interpreter) are new and could not be tested with qooxdoo before. Even gmx.com, a fairly large qooxdoo 0.7 application, runs without any problems.
It's nice to see qooxdoo running smoothly and fast in Chrome, but what really matters to me is that this browser raises the bar technologically for the other browser manufacturers.
Finally JavaScript is getting some love. After WebKit's SquirrelFish and Mozilla's TraceMonkey, Chrome's V8 is the third JavaScript interpreter within a very short period of time that promises way better JavaScript performance. Good JavaScript performance is one of the key prerequisites for writing even more feature-rich web applications. Technically I should not call them interpreters, as at least TraceMonkey and V8 use just-in-time (JIT) compilation, but I'm used to this term so I'll stick with it for the moment. What is cool about V8 is not only the technology but also that they released the source on the same day as the browser. It's under the BSD license, which basically means that everyone who wants to embed this interpreter is free to do so.
The second groundbreaking feature is threaded browsing. In February Thomas wrote a nice article about why Firefox should support one JavaScript interpreter per tab. Now, a few months later, Chrome is here and does exactly that. Isolating the tabs from each other, and especially the JavaScript interpreters in the different tabs, is a real killer feature to me. This is a feature your parents and grandparents have waited for, of course without knowing it.
There should be no reason for the UI of the whole browser to freeze just because some badly written JavaScript code is running. With Chrome we see that at least Google can do better. My hope and my prediction is that we'll see this in Firefox and Safari in the not too distant future as well. I'm not so sure about Internet Explorer though. Maybe in the year 2012 in IE9.
To make a browser successful, tooling is a critical aspect. Anyone who has ever had to debug JavaScript in IE6 and got the infamous "Undefined is null or not an object" error with no indication of where the error happened knows what I am talking about. There is a huge difference between a browser that can run AJAX applications and a browser that can be used to write AJAX applications. Fortunately the Google guys are tech guys and they stuffed a lot of interesting developer tools into Chrome. So Chrome definitely is a browser that can be used to write AJAX applications.
Chrome developer features:
All in all I must say I'm more than impressed by Chrome. One can see that Google has written this browser with web applications like Gmail in mind, and what is good for Gmail is clearly good for qooxdoo as well. The 'beta' label is probably only there because people expect this from Google. I've seen worse browsers marked as a stable release. My only gripe with Chrome right now is that it's not yet available for the Mac.
I encourage everyone to download and test it. If you like it, install it on your mother's computer.
Have fun
Update
I have to apologize to the Internet Explorer team. I said that I don't expect them to run the browser tabs in different threads. In fact IE 8 already has this feature. It just slipped under my radar. I guess this is because they marked it mainly as a security feature. Thanks Markus for correcting me. Whether a browser isolates JavaScript execution can easily be tested by typing this URL javascript:while(1) {} in the location bar.

We posted about the new WebKit JavaScript engine SquirrelFish, and now we have an official announcement that goes into fantastic detail on the beast:
What is SquirrelFish
SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding register window calling convention. It lazily generates bytecodes from a syntax tree, using a simple one-pass compiler with built-in copy propagation.
SquirrelFish owes a lot of its design to some of the latest research in the field of efficient virtual machines, including research done by Professor M. Anton Ertl, et al, Professor David Gregg, et al, and the developers of the Lua programming language.
The post then goes into detail on why it is so much faster:
SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes.
The bytecode’s register representation and calling convention work together to produce other speedups, as well. For example, jumping to the first instruction in a JavaScript function, which used to require two C++ function calls, one of them virtual, now requires just a single bytecode dispatch. At the same time, the bytecode compiler, which knows how to strip away many forms of intermediate copying, can often arrange to pass arguments to a JavaScript function without any copying.
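The contrast between walking a syntax tree and running register bytecode can be sketched in a few lines of Python. This is a toy under stated assumptions, not SquirrelFish's design: the tuple-based tree and instruction formats, and the names eval_tree, Compiler, compile_function and run, are all invented for illustration.

```python
# Syntax tree nodes: a number, ("var", name), or ("add" | "mul", left, right).

def eval_tree(node, env):
    """Tree-walking evaluation: one recursive call per syntax tree node."""
    if isinstance(node, (int, float)):
        return node
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    left = eval_tree(node[1], env)
    right = eval_tree(node[2], env)
    return left + right if kind == "add" else left * right


class Compiler:
    """One-pass compiler from the tree to register bytecode."""

    def __init__(self, params):
        # Parameters occupy the first registers, so reading one needs no copy.
        self.registers = {name: i for i, name in enumerate(params)}
        self.next_reg = len(params)
        self.code = []

    def new_temp(self):
        reg = self.next_reg
        self.next_reg += 1
        return reg

    def emit(self, node):
        """Emit code for node and return the register that holds its value."""
        if isinstance(node, (int, float)):
            dst = self.new_temp()
            self.code.append(("load", dst, node))
            return dst
        kind = node[0]
        if kind == "var":
            return self.registers[node[1]]
        left = self.emit(node[1])
        right = self.emit(node[2])
        dst = self.new_temp()
        self.code.append((kind, dst, left, right))
        return dst


def compile_function(params, body):
    compiler = Compiler(params)
    result = compiler.emit(body)
    compiler.code.append(("ret", result))
    return compiler.code, compiler.next_reg


def run(code, num_regs, args):
    """Execute the flat instruction list; the tree is no longer needed."""
    regs = list(args) + [None] * (num_regs - len(args))
    pc = 0
    while True:
        instr = code[pc]
        pc += 1
        op = instr[0]
        if op == "load":
            regs[instr[1]] = instr[2]
        elif op == "add":
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "mul":
            regs[instr[1]] = regs[instr[2]] * regs[instr[3]]
        elif op == "ret":
            return regs[instr[1]]
        else:
            raise ValueError(f"unknown opcode: {op}")
```

For the body x * x + 1 the compiler emits four flat instructions, and the variable x is read straight from register 0 instead of being copied into a temporary.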
And finishes by promising that this is just the beginning, and that we are going to see even faster performance:
In a typical compiler, conversion to bytecode is just a means to an end, not an end in itself. The purpose of the conversion is to “lower” an abstract tree of grammatical constructs to a concrete vector of execution primitives, the latter form being more amenable to well-known optimization techniques.
Therefore, though we’re very happy with SquirrelFish’s current performance, we also believe that it’s just the beginning. Some of the compile-time optimizations we’re looking at, now that we have a bytecode representation, include:
- constant folding
- more aggressive copy propagation
- type inference—both exact and speculative
- specialization based on expression context—especially void and boolean context
- peephole optimization
- escape analysis
This is an interesting problem space. Since many scripts on the web are executed once and then thrown away, we need to invent versions of these optimizations that are simple and efficient. Moreover, since JavaScript is such a dynamic language, we also need to invent versions of these optimizations that are resilient in the context of an unknown environment.
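Of the compile-time optimizations listed, constant folding is the easiest to illustrate. The sketch below is a single cheap pass over a toy tuple-based syntax tree; the tree format and the name fold_constants are assumptions, not anything taken from the SquirrelFish source.

```python
import operator

# Operators considered safe to evaluate at compile time in this sketch.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def is_number(node):
    return isinstance(node, (int, float))


def fold_constants(node):
    """Replace operations on two literal numbers with their result."""
    if not isinstance(node, tuple) or node[0] == "var":
        return node  # literals and variable references are left untouched
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if is_number(left) and is_number(right):
        return FOLDABLE[op](left, right)
    # A variable's value and type are unknown until run time, so keep the operation.
    return (op, left, right)
```

For example, folding ("add", ("mul", 60, 60), ("var", "t")) leaves ("add", 3600, ("var", "t")): the literal product is computed once at compile time, and the part that depends on an unknown variable is preserved.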
We’re also looking at further optimizing the virtual machine, including:
- constant pool instructions
- superinstructions
- instructions with implicit register operands
- advanced dispatch techniques, like instruction duplication and context threading
- getting computed goto working on Windows
Performance on Windows has extra room to grow because the interpreter on Windows is not direct-threaded yet. In place of computed goto, it uses a switch statement inside a loop.
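The difference between a switch inside a loop and threaded dispatch can only be approximated in Python, which has no computed goto. In this sketch the "threaded" version replaces each opcode number with a reference to its handler once, at load time, so running an instruction no longer compares opcode numbers. A central loop still remains, which real direct threading avoids. The stack-based toy instruction set and all names are hypothetical.

```python
OP_PUSH, OP_ADD, OP_MUL, OP_HALT = range(4)


def run_switch(code):
    """Dispatch by comparing the opcode number on every instruction."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        if op == OP_PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == OP_ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == OP_MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
            pc += 1
        elif op == OP_HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")


def do_push(code, pc, stack):
    stack.append(code[pc + 1])
    return pc + 2


def do_add(code, pc, stack):
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)
    return pc + 1


def do_mul(code, pc, stack):
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)
    return pc + 1


def do_halt(code, pc, stack):
    return -1  # a negative program counter stops the loop


HANDLERS = {OP_PUSH: do_push, OP_ADD: do_add, OP_MUL: do_mul, OP_HALT: do_halt}
OPERAND_COUNT = {OP_PUSH: 1, OP_ADD: 0, OP_MUL: 0, OP_HALT: 0}


def thread(code):
    """Replace each opcode number with its handler, once, at load time."""
    threaded, pc = [], 0
    while pc < len(code):
        op = code[pc]
        count = OPERAND_COUNT[op]
        threaded.append(HANDLERS[op])
        threaded.extend(code[pc + 1:pc + 1 + count])
        pc += 1 + count
    return threaded


def run_threaded(threaded):
    stack, pc = [], 0
    while pc >= 0:
        # One read of the instruction stream, then one indirect call.
        pc = threaded[pc](threaded, pc, stack)
    return stack.pop()


program = [OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT]
```

Both run_switch(program) and run_threaded(thread(program)) compute (2 + 3) * 4. The threaded form pays the opcode-to-handler lookup once per instruction at load time rather than on every execution.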
Not only is SquirrelFish exciting, but the post itself shows that they get that this is aimed at developers. Great job.