Content Tagged with front + performance

Extreme JavaScript Performance; John Resig on Ars

Ars Technica has a new columnist, John Resig. His first piece is on Extreme JavaScript Performance, a topic that has been coming at us in abundance recently!

His article focuses on the latest updates to the fish, SquirrelFish Extreme:

A popular technique that is gaining traction amongst JavaScript engine implementers is that of optimizing the engine, while it's still processing the JavaScript code, to determine the "type" of the object that is being used. Since JavaScript doesn't include any sort of explicit type system, JavaScript engines are frequently forced to check and re-check the values that they are handling, to ensure their integrity. SFX rounds out the collection of other modern JavaScript engines, namely V8 and TraceMonkey, to provide this form of polymorphic inline caching. Interestingly, the idea for this form of caching comes from the Self programming language, the origin of many of the ideas in JavaScript (such as using prototypal inheritance instead of the more-common classical form of object inheritance seen in languages like Java).

JavaScript engines are serving as the test bed for new forms of dynamic language optimization. No other language is seeing this level of competition and rapid improvement that JavaScript is. This is optimal considering that JavaScript is one of the most widely-deployed programming languages available.

The SquirrelFish Extreme release currently stands as the fastest JavaScript engine [based on SunSpider] (although that's certain to change as healthy competition continues).

Ajax: Ajaxian

Regex performance in modern JSVMs

Based on its performance on the regexes it does handle, WREC (WebKit Regular Expression Compiler) is indeed an awesome design. regexp-dna.js, however, is flawed and exaggerates SFX performance.

We could use nanojit to make a regex compiler for SpiderMonkey that would perform as well as WREC. But I don’t know if it’s worthwhile yet. Regex performance is much less important for today’s web than it is for SunSpider–I hope to link to a report on that in a future post.

That was the conclusion that David Mandelin of the Tamarin project reached as he looked into how "SquirrelFish Extreme (SFX) is kicking our butts so badly on regexp-dna.js."

I love David's posts, as they go into the real meat of the tech:

Technical details: the design of WREC. There are two main ways to implement regular expressions: using a backtracking matching engine, or by transforming the regex to a finite automaton (NFA, aka “state machine”), which does not backtrack. Most Perl-type regex engines, including both SpiderMonkey’s and WREC, follow the backtracking design. I don’t know the exact history of that choice, but at present it is much easier to implement features like group capture and backreferences in the backtracking design. Also, although some regexes scale only if implemented as NFAs, my tests suggest that many simple regexes, including those in SunSpider, are faster with backtracking.

As of this writing, WREC’s implementation strategy is dirt simple (which is a good thing). There are no transformations or fancy optimizations on the regex. WREC simply generates native code that directly implements the backtracking search. Thus, within a single match operation, there are no function calls, no traversals of regular expression ASTs, and few option tests, so almost all of the overhead is eliminated.

WREC’s code is very easy to read, so if you want to know exactly how it works, just read it in WREC.cpp. It’s also great example code for anyone implementing a compiler for a simple language like regular expressions. The basic plan is to parse the regular expression with functions named things like parseDisjunction (the | operator). Those functions directly call functions like generateDisjunction that generate the native code using the same assembler that the call-threading interpreter uses. There’s also the oddly named “gererateParenthesesResetTrampoline”. Inexplicably preserved typo, or watermark to detect copying of WREC code?
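To make the backtracking design concrete, here is a minimal sketch of a backtracking matcher in JavaScript. It is not WREC: it interprets the pattern rather than generating native code, and it only understands literal characters, ".", "*", "^" and "$". The function names (match, matchHere, matchStar) are made up for this illustration, following the shape of the well-known Kernighan and Pike matcher.

```javascript
// Minimal backtracking matcher. Illustration only; not WREC's code.
// Supports literal characters, ".", "*", "^" and "$".

function match(regex, text) {
  if (regex[0] === "^") return matchHere(regex.slice(1), text);
  // Unanchored: try every starting position, including the empty suffix.
  for (let i = 0; i <= text.length; i++) {
    if (matchHere(regex, text.slice(i))) return true;
  }
  return false;
}

// Does the regex match at the very start of text?
function matchHere(regex, text) {
  if (regex.length === 0) return true;
  if (regex[1] === "*") return matchStar(regex[0], regex.slice(2), text);
  if (regex === "$") return text.length === 0;
  if (text.length > 0 && (regex[0] === "." || regex[0] === text[0])) {
    return matchHere(regex.slice(1), text.slice(1));
  }
  return false;
}

// Match c* followed by the rest of the pattern.
function matchStar(c, rest, text) {
  // Greedily consume as many copies of c as possible...
  let n = 0;
  while (n < text.length && (c === "." || text[n] === c)) n++;
  // ...then backtrack, giving characters back one at a time
  // until the rest of the pattern matches.
  for (; n >= 0; n--) {
    if (matchHere(rest, text.slice(n))) return true;
  }
  return false;
}

console.log(match("a*a", "aaa"));   // needs backtracking to succeed
console.log(match("^ab$", "abc"));
```

The loop at the end of matchStar is the "backtracking" in question: the matcher commits to a guess, and retreats when the remainder of the pattern fails.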

Ajax: Ajaxian

Hammerhead: Continuous integration for performance

Steve Souders is launching Hammerhead today at The Ajax Experience.

What is Hammerhead? I kinda think of it as continuous integration for performance. It is a Firebug plugin that you can set up to monitor the performance of your application. Imagine you add a new feature that you think will speed things up; this tool will let you know how performance was really affected.

There are also cool features when you just want to whip it up on your own Firebug:

Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else. You can choose to have Hammerhead clear these caches after every page view. This is a nice feature for me when I’m loading the same page again and again to see its performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically next time you restart Firefox.

Finally, Steve Lamm posted on the Google Code blog about testing slower connections as well as the high speed one that you are probably on, and the techniques for doing that with Hammerhead.

Steve continues to come up with small useful tools for Web developers. Thanks Steve!

Ajax: Ajaxian

SquirrelFish Extreme: JIT comes to SquirrelFish with extreme results

While Ben and I were talking about JavaScript performance (and other things) at Web 2.0 Expo NYC, Maciej Stachowiak announced SquirrelFish Extreme, the very new and improved version that appears to do very well at SunSpider:

SquirrelFish Extreme: 943.3 ms
V8: 1280.6 ms
TraceMonkey: 1464.6 ms

What makes it so fast?

SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.

1. Bytecode Optimizations

When we first announced SquirrelFish, we mentioned that we thought that the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.

One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic - they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.

In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.

2. Polymorphic Inline Cache

One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.

Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure - as people in the dynamic language community say, “you can cheat as long as you don’t get caught”.

So how exactly do we cheat? We detect when objects actually have the same underlying structure — the same properties in the same order — and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID - usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
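The StructureID check described above can be modelled in plain JavaScript. This is a toy sketch, not JavaScriptCore code: the names shapeFor, makeObject and makeCachedGetter are invented here, objects are simulated with a slots array, and the cache holds a single entry (a polymorphic cache would keep several structure/offset pairs per access site).

```javascript
// Toy model of an inline cache. All names are hypothetical;
// none of this is a JavaScriptCore API.

let nextShapeId = 1;
const shapeTable = new Map(); // "x,y" -> shape

// A shape plays the role of a StructureID: one per distinct
// ordered list of property names.
function shapeFor(keys) {
  const signature = keys.join(",");
  let shape = shapeTable.get(signature);
  if (!shape) {
    shape = { id: nextShapeId++, offsets: new Map(keys.map((k, i) => [k, i])) };
    shapeTable.set(signature, shape);
  }
  return shape;
}

// Simulated object: values live in a flat array, plus a pointer to the shape.
function makeObject(props) {
  const keys = Object.keys(props);
  return { shape: shapeFor(keys), slots: keys.map((k) => props[k]) };
}

// One getter per access site. It remembers the last shape it saw
// and the offset where the property was found.
function makeCachedGetter(name) {
  let cachedShapeId = 0; // 0 never matches a real shape id
  let cachedOffset = -1;
  const stats = { hits: 0, misses: 0 };

  function get(obj) {
    if (obj.shape.id === cachedShapeId) {
      stats.hits++;
      return obj.slots[cachedOffset]; // fast path: no hash lookup
    }
    stats.misses++;
    const offset = obj.shape.offsets.get(name); // slow path: hash lookup
    if (offset === undefined) return undefined;
    cachedShapeId = obj.shape.id;
    cachedOffset = offset;
    return obj.slots[offset];
  }
  get.stats = stats;
  return get;
}

const getX = makeCachedGetter("x");
const points = [
  makeObject({ x: 1, y: 2 }),
  makeObject({ x: 3, y: 4 }),
  makeObject({ x: 5, y: 6 }),
];
const xs = points.map((p) => getX(p));
console.log(xs, getX.stats);
```

All three points share one shape, so only the first access pays for the hash lookup; the other two take the cached-offset path.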

Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.

We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.

3. Context Threaded JIT

Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode interpretation short of generating native code. Context threading takes the next step and introduces some native code generation.

The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight line code, so much dispatch overhead is removed. Second, many branches that were formerly between opcodes are now inline, and made highly predictable to the CPU’s branch predictor.

Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.

One of the great things about our lightweight JIT is that there’s only about 4,000 lines of code involved in native code generation. All the other code remains cross platform. It’s also surprisingly hackable. If you thought compiling to native code is rocket science, think again. Besides Gavin, most of us have little prior experience with native codegen, but we were able to jump right in.

Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.

4. Regular Expression JIT

As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.

Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore — in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.

We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies — like John Resig’s processing.js library — extend that dependency ever further.

Major kudos to the entire SFX team for pulling this off. Now, to grab a new nightly...

Ajax: Ajaxian

Delayed Script Execution; An Opera feature that has Steve excited

Steve has found a new tidbit that has him excited. The feature at hand comes from Opera:

Primarily for low bandwidth devices, not well-tested on desktop. Ignore script tags until entire document is parsed and rendered, then execute all scripts in order and re-render.

Steve explains that he is a fan of splitting up JavaScript into a small core and then loading other functionality asynchronously later. This defer gives you some of that benefit, and it also groks document.write, which no other technique works with:

One limitation of these techniques is that you can’t use document.write, because when a script is loaded asynchronously the browser has already written the document. Hardcore JavaScript programmers avoid document.write, but it’s still used in the real world, most notably, and infamously, by ads. A feature of Opera’s “Delayed Script Execution” option is that, even though scripts are deferred, document.write still works correctly. Opera remembers the script’s location in the page and inserts the document.write output appropriately.

So, this is why Steve is interested to dive deeper and see if this has the performance benefits that make sense theoretically:

One immediate benefit of this Opera preference is that web developers can see the impact of delay-loading their JavaScript. A practice I’m advocating a lot lately is splitting a large JavaScript payload into two pieces, one of which can be loaded using an asynchronous script loading technique. This is often a complex task as the JavaScript payload grows in size and complexity. With this “Delayed Script Execution” feature in Opera, developers can get an idea of how their page would feel before undertaking the heavy lifting.

I’m even more excited about how this shows us what is possible for the future. To be able to have asynchronous script loading and preserve document.write output is like having your cake and eating it too. It’s difficult for users to find this feature in Opera. And it’s beyond the reach of web developers. But if Opera’s “Delayed Script Execution” behavior was the basis for implementing SCRIPT DEFER in all browsers, it would open the door for significant performance improvements by simply adding six characters (“DEFER ”).

This is most significant for the serving of ads. Often ads are served by including a script that contains document.write to load other resources: images, flash, or even another script. Ads are typically placed high in the page, which means today’s pages suffer from slow loading ads because all their content gets blocked. And really, it’s not the pages that suffer, it’s the users. Our experience suffers. Everyone’s experience suffers. If browsers supported an implementation of SCRIPT DEFER that behaved similar to Opera’s “Delayed Script Execution” feature, we’d all be better off.

Food for thought for Safari, Firefox, and IE.

Ajax: Ajaxian

New Profilers and Debuggers in Google Chrome and IE

Sameer Chabungbam of Microsoft posted about the new JScript profiler, which includes the following functionality:

  • Provides performance data for JScript functions in two views:
    • Functions View – a flat listing of all the functions
    • Call Tree view – a hierarchical listing of the functions based on the call flow
  • Supports exporting the data to a file
  • Provides an inferred name for anonymous functions
  • Profiles built-in JScript functions
  • Supports multiple profile reports
  • Supports profiling across page navigation and refreshes

Eric Pascarello has also been looking at new tools, and wrote up his experience with the Google Chrome Debugger. He details the breakpoint walking functionality as well as the many commands available.

Ajax: Ajaxian

Brendan discusses how TraceMonkey is climbing faster; Ruby on the Web with V8

Brendan Eich jumped right in and benchmarked the tip of tree for TraceMonkey, with the V8 version that came with Google Chrome:

We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.

This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven't had enough hours in the day to get to it, but it's "next".

Brendan shows SunSpider running there, and V8 has that and other benchmarks to run too. Isn't it great when a performance arms race is on? Thank god for competition here. We all win.

Ray Cromwell ran tests himself, on his own app Chronoscope (note, probably NOT using tip of tree TraceMonkey):

Chronoscope is written in GWT, and to some extent, the GWT compiler may negate some of Chrome's V8 technology in the sense that GWT "de-classes" many OO polymorphic dispatches into a more functional style of programming, removing as much dynamic dispatch as possible, and eliminating prototype lookups and function call overhead through inlining. I don't know if GWT hurts "hidden classes" or not, but it might be possible that if GWT didn't provide such optimizations, the performance differential might be larger.

Despite this, the results are still good. The test consisted of calling the chart's redraw() function 100 times per trial, with 10 trials. The slowest and fastest trial are thrown out, and the mean and standard deviation are calculated on the remaining data.

I tested on a Mac Pro 2.66GHz with 6GB of memory, OS X 10.5. The tests were conducted within a Parallels VM running XP Service Pack 2, given 2 CPUs and 2GB of memory. For each browser, I rebooted the VM from a clean start, and ran only the test browser.
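The trial procedure Ray describes (100 calls per trial, 10 trials, drop the slowest and fastest, then take the mean and standard deviation) is easy to reuse for your own tests. A rough sketch follows; runTrials and trimmedStats are names invented here, the clock defaults to Date.now, and the population form of the standard deviation is an assumption, since the quote doesn't say which form was used.

```javascript
// Time `trials` batches of `callsPerTrial` calls to fn.
// `now` is injectable so a different clock can be swapped in.
function runTrials(fn, { trials = 10, callsPerTrial = 100, now = Date.now } = {}) {
  const times = [];
  for (let t = 0; t < trials; t++) {
    const start = now();
    for (let c = 0; c < callsPerTrial; c++) fn();
    times.push(now() - start);
  }
  return times;
}

// Drop the single fastest and single slowest trial, then summarize the rest.
// Assumption: population standard deviation (divide by n, not n - 1).
function trimmedStats(times) {
  if (times.length < 3) throw new Error("need at least 3 trials");
  const kept = [...times].sort((a, b) => a - b).slice(1, -1);
  const mean = kept.reduce((sum, t) => sum + t, 0) / kept.length;
  const variance = kept.reduce((sum, t) => sum + (t - mean) ** 2, 0) / kept.length;
  return { mean, stdDev: Math.sqrt(variance), kept };
}

// Stand-in workload; in Ray's test this would be the chart's redraw().
const summary = trimmedStats(runTrials(() => Math.sqrt(12345)));
console.log(summary.mean, summary.stdDev);
```

Throwing out the extremes keeps one garbage-collection pause or a cold first run from dominating the average.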

And for a bit of fun, Marc-Andre Cournoyer tied together HotRuby (remember that? the beast that runs YARV code in the browser!) and V8 to create fast Ruby in the browser.

Good times.

Ajax: Ajaxian

Razor Profiler: Check out your Ajax code

Razor Profiler is a web-based Ajax profiling tool that helps web developers understand and analyze the runtime behavior of their JavaScript code in a cross-browser environment. It can be accessed online as a service or downloaded to run locally, and was created by Coach Wei, who has done a lot of work for Nexaweb and Apache.

Razor Profiler Features

Razor Profiler automates JavaScript profiling:

  • Automation: no application code change required. Razor Profiler automatically collects all the necessary data and presents them to web developers for analysis.
  • Runs on any browser: web developers can profile any JavaScript application on any browser. There is nothing to install on the client side.
  • Rich lexical analysis: Razor Profiler presents rich lexical information about the application, such as file information (number, response status, size, mimetype, percentage, etc), tokens (size, file, percent, count), and functions (size, file, name...), etc;
  • Profile scenario recording: Razor Profiler enables web developers to selectively record the scenarios that they are interested in. Only recorded scenarios will be used in analysis.
  • Call stack analysis: for each recorded scenario, Razor Profiler presents all the call stacks in the order of their occurrence. For each call stack, web developers can drill into it to find out the duration of the stack, all the function calls of this stack and the duration of each call.
  • Function analysis: For each JavaScript function in the application, Razor Profiler presents the number of times it has been invoked, the duration of each invocation, and the call stacks that invoked this function.
  • Data visualization with graphing and charting: Razor Profiler presents top call stacks, top function calls of each stack, top recorded scenarios, etc. using visual charts and graphs to help web developers better understand the runtime behavior of their application. For example, each call stack is visualized as an intuitive Gantt chart.

How Does Razor Profiler Work?

Razor Profiler consists of a server component that runs inside a standard Java EE Servlet engine, and a JavaScript-based client component that runs inside any browser. Once you have the Razor server started, you can profile your JavaScript application by entering its start URL into Razor Profiler and running through your test scenarios. Razor Profiler will automatically record data and visualize it for your analysis. There is no client side installation, browser configuration change or application code change required. In order to achieve this, Razor Profiler goes through five different phases:

  • Application retrieval: Once a web developer enters the application start URL into Razor Profiler, Razor Profiler client component ("the client") will send this URL to Razor Profiler server component ("the server"). The server performs the actual retrieval of this URL. After additional server processing (such as lexical analysis and code injection, see below), the retrieved content is sent to the client side to be displayed in a new browser window. From the developer's point of view, the application is launched and running in this new browser window.
    In this process, Razor Profiler Server is acting like a "proxy server". But it is not really a "proxy server" and there is no need for developers to re-configure their browser proxy settings.
  • Lexical analysis: Once the server retrieves the application URL, it performs lexical analysis of the returned content by identifying and analyzing JavaScript files, functions, tokens, etc. The result is sent to the client for display.
  • Code injection: Upon lexical analysis of JavaScript code, the server injects "probe" code into the application's JavaScript sources before returning them to the client. These injected "probes" enable automatic collection of application runtime data, and save developers from doing so manually.
  • Runtime data capture: Once the application's JavaScript code is running on the client side and as developers run through desired profile scenarios, the injected "probes" automatically collect all the necessary data and pass it to the Razor Profiler client.
  • Data analysis: When the developer finishes recording scenarios and starts data analysis, Razor Profiler client performs analysis of all the collected data and presents the results.
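To get a feel for what an injected "probe" does, here is a small sketch that wraps a function at runtime and records call counts and durations. This is not Razor Profiler's code: the real tool rewrites the JavaScript source on the server, whereas this sketch wraps functions in the running page, and createProfiler, probe and records are names made up for the example.

```javascript
// Hypothetical runtime probe. Razor Profiler injects its probes by
// rewriting source on the server; this only imitates the effect.
function createProfiler(now = Date.now) {
  const records = new Map(); // function name -> { calls, totalMs }

  // Return a wrapped version of fn that records every invocation.
  function probe(name, fn) {
    return function probed(...args) {
      const start = now();
      try {
        return fn.apply(this, args);
      } finally {
        // Runs even when fn throws, so failed calls are still counted.
        const entry = records.get(name) || { calls: 0, totalMs: 0 };
        entry.calls += 1;
        entry.totalMs += now() - start;
        records.set(name, entry);
      }
    };
  }

  return { probe, records };
}

const profiler = createProfiler();
const slowSum = profiler.probe("slowSum", (n) => {
  let total = 0;
  for (let i = 0; i < n; i++) total += i;
  return total;
});

slowSum(1000);
slowSum(2000);
console.log(profiler.records.get("slowSum"));
```

A per-function table like this is roughly what the "Function analysis" view above would be built from.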

Ajax: Ajaxian

Proxy issues with querystrings in path names

You have seen this before: /path/to/something.js?v=2, or maybe it used a date or a version control id or some such. The idea is to put the version into the URL so you can cache aggressively and yet quickly push new versions.

There have long been issues with using the querystring as the version. At some point I seem to remember Safari not doing a good job of caching in that scenario and thinking that the resource was different.

Steve "Neo" Souders has posted about this issue especially as it relates to proxy servers and default configurations:

There’s a section in my book called Revving Filenames. It contains an example of adding a version number to the filename. That’s prompted several emails where people have asked me about tradeoffs around using a querystring versus embedding something in the filename. I wasn’t aware of any performance difference, but in a meeting this week a co-worker, Jacob Hoffman-Andrews, mentioned that Squid, a popular proxy, doesn’t cache resources with a querystring. This hurts performance when multiple users behind a proxy cache request the same file - rather than using the cached version everybody would have to send a request to the origin server.

I tested this by creating two resources, mylogo.1.2.gif and mylogo.gif?v=1.2. Both have a far future Expires date. I configured my browser to go through a Squid proxy. I made one request to mylogo.1.2.gif, cleared my cache (to simulate another user making the request), and fetched mylogo.1.2.gif again. This produces the following HTTP headers:

>> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:17:22 GMT
<< Expires: Tue, 21 Aug 2018 00:17:22 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

>> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:17:22 GMT
<< Expires: Tue, 21 Aug 2018 00:17:22 GMT
<< X-Cache: HIT from someserver.com
<< X-Cache-Lookup: HIT from someserver.com

Notice that the second response shows a HIT in the X-Cache and X-Cache-Lookup headers. This shows it was served by the Squid proxy. More evidence of this is the fact that the Date and Expires response headers have the same values, even though I made these requests 10 seconds apart. For conclusive evidence, only one hit shows up in the stevesouders.com access log.

Loading mylogo.gif?v=1.2 twice (clearing the cache in between) results in these headers:

>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:19:34 GMT
<< Expires: Tue, 21 Aug 2018 00:19:34 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:19:47 GMT
<< Expires: Tue, 21 Aug 2018 00:19:47 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

Here it’s clear the second response was not served by the proxy: the caching response headers say MISS, the Date and Expires values change, and tailing the stevesouders.com access log shows two hits.

Proxy administrators can change the configuration to support caching resources with a querystring, when the caching headers indicate that is appropriate. But the default configuration is what web developers should expect to encounter most frequently. Another interesting note about these tests: notice how the proxy downgrades the responses to HTTP/1.0. This is going to alter browser behavior in terms of the number of connections that are opened. When I’m doing performance analysis I make sure to avoid being connected through a proxy.
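If you want to move from querystring versions to revved filenames, the URL rewrite itself is mechanical. Here is a small sketch of a helper that turns mylogo.gif?v=1.2 into mylogo.1.2.gif. The function name revFilename and the default parameter name "v" are assumptions for the example, and your server still needs its own rule to map the revved name back to the real file.

```javascript
// Move a version from the querystring into the filename:
//   "mylogo.gif?v=1.2" -> "mylogo.1.2.gif"
// revFilename and the "v" parameter name are hypothetical choices.
function revFilename(url, param = "v") {
  const qIndex = url.indexOf("?");
  if (qIndex === -1) return url; // no querystring, nothing to do

  const path = url.slice(0, qIndex);
  const params = new URLSearchParams(url.slice(qIndex + 1));
  const version = params.get(param);
  if (version === null) return url; // no version parameter

  params.delete(param);

  // Insert the version before the extension, if the last path
  // segment has one; otherwise append it.
  const dot = path.lastIndexOf(".");
  const slash = path.lastIndexOf("/");
  const revved = dot > slash
    ? `${path.slice(0, dot)}.${version}${path.slice(dot)}`
    : `${path}.${version}`;

  // Keep any other query parameters as they were.
  const rest = params.toString();
  return rest ? `${revved}?${rest}` : revved;
}

console.log(revFilename("mylogo.gif?v=1.2"));
console.log(revFilename("/path/to/something.js?v=2"));
```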

Ajax: Ajaxian

Squirreling out the Fish on the iPhone

HTML:
<script type="text/javascript">
function recurse(n) {
    if (n > 0) {
        return recurse(n - 1);
    }
    return 0;
}

try {
    // recurse(43687);  // Highest that works for me in WebKit
                        // nightly builds as of 24 Jul 2008.
    // recurse(2999);   // Highest that works for me in Firefox 3.0.1
    // recurse(499);    // Highest that works for me in Safari 3.1.2
    recurse(3000);
    document.write("Could be SquirrelFish.");
} catch(e) {
    document.write("Not SquirrelFish.");
}
</script>

This is the hack that John Gruber used to test whether iPhone 2.x had snuck in SquirrelFish. He was curious due to the performance improvements that he witnessed.

What about iPhone limits though? David Golightly tests the limits on the iPhone with a script that keeps downloading tiles until it can no longer do so:

After downloading about 210 images, the iPhone simply stops downloading new ones. This is probably due to hitting the hard 30MB same-page resource limit.

Ajax: Ajaxian

Increase DOM Node Insertion Performance

John Resig continues his streak of compelling blog entries with "DOM DocumentFragments" where he shows that:

A method that is largely ignored in modern web development can provide some serious (2-3x) performance improvements to your DOM manipulation.

The technique shown is compatible across a large swath of modern browsers, including our friend IE6. Here's an example of using DocumentFragments:

JAVASCRIPT:
// Assumes `elems` is an array of DOM nodes built earlier; it is not defined in this snippet.
var div = document.getElementsByTagName("div");
var fragment = document.createDocumentFragment();
for ( var e = 0; e < elems.length; e++ ) {
        fragment.appendChild( elems[e] );
}

// Append a deep copy of the fragment to every div on the page.
for ( var i = 0; i < div.length; i++ ) {
        div[i].appendChild( fragment.cloneNode(true) );
}

Ajax: Ajaxian

Webslug: The hot or not of website performance tools

Kimble Young has created Webslug, the "hot or not of website performance."

It was inspired by webwait, but it lets you compare the load times of sites and records every performance test for later analysis (browser used, country of origin, top competitors, etc.).

For example, comparing reddit to Digg:

Webslug

It is useful to compare versions of your own site, too.

Ajax: Ajaxian

qUIpt: caching JS in window.name

Mario Heiderich has released qUIpt, a library that uses the window.name property to store away useful data, in this case JavaScript.

How does it work?

  • It checks the contents of window.name while your page is being loaded.
  • If there's nothing inside the window.name cache, the JS files you define are fetched via XHR.
  • The same happens if the user enters your site for the first time in the current browser session, or if document.referrer is off-domain or empty.
  • After that, the contents of window.name are evaluated.
  • If the user requests the next page on your domain, the JS files are taken directly from window.name, with no more requests necessary.
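The steps above boil down to a few lines of logic. The sketch below is not qUIpt's source: loadCachedScripts and every field of its env argument are invented for illustration, the fetch is assumed to be synchronous, and the browser pieces (window, XHR, eval) are passed in so the flow can be followed, and run, outside a browser.

```javascript
// Sketch of the window.name caching flow. Not qUIpt's actual code;
// all names and the env object are hypothetical.
function loadCachedScripts(env) {
  const { win, referrer, host, urls, fetchText, evaluate } = env;

  // Only trust the cache when we arrived from a page on our own host.
  // window.name survives cross-site navigation, so another site could
  // have filled it in.
  const cameFromSameHost = referrer !== "" && new URL(referrer).host === host;

  if (!win.name || !cameFromSameHost) {
    // First visit, or untrusted cache: fetch the files and store them.
    win.name = urls.map((u) => fetchText(u)).join(";\n");
  }

  // Either way, run whatever is now in the cache.
  evaluate(win.name);
}

// Demo with stand-ins for the browser pieces.
const demoWindow = { name: "" };
const demoEnv = {
  win: demoWindow,
  host: "example.com",
  urls: ["/core.js", "/widgets.js"],
  fetchText: (u) => `/* contents of ${u} */`,
  evaluate: (source) => console.log("evaluating", source.length, "chars"),
};

loadCachedScripts({ ...demoEnv, referrer: "" });                          // fetches
loadCachedScripts({ ...demoEnv, referrer: "http://example.com/page1" });  // uses cache
```

In a real page, win would be window, referrer would be document.referrer, and fetchText would wrap an XHR call.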

You can check out an example of it at work.

Ajax: Ajaxian

Velocity Conference Videos and Slides

Steve Souders has a wrap-up of the Velocity conference that he co-chaired. He links to his favourite content from the show, which includes a lot of Ajax-related work. It was really good to hear snippets from the show, such as Eric Lawrence of Microsoft saying "we hope to make Steve's book out of date", as we see browser vendors focus more and more on performance.

Ajax: Ajaxian

Rendering performance in Canvas compared to SVG and VML

Just after I posted about Ernest's canvas experiment with photos he put something else up that tests the performance of rendering polygons with Canvas compared to other techniques.

The demo lets you run a live test, and view saved tests, comparing the Google Maps interface, which "currently draws polygons using VML for Internet Explorer, SVG for Firefox and image retrieval for Safari and Firefox linux."

Canvas Rendering Compare

At first the results were surprising: the canvas version was orders of magnitude faster. Then they worked out that the live Google Maps version is actually doing a lot more than just drawing the polygons. That being said, a commenter had a valid point:

If we analyze the rendering time of the markup alone, both SVG and VML are not necessarily slower than canvas and canvas+excanvas.js. So the difference in performance is due to the implementation of polygons before the markup is output which the canvas implementation is skipping.

That doesn't make the experiment invalid. You didn't show that Canvas is faster than SVG or VML.
But you did show that it's possible to get much better polygon performance than the current API using a more direct to the metal approach - with whatever rendering engine. And people are crying out for faster polygons.

Ajax: Ajaxian

YSlow now has Firefox 3 support

Keeping in the performance vein, YSlow put a new version out in time for Firefox 3.

The new version includes:

  • Firefox 3 and Firebug 1.2 beta support
  • improved and simplified check for javascript minification
  • different coloring for inline vs. external CSS and JS ("All CSS" and "All JS" features)
  • clickable list of resources as a Table of Contents ("All CSS" and "All JS" features)
  • improved colors and presentation in the "legend" of component pies under Stats
  • fixed a bug where the same hostname with different port number was counted as a separate DNS lookup
  • misc bugfixes and style tweaks

Ajax: Ajaxian

Clientperf: Simple Client-Side Rails Performance

Eric Falcao has released Clientperf, a simple client-side Rails performance plugin.

The tool came about as Eric is giving a talk on "14 rules of high-performance websites in the typical rails mongrel/nginx stack, the main idea being to focus on some of the important implementation details when it comes to client-side performance optimization."

As I was planning, I realized that there was no simple as in the we’re-all-spoiled-with-rails simple way to measure client download times in production. Now, there is clientperf. It’s just a start, but decent enough to benchmark the actual client performance impact of any optimizations you make.

How it works

It injects javascript into the page that takes a timestamp at the top of the page and at the bottom of the page. Once the browser is done downloading, evaluating and rendering all assets, clientperf makes one last image request to your server with the start time, end time and the URL. Piece of cake.
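
As a rough illustration of that flow (not clientperf's actual script), the beacon could be built like this. The endpoint path, the query parameter names, and `buildBeaconUrl` are all assumptions made for the sketch.

```javascript
// Hypothetical sketch; clientperf's real script, endpoint and parameters may differ.
function buildBeaconUrl(endpoint, startTime, endTime, pageUrl) {
  return endpoint +
    "?start=" + startTime +
    "&end=" + endTime +
    "&url=" + encodeURIComponent(pageUrl);
}

// Browser-only part: skipped when there is no window object.
if (typeof window !== "undefined") {
  var pageStart = new Date().getTime(); // would run at the very top of the page
  window.onload = function () {
    var pageEnd = new Date().getTime(); // all assets downloaded and evaluated
    // An image request is a cheap one-way message; the response can be ignored.
    new Image().src = buildBeaconUrl(
      "/clientperf/beacon.gif", pageStart, pageEnd, window.location.href);
  };
}
```

The server then only has to log the query string of each beacon request to get per-URL timings.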

Rails Client Performance

Ajax: Ajaxian

Jiffy Firebug Plugin: Fine grained calculation of performance timings

Today is the kick off of the Velocity performance conference, and we are going to see a fair share of performance news over the next day or two.

To start out, Bill Scott (Rico/ex-Yahoo/now Netflix) has announced a new Firebug plugin, Jiffy, which adds a new tab showing fine-grained performance data. Want to know the time between the onunload of the previous page, the first rendering, time until onload, time after, and more?

This is where Jiffy-Web comes in. Jiffy-Web is a fine-grained and flexible website performance tracking and analysis suite written by Scott Ruthfield and the team at Whitepages.com.

The Firebug plugin uses that data, which it gets from the DOM JSON object, to do the visualization.

Jiffy Firebug Plugin

Bill wrote a detailed post on Measuring User Experience Performance that goes into the details behind this tool.

He goes into detail on how to measure things, and what can get in the way. For example, onunload:

The most logical place to measure the start of a request ("from Click") is on the originating page (see A in figure above). The straightforward approach is to add a timing capture to the unload event (or onbeforeunload). More than one technique exists for persisting this measurement, but the most common way is to write the timing information (like URL, user agent, start time, etc.) to a cookie.

However, there is a downside to this methodology. If the user navigates to your home page from elsewhere (e.g., from a google search), then there will be no "start time" captured since the unload event never happened on your site. So we need a more consistent "start time".

We address this by providing an alternate start time. We instrument a time capture at the very earliest point in the servlet that handles the request at the beginning of the response (see B in figure above). This guarantees that we will always have a start time. While it does miss the time it takes to handle the request, it ends up capturing the important part of the round trip time -- from response generation outward.

There are a number of ways to save this information so that it can be passed along through the response cycle to finally be logged. You can write out a server-side cookie. You can generate JSON objects that get embedded in the page. You could even pass along parameters in the URL (though this would not be desirable for a number of reasons). The point is you will need a way to persist the data until it gets out to the generated page for logging.

Note that the absolute time captured here is in server clock time and not client clock time. There is no guarantee these values will be in sync. We will discuss how we handle this later.
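
The fallback Bill describes can be sketched like this. It is only an illustration under assumptions: the cookie name `perfStart` and the helpers `readCookie` and `pickStartTime` are hypothetical, not Jiffy's or Netflix's code. The result records which clock the value came from, since the two are not guaranteed to be in sync.

```javascript
// Hypothetical names throughout; a sketch of "unload cookie first, server time otherwise".
function readCookie(cookieString, name) {
  var parts = cookieString ? cookieString.split("; ") : [];
  for (var i = 0; i < parts.length; i++) {
    var eq = parts[i].indexOf("=");
    if (parts[i].substring(0, eq) === name) {
      return decodeURIComponent(parts[i].substring(eq + 1));
    }
  }
  return null;
}

function pickStartTime(cookieString, serverStart) {
  var fromClick = parseInt(readCookie(cookieString, "perfStart"), 10);
  if (!isNaN(fromClick)) {
    // Written by the previous page's unload handler: client clock.
    return { source: "unload", time: fromClick };
  }
  // No unload happened on our site (e.g. arrival from a search engine): server clock.
  return { source: "server", time: serverStart };
}
```

In a page this might be called as `pickStartTime(document.cookie, serverStart)`, with `serverStart` embedded in the response by the server.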

He also talks about practical issues that he has found implementing this at Netflix, and when the data shows you the real truth:

Recently we fielded a different variation of our star ratings widget. While it cut the number of HTTP requests in half for large Queue pages (a good thing) it actually degraded performance. Having real time performance data let us narrow down on the culprit. This feedback loop is an excellent learning tool for performance. With our significant customer base, large number of daily page hits we can get a really reliable read on the performance our users are experiencing. As a side note, the median is the best way to summarize our measurements as it nicely takes care of the outliers (think of the widely varying bandwidths, different browser performance profiles that can all affect measurements.)
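
To see why the median copes with outliers better than the mean, here is a small sketch. The load times in the comments are made-up numbers, and both functions assume a non-empty array.

```javascript
// Median: the middle value of the sorted measurements (average of the two
// middle values when the count is even).
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Mean: the plain arithmetic average, shown for comparison.
function mean(values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum / values.length;
}

// Made-up page load times in ms; one user is on a very slow connection.
// median([800, 900, 950, 1000, 15000]) returns 950
// mean([800, 900, 950, 1000, 15000])   returns 3730
```

A single slow client drags the mean far above what most users saw, while the median stays put.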

Ajax: Ajaxian

ensure: on demand resources

Omar AL Zabir of Pageflakes.com has posted on ensure, his JavaScript library built around a single function, ensure, which lets you load JavaScript, HTML, and CSS on demand and then execute your code.

Ensure ensures that relevant JavaScript and HTML snippets are already in the browser DOM before executing your code that uses them.

For example:

JAVASCRIPT:
ensure( { js: "Some.js" }, function() {
    SomeJS(); // The function SomeJS is available in Some.js only
});

You can also specify multiple JavaScript, HTML, or CSS files to ensure all of them are made available before executing the code:

JAVASCRIPT:
ensure( { js: ["blockUI.js","popup.js"], html: ["popup.html", "blockUI.html"], css: ["blockUI.css", "popup.css"] }, function() {
    BlockUI.show();
    PopupManager.show();
});

Omar says:

Websites with rich client side effects (animations, validations, menus, popups) and Ajax websites require large amount of Javascript, HTML and CSS to be delivered to the browser on the same web page. Thus the initial loading time of a rich web page increases significantly as it takes quite some time to download the necessary components. Moreover, delivering all possible components upfront makes the page heavy and browser gets sluggish responding to actions. You sometimes see pull-down menus getting stuck, popups appearing slowly, window scroll feels sluggish and so on.

The solution is not to deliver all possible HTML, Javascript and CSS on initial load instead deliver them when needed. For example, when user hovers the mouse on menu bar, download necessary Javascript and CSS for the pull-down menu effect as well as the menu html that appears inside the pull-down. Similarly, if you have client side validations, deliver client side validation library, relevant warning HTML snippets and CSS when user clicks the 'submit' button. If you have a Ajax site which shows pages on demand, you can load the Ajax library itself only when user does the action that results in an Ajax call. Thus by breaking a complex page full of HTML, CSS and Javascript into smaller parts, you can significantly lower down the size of the initial delivery and thus load the initial page really fast and give user a fast smooth browsing experience.

There is a detailed writeup on how it all works, and it dovetails with the recent performance proposals around when to download resources (sometimes you may not want to wait for on demand loading of course).
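
The "deliver it when needed" idea from Omar's menu example can be sketched with a small wrapper that triggers a load on first use and queues callbacks until it finishes. `loadOnce` is a hypothetical helper and is not part of ensure; the file names and `Menu.open` in the usage comment are also made up.

```javascript
// Hypothetical helper, not part of ensure: run `load` at most once, and run
// every callback only after the load has finished.
function loadOnce(load) {
  var state = "idle"; // "idle" -> "loading" -> "ready"
  var waiting = [];
  return function (callback) {
    if (state === "ready") {
      callback();
      return;
    }
    waiting.push(callback);
    if (state === "idle") {
      state = "loading";
      load(function () {
        state = "ready";
        var queued = waiting;
        waiting = [];
        for (var i = 0; i < queued.length; i++) {
          queued[i]();
        }
      });
    }
  };
}

// Possible browser usage with ensure (file names and Menu.open are assumptions):
// var withMenu = loadOnce(function (done) {
//   ensure({ js: ["menu.js"], css: ["menu.css"] }, done);
// });
// menuBar.onmouseover = function () { withMenu(function () { Menu.open(); }); };
```

The wrapper keeps repeated hovers from starting a second download while the first one is still in flight.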

Ajax: Ajaxian

String Performance in IE: Array.join vs += continued

Tom Trenka has followed up his last post on String performance with a deep dive on IE that dispels the myth of Array.join.

Tom goes through tons of tests across versions of IE and using varying sizes of data.

In Conclusion

First things first—with the performance improvements with IE7, we no longer need to consider using an alternate path when doing large scale string operations; using Array.join in an iterative situation gives you no major advantages than using += in the same situation. In addition, the differences with IE6 were slight enough to allow you to not bother forking for that specific version.

The only time considering using an array as opposed to a string for these kind of operations is when you are aware that the fragments you are appending are very large (on the order of > 65536 bytes); doing this will cause the GC issues Dan Pupius talks about in his analysis of object allocation and the JScript garbage collector.

From there, we can progress to programming techniques—with Internet Explorer, it is much better to call Builder.append with as many arguments as possible than to simply iterate and push things in one at a time.

It is also better to start small; try to structure your string operations so that very large string operations are minimized. In this case, using a temporary buffer to assemble a set of strings together and then adding them to a much larger string is better than constantly adding small fragments to a larger string.

And as always, minimizing the size of an iteration will help get extra performance out of JScript.
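
The "temporary buffer" advice might look something like this in practice. It is only an illustration: `renderTable` and the row/cell data shape are assumptions, and the benefit Tom measured applies to JScript in IE; other engines may behave differently.

```javascript
// Illustrative only: build each row in a small temporary string, then append
// it to the large string once, instead of appending every fragment directly.
function renderTable(rows) {
  var html = "";
  for (var i = 0; i < rows.length; i++) {
    var row = "<tr>"; // small temporary buffer
    for (var j = 0; j < rows[i].length; j++) {
      row += "<td>" + rows[i][j] + "</td>";
    }
    html += row + "</tr>"; // one append to the large string per row
  }
  return "<table>" + html + "</table>";
}
```

The large string is touched once per row rather than once per cell, which is the pattern the conclusion recommends.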

The raw numbers have been made available to scour over.

Ajax: Ajaxian

A Technique For Lazy Script Loading

Bob Matsuoka has written a guest article on the topic of lazy script loading. Thanks so much Bob!

A recent article "Lazily load functionality via Unobtrusive Scripts" discussed how to lazily load Javascript script files by appending script elements to the HEAD tag.

While this works as expected, I've found that for best results, you should also consider tracking which scripts have been loaded in order to prevent re-loading an already loaded script, and more importantly supporting callbacks so that you can guarantee loading of scripts prior to calling functions that depend on that code.

NOTE: The example loader script, which has been tested in FF, IE, Safari, and Opera, uses prototype.js for DOM and array routines. I developed this originally for a project that already had prototype.js available, but it uses it only superficially. It should be simple to remove these references if you're not using prototype.js.

JAVASCRIPT:
/**
*  Script lazy loader 0.5
*  Copyright (c) 2008 Bob Matsuoka
*
*  This program is free software; you can redistribute it and/or
*  modify it under the terms of the GNU General Public License
*  as published by the Free Software Foundation; either version 2
*  of the License, or (at your option) any later version.
*/

var LazyLoader = {}; //namespace
LazyLoader.timer = {}; // contains timers for scripts
LazyLoader.scripts = []; // contains called script references
LazyLoader.load = function(url, callback) {
        // handle object or path
        var classname = null;
        var properties = null;
        try {
                // make sure we only load once
                if ($A(LazyLoader.scripts).indexOf(url) == -1) {
                        // note that we loaded already
                        LazyLoader.scripts.push(url);
                        var script = document.createElement("script");
                        script.src = url;
                        script.type = "text/javascript";
                        $$("head")[0].appendChild(script); // add script tag to head element

                        // was a callback requested
                        if (callback) {
                                // test for onreadystatechange to trigger callback
                                script.onreadystatechange = function () {
                                        if (script.readyState == 'loaded' || script.readyState == 'complete') {
                                                callback();
                                        }
                                };
  37.                                 /