Sunday, January 31, 2010

Using mod_concat to Speed Up Start Render Times

The most critical part of a page’s load time is the time before rendering starts. During this time, users may be tempted to bail or try a different search result. For this reason, it is critical to optimize the <head> of your HTML for maximum performance, as nothing will be visible until the objects inside it finish loading. One easy way to speed up rendering during this crucial time is to combine your CSS and JavaScript, saving the performance tax associated with every outbound request. While easy in theory, in practice this can be difficult, especially for large organizations. For example, say your ad provider wants you to include their script in a separate file so they can make updates whenever they choose. So much for combining it into your site’s global JS to reduce the number of requests, eh? mod_concat makes combining shared libraries easy by providing a way to dynamically concatenate many files into one.
See mod_concat in Action
We created a couple of test pages to show the benefits here. In our first example without mod_concat, we see a typical large-scale website with many shared CSS and JavaScript files loaded in the <head> of the HTML. There are scripts for shared widgets (two of them video players), ad code, and more that typically plague many major web sites. You can check out the Pagetest results here, and check out the time to start render (green bar):

[Figure: Pagetest waterfall with mod_concat disabled]

In the test page, we have 12 JavaScript files and 2 CSS files, a total of 14 HTTP requests in the <head>. I have seen worse. The green vertical bar is our Start Render time, or the time it took for the user to see something, at 4 seconds! We can see that most of the time spent on each download is the green portion, or the time to first byte. This happens on every object, simply for existing! One way to avoid it is to combine those files into one larger file. Page weight (bytes) stays the same, but the number of requests is reduced significantly. Let’s take a look at our Pagetest results of a second example with mod_concat enabled.

[Figure: Pagetest waterfall of the music page with mod_concat enabled]

Notice that the number of requests went from 14 to 5, and we saved 1.5 seconds! We probably could have made an even faster example by moving to just 2 requests (one for CSS and one for JS), but the speed win here is clear.
How mod_concat Works
mod_concat is a module for Apache built by Ian Holsman, my manager at AOL and a contributor to Apache. In the mod_concat documentation, Ian gives credit to David Davis, who implemented the idea while working at Vox, and to perlbal. The idea is straightforward, and you can pretty much figure out how it works by viewing the source code of our second example:

href="http://lemon.holsman.net:8001/cdn/??music2.css,common.css" />




You can see in the code above that a single request references multiple files, and the server returns the concatenated version. The URL takes the following format:

http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,filename3.js

Let’s break it down.

http://www.yourdomain.com/

The first bit should be straightforward: it’s the host name.

http://www.yourdomain.com/optional/path/

Next comes the optional path to the files. This is important, because you can’t concatenate files above this directory if you include it. However, it allows you to optimize a bit so you don’t need to keep referencing the same path for files below this directory.

http://www.yourdomain.com/optional/path/??

The ?? then triggers the magic for the files that come next. It’s a special signal to Apache that it’s time to combine files!

http://www.yourdomain.com/optional/path/??filename1.js,

If the file is in the current directory, you can simply include it next, followed by a comma “,”.

http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,

If you need to go a bit further in the directory hierarchy, you can do that too.

http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,filename3.js

You can include as many files as you wish as long as they fall within the same server directory path defined early on in your optional/path/.
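The URL format walked through above can be sketched as a small helper. This is only an illustration in Python; the function name `build_concat_url` and its checks are my own assumptions, not part of mod_concat itself.

```python
def build_concat_url(host, base_path, files):
    """Return a mod_concat style URL: host + base path + '??' + comma-separated files."""
    if not files:
        raise ValueError("at least one file is required")
    for name in files:
        # Files must live at or below the base path, so reject absolute
        # paths and any attempt to climb out with '..'.
        if name.startswith("/") or ".." in name.split("/"):
            raise ValueError(f"file must be relative to the base path: {name}")
    # Normalise the base path to exactly one leading and one trailing slash.
    trimmed = base_path.strip("/")
    path = f"/{trimmed}/" if trimmed else "/"
    return host.rstrip("/") + path + "??" + ",".join(files)


url = build_concat_url(
    "http://www.yourdomain.com",
    "optional/path",
    ["filename1.js", "directory/filename2.js", "filename3.js"],
)
print(url)
```

A helper like this could live in a template layer so that designers list files once and the `??` syntax is generated for them.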
Performance and Caching Considerations
mod_concat uses the Last-Modified date of the most recently modified file when it generates the concatenated version. It should honor any max-age or Expires cache headers you set for the path in your server or .htaccess configuration. If you have a far-future Expires or max-age header, you will need to rename one of the files or directories in the string to bust the cache, and the user will then download the entire concatenated version again. Because mod_concat runs inside Apache as a native module, concatenation adds very little overhead. Performance is improved further still if the server happens to be an origin point for a CDN, as the response gets cached on the edge like an ordinary text file for as long as you tell it to, rarely hitting your servers.
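To make the Last-Modified behavior concrete, here is a rough Python sketch of the idea: concatenate the files and report the newest modification time as the header value. It is not mod_concat's actual code; the function name `concat_with_last_modified` and the demo file names are hypothetical.

```python
import os
import tempfile
from email.utils import formatdate


def concat_with_last_modified(paths):
    """Concatenate files and return (body, Last-Modified header value)."""
    parts = []
    newest = 0.0
    for path in paths:
        with open(path, "rb") as handle:
            parts.append(handle.read())
        # Track the most recent modification time across all files.
        newest = max(newest, os.path.getmtime(path))
    # usegmt=True produces the HTTP date format (ending in "GMT").
    return b"".join(parts), formatdate(newest, usegmt=True)


# Demo with two temporary files given fixed, different modification times.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.js")
    second = os.path.join(tmp, "b.js")
    demo = [(first, b"var a=1;\n", 1264896000), (second, b"var b=2;\n", 1264982400)]
    for path, text, mtime in demo:
        with open(path, "wb") as handle:
            handle.write(text)
        os.utime(path, (mtime, mtime))
    body, last_modified = concat_with_last_modified([first, second])
    print(last_modified)
```

The header follows the second file because it is the newer of the two, which is why touching any single file is enough to change the date of the combined response.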
Same Idea, Different Platforms
For regular folks like myself who don’t have the ability to install Apache modules with their hosting provider (cough, Lunarpages, cough), mod_concat is not the best option. The idea of concatenating JavaScript and CSS has been implemented on other platforms, and I will call out those I found in some brief Googling. Feel free to list more that you know of.
Rakaz’s PHP Combine Solution
Niels Leenheer of rakaz.nl has a nice solution for PHP. Niels writes:

Take for example the following URLs:

* http://www.creatype.nl/javascript/prototype.js
* http://www.creatype.nl/javascript/builder.js
* http://www.creatype.nl/javascript/effects.js
* http://www.creatype.nl/javascript/dragdrop.js
* http://www.creatype.nl/javascript/slider.js

You can combine all these files to a single file by simply changing the URL to:

* http://www.creatype.nl/javascript/prototype.js,builder.js,effects.js,dragdrop.js,slider.js

Niels takes advantage of Apache’s Rewrite rules as such to make the combine PHP script transparent to the template designer:

RewriteEngine On
RewriteBase /
RewriteRule ^css/(.*\.css) /combine.php?type=css&files=$1
RewriteRule ^javascript/(.*\.js) /combine.php?type=javascript&files=$1

This is nice because it keeps the PHP script and HTML template separate from each other, just like mod_concat.
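Whatever script receives the `files` parameter has to turn the comma-separated list back into paths on disk, and it should refuse anything that wanders outside the intended directory. The sketch below shows that step in Python; it is not Niels’s PHP code, and the name `resolve_files` is my own.

```python
import posixpath


def resolve_files(base_dir, files_param, extension):
    """Split a comma-separated list and map each name to a path under base_dir."""
    prefix = base_dir.rstrip("/") + "/"
    resolved = []
    for name in files_param.split(","):
        # Only serve the expected type, and never leave the base directory.
        if not name.endswith(extension):
            raise ValueError(f"unexpected file type: {name}")
        path = posixpath.normpath(posixpath.join(base_dir, name))
        if not path.startswith(prefix):
            raise ValueError(f"path escapes base directory: {name}")
        resolved.append(path)
    return resolved


print(resolve_files("javascript", "prototype.js,builder.js,effects.js", ".js"))
```

Normalising the joined path before checking its prefix is what catches names such as `../secret.js`.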
Ed Elliot’s PHP Combine Solution
Ed’s solution for combining CSS and JavaScript is less flexible from a front-end template designer’s perspective, as you’ll need to touch PHP code to update the files being merged together. However, the advantages I see to his take on the problem are:

* He masks the actual file names being combined, and
* A new version number is generated automatically to bust the cache

For folks who don’t mind digging into PHP, the above benefits may be worth the effort. I especially like the cache-busting, as it allows me to put a far future expires header without worrying if my users will get the update or not.
PHPSpeedy
Finally among the PHP scripts I found is PHPSpeedy. Also available as a plug-in for WordPress, PHPSpeedy appears to get the job done like the others, with the added benefit of automatic minification. This might be useful for folks, but I’m the obfuscator type and promote that for production build processes. I’d love to see a safe obfuscator like YUICompressor written in C so we could turn it into a module for Apache.
Lighttpd and mod_magnet
For users of Lighttpd, mod_magnet can be used to do the concatenation. It appears similar in nature to Rakaz’s solution, though I will leave it to you to dig in further as it seems to be fairly involved. This blog post by Christian Winther should help get you started.
ASP.NET Combiner Control
Cozi has developed an ASP.NET control to combine multiple JS and CSS files into a single file, and it includes a cool versioning feature much like Ed Elliot’s script. It’s very easy to use; you simply wrap the script with the control tag in the template:



It then outputs the following code at runtime:



The only problem I see with their approach is that since the output file has query parameters, Safari and Opera won’t honor cache control headers, as they assume it is a dynamic file. This is why simply adding ?ver=123 to bust the cache is not a good idea for those browsers.
Java JSP Taglib – pack:tag
Daniel Galán y Martins developed a combine solution for Java called packtag. It follows in the spirit of PHPSpeedy and provides additional optimizations such as minification, GZIP, and caching. It’s not obvious from the documentation what the output of the combined script looks like, but in a flow graphic it seems to include a version number, which would be cool. The code to do the combination goes right in the JSP template, and looks like this:


<pack:script>
    <src>/js/validation.js</src>
    <src>/js/tracking.js</src>
    <src>/js/edges.js</src>
</pack:script>


CSS can be combined too. The syntax appears to be quite flexible:


<pack:style>
    <src>/main.css</src>
    <src>../logout/logout.css</src>
    <src>/css/**</src>
    <src>http://www.example.com/css/browserfixes.css</src>
    <src>/WEB-INF/css/hidden.css</src>
</pack:style>


As you can see this idea has been implemented in many languages, some with additional innovations worth considering, so if you can’t leverage mod_concat, at least use something similar as the benefits are well worth it.
Final Thoughts
mod_concat is a performant, cross-language, high-scale way to build concatenation into your build process while maintaining files separately. While it lacks automatic versioning (Ian, can we do this?), it provides a clean way to dynamically merge JS and CSS together without touching a bit of server-side code, and it works across server-side languages. One feature I’d like to see added is a debug mode. For example, if the code throws an error, it may not be apparent from the line number which file is having issues. Perhaps the filename could be included in a comment at the start of each file’s content. Remember, improving the time to start rendering the page is critical and you should focus on this first. With tools like mod_concat and the others mentioned here, there should be little excuse not to implement this into your routine. Little pain, a lot to gain.

Automatic merging and versioning of CSS/JS files with PHP

Introduction

Most sites include a number of CSS and JavaScript files. Whilst developing it's usually easier to manage them as separate files but on a live site it makes sense to merge files to reduce the number of HTTP requests the browser has to make. For JavaScript this is particularly important as browsers block rendering whilst downloading. It's also important to version your files to ensure that browsers download the latest copies when you've made changes.

I hate maintaining this stuff manually so I've written a PHP script which takes care of merging files on the fly whilst also versioning the merged file automatically as the various component files change. The file is merged on first request and cached. Subsequent requests are served the cached version. The script also sets HTTP headers to ensure the user's browser maintains each version in its own local cache therefore preventing repeated requests to the server. Finally an archive of the merged files is maintained to ensure that requests for old versions return the relevant CSS/JavaScript rather than the latest which might not match the user's cached HTML.
Using the script

Step 1: Start by setting the correct mime type for the files you want to merge.

define('FILE_TYPE', 'text/javascript');

Step 2: Modify the $aFiles array to include the paths to the files you want to merge. These should be relative to the server document root.

$aFiles = array(
    'js/yahoo.js',
    'js/event.js',
    'js/connection.js',
    'js/blog-search.js'
);

Step 3: Set the location the script should write the archive files to. When first run it will automatically create the folder you specify if it doesn't already exist. For this to work you'll need to make sure that the parent directory, in this case "js", is owned (or is writable) by the user your web server runs as.

define('ARCHIVE_FOLDER', 'js/archive');

Step 4: When called directly the script returns the merged code which you reference from your HTML source. For JavaScript files your HTML source should look something like this:

1.

When included via require the script returns the latest version number rather than the source. When rendered it will look like this:

1.

I've used a .htaccess file containing the following mod_rewrite rules to map this filename to the script.

RewriteEngine On
RewriteBase /
RewriteRule js/site_([0-9]+)\.js js/combine.php?version=$1 [L]
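To see what the rule does to a request, the mapping can be mimicked outside Apache. This Python sketch uses an anchored form of the pattern with the literal dot escaped; the `rewrite` function and the version number in the example are hypothetical.

```python
import re

# Anchored form of the rule's pattern, with the dot before "js" escaped.
PATTERN = re.compile(r"^js/site_([0-9]+)\.js$")


def rewrite(path):
    """Return the rewritten target for a versioned filename, or None if no match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # The captured digits become the "version" query parameter.
    return f"js/combine.php?version={match.group(1)}"


print(rewrite("js/site_1264896000.js"))
print(rewrite("js/site_latest.js"))
```

Only all-digit versions are mapped to the script; anything else falls through and is served (or not found) as an ordinary file.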

If your host doesn't support .htaccess files you can rewrite your code to:

1.

That's it for the set up. When you make changes to your source files the script will now take care of updating both the code served and the corresponding filename in the HTML source.
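The merge-once, version-by-date behavior described above can be sketched as follows. This is a Python illustration of the idea, not the PHP script itself: the function names and the choice to name archive files after the version number are my assumptions.

```python
import os
import tempfile


def latest_version(paths):
    """Version number = newest modification time among the source files."""
    return int(max(os.path.getmtime(p) for p in paths))


def merged_file(paths, archive_dir):
    """Return the path of the merged file for the current version, building it once."""
    version = latest_version(paths)
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, f"{version}.js")
    if not os.path.exists(target):
        # First request for this version: merge the sources and keep the result.
        with open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    out.write(src.read())
                out.write(b"\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    sources = []
    for name, mtime in [("yahoo.js", 1000), ("event.js", 2000)]:
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(f"// {name}")
        os.utime(path, (mtime, mtime))
        sources.append(path)
    archive = os.path.join(tmp, "archive")
    first = merged_file(sources, archive)
    # Touching a source file produces a new version and a new archive entry.
    os.utime(sources[0], (3000, 3000))
    second = merged_file(sources, archive)
    versions = (os.path.basename(first), os.path.basename(second))
    archived = sorted(os.listdir(archive))
    print(versions, archived)
```

Because old merged files stay in the archive, a page cached with an older version number can still fetch the matching code.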
Caveats

If you subsequently add files to the script which have older last-modified dates than those already included they won't trigger a new version. I could have added code to support this but it would have significantly increased the complexity of the script. To trigger a new version simply touch or re-save one of the files.
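The caveat is easy to reproduce with plain data. In the Python sketch below, `mtime_version` mirrors the newest-date approach described here, while `fingerprint_version` is a hypothetical alternative (not something the script does) that hashes every file name and date so that changes to the list itself are also detected.

```python
import hashlib


def mtime_version(files):
    """Version as described above: the newest last-modified time."""
    return max(mtime for _, mtime in files)


def fingerprint_version(files):
    """Hypothetical alternative: hash each name and mtime in list order."""
    digest = hashlib.sha1()
    for name, mtime in files:
        digest.update(f"{name}:{mtime}\n".encode("utf-8"))
    return digest.hexdigest()[:10]


before = [("js/yahoo.js", 2000), ("js/event.js", 3000)]
after = before + [("js/connection.js", 1000)]  # added file with an older date

same_mtime_version = mtime_version(before) == mtime_version(after)
same_fingerprint = fingerprint_version(before) == fingerprint_version(after)
print(same_mtime_version, same_fingerprint)
```

The trade-off is that a hexadecimal fingerprint would no longer match a digits-only rewrite pattern, so the rule would need adjusting; touching a file remains the simpler fix.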
Thanks