Sunday, January 31, 2010

Using mod_concat to Speed Up Start Render Times

The most critical part of a page’s load time is the time before rendering starts. During this time, users may be tempted to bail or try a different search result. For this reason, it is critical to optimize the <head> of your HTML for maximum performance, as nothing will be visible until the browser finishes loading the objects inside it. One easy way to speed up rendering during this crucial time is to combine your CSS and JavaScript files, saving the performance tax associated with every outbound request. While easy in theory, in practice this can be difficult, especially for large organizations. For example, say your ad provider wants you to include their script in a separate file so they can make updates whenever they choose. So much for combining it into your site’s global JS to reduce requests, eh? mod_concat makes combining shared libraries easy by providing a way to dynamically concatenate many files into one.
See mod_concat in Action
We created a couple of test pages to show the benefits. In our first example, without mod_concat, we see a typical large-scale website with many shared CSS and JavaScript files loaded in the <head> of the HTML. There are scripts for shared widgets (two of them video players), ad code, and more that typically plague many major web sites. You can check out the Pagetest results here, and note the time to start render (green bar):

[Pagetest waterfall with mod_concat disabled]

In the test page, we have 12 JavaScript files and 2 CSS files, a total of 14 HTTP requests in the <head>. I have seen worse. The green vertical bar is our Start Render time, or the time it took for the user to see something: 4 seconds! We can see that much of each download is typically the green portion of its bar, the time to first byte. This cost is paid by every object, simply for existing! The way to avoid it is to combine those files into one larger file. Page weight (bytes) stays the same, but requests are reduced significantly. Let’s take a look at the Pagetest results of a second example, with mod_concat enabled:

[Pagetest waterfall of music page with mod_concat enabled]

Notice the number of requests went from 14 to 5, and we saved 1.5 seconds! We probably could have made an even faster example by moving to just 2 requests (one for CSS and one for JS), but the speed win here is clear.
How mod_concat Works
mod_concat is a module for Apache built by Ian Holsman, my manager at AOL and a contributor to Apache. Ian gives credit in the mod_concat documentation to David Davis, who did it while working at Vox, and perlbal. The idea is straightforward, and you can pretty much figure out how it works by viewing the source code of our second example:

<link rel="stylesheet" type="text/css" href="http://lemon.holsman.net:8001/cdn/??music2.css,common.css" />




You can see in the code above that a single request references multiple files, and the server returns the concatenated version. The URL takes the following format:

http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,filename3.js

Let’s break it down.

http://www.yourdomain.com/

The first bit should be straightforward: it’s the host name.

http://www.yourdomain.com/optional/path/

Next comes the optional path to the files. This is important because, if you include it, you can’t concatenate files above this directory. However, it allows you to optimize a bit, since you don’t need to keep repeating the same path for files below this directory.

http://www.yourdomain.com/optional/path/??

The ?? then triggers the magic for the files that come next. It’s a special signal to Apache that it’s time to combine files!

http://www.yourdomain.com/optional/path/??filename1.js,

If the file is in the current directory, you can simply list it next, followed by a comma (“,”).

http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,

If you need to go a bit further in the directory hierarchy, you can do that too.

http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,filename3.js

You can include as many files as you wish, as long as they fall within the server directory path defined earlier in your optional/path/.
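The format above is easy to generate programmatically. As a quick illustration (my own sketch, not part of mod_concat itself), here is a small helper that builds a mod_concat-style URL; the domain and file names are the placeholders from the example above:

```python
def concat_url(base, path, files):
    """Build a mod_concat-style URL: base/path/??file1,file2,..."""
    prefix = base.rstrip("/") + "/"
    if path:
        prefix += path.strip("/") + "/"
    return prefix + "??" + ",".join(files)

print(concat_url("http://www.yourdomain.com", "optional/path",
                 ["filename1.js", "directory/filename2.js", "filename3.js"]))
# → http://www.yourdomain.com/optional/path/??filename1.js,directory/filename2.js,filename3.js
```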
Performance and Caching Considerations
mod_concat uses the Last-Modified date of the most recently modified file when it generates the concatenated version. It should honor any max-age or Expires cache-control headers you set for the path in your server or .htaccess configuration. If you use a far-future Expires or max-age header, then to bust the cache you will need to rename one of the file or directory names in the string, after which the user will download the entire concatenated version again. Because mod_concat is an Apache module, the concatenation overhead is negligible. Performance improves further still if the server happens to be an origin point for a CDN, as the combined file gets cached on the edge like an ordinary text file for as long as you tell it to, rarely hitting your servers.
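The Last-Modified behavior described above is easy to picture in code. Here is a small Python sketch (my own illustration, not mod_concat’s actual implementation) of how a concatenator might derive that header from its parts:

```python
import email.utils
import os
import tempfile

def concat_last_modified(paths):
    # mod_concat-style Last-Modified: the newest mtime among the parts.
    newest = max(os.path.getmtime(p) for p in paths)
    return email.utils.formatdate(newest, usegmt=True)

# Demo with two temporary files, where a.js is artificially old:
d = tempfile.mkdtemp()
a, b = os.path.join(d, "a.js"), os.path.join(d, "b.js")
open(a, "w").write("// a")
open(b, "w").write("// b")
os.utime(a, (0, 0))                  # pretend a.js was last touched in 1970
print(concat_last_modified([a, b]))  # header reflects b.js, the newer file
```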
Same Idea, Different Platforms
For regular folks like myself who don’t have the ability to install Apache modules with their hosting provider (cough, Lunarpages, cough), mod_concat is not the best option. The idea of concatenating JavaScript and CSS has been implemented on other platforms, and I will briefly call out those I found in some quick Googling – feel free to list more that you know of.
Rakaz’s PHP Combine Solution
Niels Leenheer of rakaz.nl has a nice solution for PHP. Niels writes:

Take for example the following URLs:

* http://www.creatype.nl/javascript/prototype.js
* http://www.creatype.nl/javascript/builder.js
* http://www.creatype.nl/javascript/effects.js
* http://www.creatype.nl/javascript/dragdrop.js
* http://www.creatype.nl/javascript/slider.js

You can combine all these files to a single file by simply changing the URL to:

* http://www.creatype.nl/javascript/prototype.js,builder.js,effects.js,dragdrop.js,slider.js

Niels takes advantage of Apache’s rewrite rules to make the combine PHP script transparent to the template designer:

RewriteEngine On
RewriteBase /
RewriteRule ^css/(.*\.css) /combine.php?type=css&files=$1
RewriteRule ^javascript/(.*\.js) /combine.php?type=javascript&files=$1

This is nice because it keeps the PHP script and HTML template separate from each other, just like mod_concat.
Ed Elliot’s PHP Combine Solution
Ed’s solution for combining CSS and JavaScript is less flexible from a front-end template designer’s perspective, as you’ll need to touch PHP code to update the files being merged together. However, the advantages I see to his take on the problem are:

* He masks the actual file names being combined, and
* A new version number is generated automatically to bust the cache

For folks who don’t mind digging into PHP, the above benefits may be worth the effort. I especially like the cache-busting, as it allows me to set a far-future expires header without worrying whether my users will get the update or not.
PHPSpeedy
Finally, among the PHP scripts I found is PHPSpeedy. Also available as a WordPress plug-in, PHPSpeedy appears to get the job done like the others, with the added benefit of automatic minification. This might be useful for some folks, but I’m the obfuscator type and promote that for production build processes. I’d love to see a safe obfuscator like YUICompressor written in C so we could turn it into an Apache module.
Lighttpd and mod_magnet
For users of Lighttpd, mod_magnet can be used to do the concatenation. It appears similar in nature to Rakaz’s solution, though I will leave it to you to dig in further, as it seems to be fairly involved. This blog post by Christian Winther should help get you started.
ASP.Net Combiner Control
Cozi has developed an ASP.NET control to combine multiple JS and CSS files into a single file, and it includes a cool versioning feature much like Ed Elliot’s script. It’s very easy to use; you simply wrap the scripts with the control tag in the template:



It then outputs the following code at runtime:



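The markup samples did not survive here, but based on the description, the wrapped and rendered code might look roughly like the following; the tag name, attributes, and output URL are illustrative guesses on my part, not Cozi’s actual API:

```html
<!-- illustrative only: the real control's tag name and attributes may differ -->
<cozi:Combiner runat="server" type="js">
  <script type="text/javascript" src="scripts/menu.js"></script>
  <script type="text/javascript" src="scripts/search.js"></script>
</cozi:Combiner>

<!-- rendered at runtime as a single versioned request, e.g.: -->
<script type="text/javascript" src="combined.axd?type=js&ver=20100131"></script>
```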
The only problem I see with their approach is that since the output file has query parameters, Safari and Opera won’t honor cache-control headers, as they assume it is a dynamic file. This is why simply adding ?ver=123 to bust the cache is not a good idea for those browsers.
Java JSP Taglib – pack:tag
Daniel Galán y Martins developed a combine solution for Java called pack:tag. It follows in the spirit of PHPSpeedy, providing additional optimizations such as minification, gzip, and caching. It’s not obvious from the documentation what the output of the combined script looks like, but a flow graphic seems to include a version number, which would be cool. The code to do the combination goes right in the JSP template and looks like this:


<pack:script>
   /js/validation.js
   /js/tracking.js
   /js/edges.js
</pack:script>

CSS can be combined too, and the syntax appears to be quite flexible:


<pack:style>
   /main.css
   ../logout/logout.css
   /css/**
   http://www.example.com/css/browserfixes.css
   /WEB-INF/css/hidden.css
</pack:style>

As you can see, this idea has been implemented in many languages, some with additional innovations worth considering. So if you can’t leverage mod_concat, at least use something similar – the benefits are well worth it.
Final Thoughts
mod_concat is a performant, high-scale way to get the benefits of concatenation while maintaining files separately, and it works across server-side languages. While it lacks automatic versioning (Ian, can we do this?), it provides a clean way to dynamically merge JS and CSS together without touching a bit of server-side code. One feature I’d like to see added is a debug mode: if the combined code throws an error, the line number alone may not make it apparent which file is having issues. Perhaps each filename could be included in a comment at the start of its section. Remember, improving the time to start rendering the page is critical, and you should focus on it first. With tools like mod_concat and the others mentioned here, there is little excuse not to build this into your routine. Little pain, a lot to gain.

Automatic merging and versioning of CSS/JS files with PHP

Introduction

Most sites include a number of CSS and JavaScript files. Whilst developing it's usually easier to manage them as separate files but on a live site it makes sense to merge files to reduce the number of HTTP requests the browser has to make. For JavaScript this is particularly important as browsers block rendering whilst downloading. It's also important to version your files to ensure that browsers download the latest copies when you've made changes.

I hate maintaining this stuff manually so I've written a PHP script which takes care of merging files on the fly whilst also versioning the merged file automatically as the various component files change. The file is merged on first request and cached. Subsequent requests are served the cached version. The script also sets HTTP headers to ensure the user's browser maintains each version in its own local cache therefore preventing repeated requests to the server. Finally an archive of the merged files is maintained to ensure that requests for old versions return the relevant CSS/JavaScript rather than the latest which might not match the user's cached HTML.
Using the script

Step 1: Start by setting the correct mime type for the files you want to merge.

define('FILE_TYPE', 'text/javascript');

Step 2: Modify the $aFiles array to include the paths to the files you want to merge. These should be relative to the server document root.

$aFiles = array(
    'js/yahoo.js',
    'js/event.js',
    'js/connection.js',
    'js/blog-search.js'
);

Step 3: Set the location the script should write the archive files to. When first run, it will automatically create the folder you specify if it doesn't already exist. For this to work, you'll need to make sure that the parent directory, in this case "js", is owned (or is writable) by the user your web server runs as.

define('ARCHIVE_FOLDER', 'js/archive');

Step 4: When called directly the script returns the merged code which you reference from your HTML source. For JavaScript files your HTML source should look something like this:

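Something along these lines; the exact markup is my reconstruction based on the description and the rewrite rule shown further down:

```html
<!-- illustrative reconstruction, not the original sample -->
<script type="text/javascript"
        src="js/site_<?php print require('js/combine.php'); ?>.js"></script>
```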

When included via require the script returns the latest version number rather than the source. When rendered it will look like this:

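Rendered, with the version number substituted in (the number itself is just an example):

```html
<script type="text/javascript" src="js/site_1262300400.js"></script>
```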

I've used a .htaccess file containing the following mod_rewrite rules to map this filename to the script.

RewriteEngine On
RewriteBase /
RewriteRule js/site_([0-9]+).js js/combine.php?version=$1 [L]

If your host doesn't support .htaccess files you can rewrite your code to:

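Presumably something like the following, calling the script directly with the version as a query parameter (again, my reconstruction, not the original sample):

```html
<script type="text/javascript"
        src="js/combine.php?version=<?php print require('js/combine.php'); ?>"></script>
```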

That's it for the set up. When you make changes to your source files the script will now take care of updating both the code served and the corresponding filename in the HTML source.
Caveats

If you subsequently add files to the script which have older last-modified dates than those already included, they won't trigger a new version. I could have added code to support this, but it would have significantly increased the complexity of the script. To trigger a new version, simply touch or re-save one of the files.
Thanks

10 Free Tools to Check Website Loading Time

Everyone knows how annoying it can be to deal with websites that take forever to load. According to some recent research, almost 75% of Internet users do not return to sites that take longer than four seconds to load. A fast-loading website is the first step to a successful online presence, but you would be surprised how many scripts and widgets may be slowing your site to a crawl.
10 Free Tools to Check the Website Loading Time

1. iWebTool Speed Test
- A simple tool to test your website’s loading time and compare it with other websites. A great tool for benchmarking. It allows you to enter up to 10 websites, and the results display the size of each website, the total loading time, and the average speed per KB.

iwebtool speed test

2. Pingdom Tools
- Pingdom is a popular uptime and performance monitoring service for websites and servers. They also host a free load time test for web pages. The Full Page Test loads a complete HTML page including all objects (images, CSS, JavaScript, RSS, Flash and frames/iframes) and displays the load time of all objects visually with time bars. You can also see statistics such as the total number of objects, total load time, and size including all objects.

Pingdom load test

3. Internet Supervision Webserver Monitoring Tool
- InternetSupervision.com monitors the availability, performance, and content of your website, web server and internet services from across the globe. If you ever wanted to know how much time it takes to load your website from different locations in the world, this is the tool for you.

Internet Supervision website tool

4. Webslug Loading Time Test
- Webslug measures load time as the user sees it: the time it takes for a page to load fully from when the request was made. The main benefit of Webslug is that it doesn’t require any download or installed program. Just enter your website’s address and it’s all done in your browser.

webslug loading time comparison

5. OctaGate Site Timer
- Very similar to Pingdom Tools

Octagate site timer

6. Site-Perf.com
- Though this looks similar to the Pingdom tool, it goes a step further by letting you choose the test server location and the max threads per host. Very good and very accurate!

site-perf analysis
7. LinkVendor Website Speed Check

- The website speed tester shows the load duration of a given website. This value shows how long a website takes to load, and whether it is better to optimize the website or change a (slow) ISP.

linkvendor speed test
8. Website Optimization – Web Page analyzer

- The script calculates the size of individual elements and sums up each type of web page component. Based on these page characteristics the script then offers advice on how to improve page load time. The script incorporates the latest best practices from Website Optimization Secrets, web page size guidelines and trends, and web site optimization techniques into its recommendations.

website optimization tool
9. Uptrends Web Page test tool

- The full page test tool allows you to test the load time and speed of a complete HTML page of your website, including all objects such as images, frames, CSS stylesheets, Flash objects, RSS feeds, and Javascript files. The full HTML page test tool will analyze the page and download all the objects, displaying the corresponding load times, the object sizes, and which objects are missing, including content from third party suppliers such as advertisements. With the full HTML page test tool you will be able to analyze in detail which object slows down your web page, and how to optimize your website. The load time of all objects is visualized with time bars.

Uptrends page load tool
10. WebWait

- WebWait is a website timer. You can benchmark your website or test the speed of your web connection. Timing is accurate because WebWait pulls down the entire website into your browser, so it takes into account Ajax/Javascript processing and image loading which other tools ignore.

webwait loading time test

Why MySQL could be slow with large tables ?

If you've been reading enough database-related forums, mailing lists, or blogs, you have probably heard complaints from some users about MySQL being unable to handle more than 1,000,000 (or pick any other number) rows. On the other hand, it is well known that with customers like Google, Yahoo, LiveJournal, and Technorati, MySQL has installations with many billions of rows and delivers great performance. What could be the reason?

The reason is normally table design and an understanding of the inner workings of MySQL. If you design your data wisely, considering what MySQL can and cannot do, you will get great performance; if not, you might become upset and become one of those bloggers. Note that any database management system is different in some respects, and what works well for Oracle, MS SQL, or PostgreSQL may not work well for MySQL, and vice versa. Even storage engines have very important differences which can affect performance dramatically.

The three main issues you should be concerned with when dealing with very large data sets are buffers, indexes, and joins.

Buffers
The first thing you need to take into account is that the situations where data fits in memory and where it does not are very different. If you start from an in-memory data size and expect a gradual performance decrease as the database grows, you may be surprised by a severe drop in performance. This especially applies to index lookups and joins, which we cover later. As everything usually slows down a lot once it no longer fits in memory, the good solution is to make sure your data fits in memory as well as possible. This can be done with data partitioning (i.e., old and rarely accessed data stored on different servers), multi-server partitioning to use combined memory, and a lot of other techniques which I should cover at some later time.

So that you understand how much having data in memory changes things, here is a small example with numbers. If your data is fully in memory, you can perform over 300,000 random lookups per second from a single thread, depending on system and table structure. Now if your data is fully on disk (both data and index), you would need 2+ IOs to retrieve each row, which means you get about 100 rows/sec. Note that multiple drives do not really help a lot here, as we're speaking about a single thread/query. So the difference is 3,000 times! That may be a bit of an overstatement, as there are few completely uncached workloads, but a 100+ times difference is quite frequent.

Indexes
What everyone knows about indexes is that they are good for speeding up access to the database. Some people will also remember that whether indexes are helpful depends on index selectivity: how large a proportion of rows matches a particular index value or range. What is often forgotten is that the selectivity at which an index pays off differs depending on whether the workload is cached or not; in fact, even the MySQL optimizer currently does not take this into account. For an in-memory workload, index access might be faster even if 50% of rows are accessed, while for disk-IO-bound access we might be better off doing a full table scan even if only a few percent of rows are accessed.

Let's do some computations again. Consider a table which has 100-byte rows. With a decent SCSI drive we can get a 100MB/sec read speed, which gives us about 1,000,000 rows per second for fully sequential access to jam-packed rows – quite a possible scenario for MyISAM tables. Now if we take the same hard drive for a fully IO-bound workload, it will be able to provide just 100 row lookups by index per second. The difference is 10,000 times for our worst-case scenario. It might not be that bad in practice, but again, it is not hard to reach a 100 times difference.
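The arithmetic behind these figures can be checked quickly; the throughput numbers below are the assumed ones from the text, not measurements:

```python
# Back-of-the-envelope check of the sequential-vs-random gap.
row_bytes = 100                       # 100-byte rows
seq_bytes_per_sec = 100 * 10**6       # ~100 MB/s sequential SCSI read
seq_rows_per_sec = seq_bytes_per_sec // row_bytes       # sequential scan rate
random_lookups_per_sec = 100          # ~100 random index lookups/s when IO-bound

print(seq_rows_per_sec)                               # → 1000000 rows/s
print(seq_rows_per_sec // random_lookups_per_sec)     # → 10000 (worst-case gap)
```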

Here is a little illustration. I've created a table with over 30 million rows. The "val" column in this table has 10,000 distinct values, so the range 1..100 selects about 1% of the table. Here are the times for a full table scan versus a range scan by index:
mysql> SELECT count(pad) FROM large;
+------------+
| count(pad) |
+------------+
|   31457280 |
+------------+
1 row in set (4 min 58.63 sec)

mysql> SELECT count(pad) FROM large WHERE val BETWEEN 1 AND 100;
+------------+
| count(pad) |
+------------+
|     314008 |
+------------+
1 row in set (29 min 53.01 sec)

Also remember that not all indexes are created equal. Some indexes may be stored in sorted order, while others have pages placed in random locations, which can affect index scan/range scan speed dramatically. The rows referenced by an index may also be located sequentially, or require random IO when index ranges are scanned. There are also clustered keys in InnoDB, which combine index access with data access, saving you IO for completely disk-bound workloads.

There are certain optimizations in the works which should improve the performance of index accesses/index scans. For example, retrieving index values first and then accessing rows in sorted order can help a lot for big scans. This will reduce the gap, but I doubt it will be closed.

Joins
Joins are used to compose a complex object which was previously normalized into several tables, or to perform complex queries finding relationships between objects. A normalized structure with a lot of joins is the right way to design your database, as the textbooks teach you, but when dealing with large data sets it can be a recipe for disaster. The problem is not the data size (normalized data is normally smaller) but the dramatically increased number of index lookups, which may be random accesses. This problem exists for all kinds of applications; however, for OLTP applications with queries examining only a few rows, it is less of a problem. Data retrieval, search, DSS, and business intelligence applications, which need to analyze a lot of rows and run aggregates, are where this problem is most dramatic.

Some joins are also better than others. For example, if you have a star join with small dimension tables, it will not slow things down too much. On the other hand, a join of a few large tables, which is completely disk-bound, can be very slow.

One of the reasons aggravating this problem in MySQL is the lack of advanced join methods at this point (the work is on the way): MySQL can't do a hash join or a sort-merge join; it can only do the nested-loops method, which requires a lot of index lookups which may be random.

Here is a good example. As we saw, my 30-million-row (12GB) table was scanned in less than 5 minutes. But if we do an equality join of that table to another 30-million-row table, the access will be completely random. We would need to perform 30 million random row reads, which at a rate of 100 rows/sec gives us 300,000 seconds. So we would go from 5 minutes to almost 4 days if we need to do the join. Some people assume the join would cost close to two full table scans (as 60 million rows need to be read); this is way off.
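Checking the arithmetic, using the assumed 100 random reads/sec figure from earlier:

```python
# Cost of a nested-loop join driven by random row reads.
rows_to_join = 30_000_000        # rows needing a random read during the join
reads_per_sec = 100              # random read rate from the earlier example
full_scan_secs = 5 * 60          # the ~5-minute full scan measured above

join_secs = rows_to_join / reads_per_sec
print(join_secs)                      # → 300000.0 seconds
print(round(join_secs / 86400, 1))    # → 3.5 days, versus a 5-minute scan
```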

Do not take me as arguing against normalization or joins. Normalization is a great principle and should be used when possible. Just do not forget about the performance implications when designing the system, and do not expect joins to be free.

Finally, I should mention one more MySQL limitation which requires you to be extra careful when working with large data sets. In MySQL, a single query runs as a single thread (with the exception of MySQL Cluster), and MySQL issues IO requests one by one during query execution, which means that if single-query execution time is your concern, many hard drives and a large number of CPUs will not help. Sometimes it is a good idea to manually split a query into several, run them in parallel, and aggregate the result sets.
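As a sketch of the manual-split idea, here is a small Python helper (my own illustration) that breaks one large range query into several non-overlapping subqueries that could be run in parallel; the table and column names are from the earlier example:

```python
def split_ranges(lo, hi, parts):
    """Split the inclusive range [lo, hi] into `parts` contiguous BETWEEN ranges."""
    step = (hi - lo + 1) // parts
    out = []
    start = lo
    for i in range(parts):
        # Last part absorbs any remainder so the ranges cover [lo, hi] exactly.
        end = hi if i == parts - 1 else start + step - 1
        out.append((start, end))
        start = end + 1
    return out

# Four subqueries to run on parallel connections, results summed afterwards:
queries = ["SELECT count(pad) FROM large WHERE id BETWEEN %d AND %d" % r
           for r in split_ranges(1, 30_000_000, 4)]
for q in queries:
    print(q)
```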

So if you're dealing with large data sets and complex queries, here are a few tips:

Try to fit the data set you're working with in memory - Processing in memory is so much faster, and you solve a whole bunch of problems just by doing so. Use multiple servers to host portions of the data set, store the portion of data you're going to work with in a temporary table, etc.

Prefer full table scans to index accesses - For large data sets, full table scans are often faster than range scans and other types of index lookups. Even if you look at 1% of rows or less, a full table scan may be faster.

Avoid joins to large tables - Joining large data sets using nested loops is very expensive; try to avoid it. Joins to smaller tables are OK, but you might want to preload them into memory before the join so there is no random IO needed to populate the caches.

With proper application architecture and table design you can build applications operating with very large data sets based on MySQL.

Simple process to estimate times and costs in a web project

After my previous article about a structured process to develop a web application, I received some requests from readers who asked me to dedicate a post to how to estimate the times and costs of a web project.

In this article I want to illustrate a simplified top-down process to estimate the times and costs of a web project using a simple spreadsheet (in this example I used Google Spreadsheets, but if you prefer you can use Microsoft Excel, OpenOffice Spreadsheet, or a free online service such as Zoho or EditGrid).


Process main phases
In this simple top-down estimate process you can identify five main phases:

1. Define Activities
2. Define Task
3. Define Human Resources
4. Assign Human Resources to Tasks
5. Estimate times and costs

The process starts with a general definition of macro-activities and continues with a detailed definition of the tasks, the human resources used, and the times and costs related to each task.


1. Define Activities
In this first phase you have to define the main activities which compose your project:



For example, in a generic web project you can identify the following main activities:

1. Requirements definition
2. Design
3. Implementation
4. Test
5. Release

In my spreadsheet I created a new sheet called Activities and added the following two columns:

A: WBS (work breakdown structure), the ID of each activity/task;
B: Activity name.

The next step is to detail each activity with a certain number of specific tasks.


2. Define Tasks
Each activity is composed of several tasks. Each task is a smaller piece of work which makes up part of a main activity:



In the spreadsheet you can add new tasks by adding new rows below the related main activity. I suggest using a different format to distinguish tasks from activities, as I did in the following example:

1. Requirements definition
1.1 Define application scope
1.2 Define technical requirements

2. Design
2.1 Application Map
2.2 Database Entity relationship model
...

3. Implementation
3.1 SQL code
3.2 HTML code
3.3 CSS code
...


3. Define Human Resources
The next step is defining human resources in terms of category, seniority, and hourly cost:



Each category has a specific hourly cost related to a specific seniority. You can organize this information using a simple category/seniority matrix. For example, if you have to estimate a big or medium-size project, you can identify the following categories:

- Analyst
- Programmer
- Project manager
- ...

and the following seniorities:

- Junior
- Senior
- ...

Now, define an hourly cost for each category/seniority combination (in a more complex project you can also define a standard rate and an overtime rate for each combination). In the spreadsheet, create the table above in a new sheet called Resources in the same spreadsheet. At this point you have two sheets:



A first sheet with activities and a second sheet with resources. In this way, when you assign resources to tasks, you can link the cost of a specific resource with a reference formula (=). This is good practice because if you have to change the cost related to a specific category/seniority combination, you can do it only once in the "Resources" sheet, and the change will automatically be reflected in every task which uses that combination in the "Activities" sheet.


4. Assign Human Resources to Tasks
Next step: assign one or more resources to each task, estimating the effort the task requires. This is a very delicate activity, because you have to calibrate the right combination of category and seniority for the resources you want to use in your project in order to estimate project times and costs correctly.

In the spreadsheet, in the sheet "Activities" create the following three columns:

1. Num (number of resources assigned to a task)
2. Category
3. Seniority

This is the result:



You can add different resources to each task (a different category or a different seniority) by simply adding a row below the task name (for example, take a look at "Define application scope", where I added one junior analyst in the first row and one senior analyst in a new row below the task name).


5. Estimate Times and Costs
Now, for each resource, estimate the daily effort (Hours/day column) and the number of days (Days column), get the cost related to the category/seniority combination from the "Resources" sheet using a reference formula (Hourly Cost column), and calculate total costs:



For each task (row) Total Cost is equal to:

Total Cost = Hours/day * Hourly Cost * Days

Keep in mind that some tasks may have specific costs which are independent of the number of resources you assign to them. You can add these costs by adding a new column to the left of the Total Cost column called "Additional Costs".

In this case Total Cost will be equal to:

Total Cost = (Hours/day * Hourly Cost * Days) + Additional Cost
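To make the formula concrete, here is a small Python version of the calculation; the rates and hours below are made-up figures, not values from the spreadsheet:

```python
def total_cost(hours_per_day, hourly_cost, days, additional_costs=0):
    # Total Cost = (Hours/day * Hourly Cost * Days) + Additional Costs
    return hours_per_day * hourly_cost * days + additional_costs

# e.g. one resource working 8 hours/day for 10 days at 50/hour:
print(total_cost(8, 50, 10))        # → 4000
# the same task with 500 of task-specific additional costs:
print(total_cost(8, 50, 10, 500))   # → 4500
```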

That's all. Take a look at the spreadsheet or copy it in your Google Documents account to reuse it.

Take also a look at these posts:

- Structured process to develop a web application
- Google Spreadsheets Gantt Chart (Microsoft Project-like)
- Google Spreadsheets: formulas tutorial
- Google Spreadsheets Tips: Add custom charts
- Project Management: Excel Gantt Chart Template

I hope you'll find this post useful. If you have suggestions about this process, or if you want to share an interesting link related to this topic, please add a comment. Thanks!

Structured process you must know to develop a web application

Developing a web application is hard work which requires a lot of time spent doing a myriad of things. If you don't use a methodical approach, especially in the case of a complex project, you run the risk of losing sight of the project, missing delivery dates, and wasting your time for nothing.

This post illustrates a structured process which helps you simplify the approach to developing your web applications, saving time and working more efficiently.

Download The Woork Papers N1 | Structured process you must know to develop a web application

Process main phases
In a generic web application developing process you can identify five main phases:

1. Requirements definition
2. Design
3. Implementation
4. Test
5. Release

Planning and Monitoring is a "cross phase" that runs alongside the development process, defining a project plan composed of a list of activities you have to monitor during project execution. For each activity you have to define a set of information useful for monitoring it, for example:

- owner
- duration
- costs
- ...

Take a look at these posts I wrote some time ago about how to implement a project plan with a Gantt chart using Excel or Google Spreadsheets:

How to organize a project plan
Excel Gantt chart template
Implement a project plan and manage activities with Google Spreadsheets


1. Requirements Definition
In this first phase you have to define the scope and needs of your web application in terms of what it must do, its main features and its technical requirements:



Scope
To define the scope of your web application, it is sufficient to compile a detailed list with a clear description of the application's features. At this stage it is not important "how" you'll realize them, but "what" you have to realize!

Needs
Needs analysis is a crucial part of the development process. In this step you have to estimate your potential traffic and choose a server-side language (PHP, ASP, ColdFusion...), a database, a hosting service... Pay close attention not to overrate or underrate your estimates! Evaluate everything with the right balance between time, costs and objectives.


2. Design
After the requirements definition phase, you have to "design" your application with a clear plan. In this phase you can identify the following steps:




Design: Application Map
An application map contains just the meaningful and essential information about the structure of your application: pages (represented as blocks) and the main relationships between them. Your application map could look something like this:


In this way you have a map with "locations" (pages) and a "path" (the relationships between pages) which you simply have to follow in order to implement your application, page by page, in the next phase. You'll save a lot of time by having a clear picture of what you have to implement.


Design: Database
OK, now it's time to design the application database. A simple way to do that is to use an entity-relationship (ER) model. In general you can follow this order: first define the tables, then the attributes, then the relationships between tables. Your ER model will look like this:



1:1 expresses the cardinality of a relationship (in this example, 1 user is assigned to only 1 task, and 1 user lives in only 1 city). For more information about this topic take a look at my old posts:

Define the entities-relationships model
A correct approach to define relationships between database tables
10 Useful articles about Database design


Design: Page Structure
The next step is to design the approximate structure of the page, identifying all the main sections with a name (for example #header, #navbar, #mainContent, #sidebar).

Design: Server-side Language
Keeping in mind an object-oriented approach to developing your application, you can define the classes, functions and all the server-side features you need. Remember... this is not the "implementation" but a way to have a "guide" for what you'll implement in the next phase.

Design: JS Framework
In this step, choose a JavaScript framework (jQuery, Scriptaculous, MooTools...), then identify the main features you want to implement (drag and drop, animation effects...) by compiling a simple list that associates each feature with one or more pages identified in your application map.

At this point the design phase is complete. Let's start with implementation!

3. Implementation
OK... now the real challenge starts, because "implementation" is the realization of your application. You can divide this phase into the following steps:



Implementation: Database
Create a new database and write the SQL code defining tables, attributes and relationships. In the past I dedicated some posts to this topic. Take a look at the following links for more information:

How to use PHP and SQL to create DB tables and relationships
Create tables and relationships with SQL
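As a minimal sketch of this step, the example ER model from the Design phase (users, tasks, cities; the table and column names here are my own illustration) might translate into SQL like this, run through Python's built-in sqlite3 module:

```python
import sqlite3

# In-memory database for illustration; use a file path in a real project.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE users (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    city_id INTEGER REFERENCES cities(id)  -- 1 user lives in 1 city
);
CREATE TABLE tasks (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    user_id INTEGER REFERENCES users(id)   -- 1 task is assigned to 1 user
);
""")
```

Having the ER model decided on paper first makes this step almost mechanical: each entity becomes a table, and each relationship becomes a foreign key column.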

Implementation: HTML
Use the page structure you defined in the Design phase to implement the HTML code:







This is the moment to add all the HTML elements you need inside the sections identified during the Design phase. For example, if the #mainContent section contains a post with a title, a text body and post tags, add those elements:













Implementation: CSS
When the main structure is ready, start writing CSS code to add styles to your application. If you need suggestions on how to write better CSS code, take a look at these posts:

CSS coding: semantic approach in naming convention
Useful guidelines to improve CSS coding and maintainability

Implementation: Server-side language
Implement the application classes, functions, DB interactions, queries, and everything else that requires server-side processing.

Implementation: JavaScript
Implement Ajax features (drag and drop, animation effects...) using the framework you chose in the Design phase (jQuery, Scriptaculous, MooTools...).


4. Test
During this phase you have to "stress" your application by executing your code under various conditions (for example, using different browsers). Your objective is to detect all the application's bugs and fix them before the final release.



Remember, this process must be methodical and requires a lot of patience! Test each page and each feature (here too, the application map can help you proceed in an orderly way). If you find a bug during test execution, fix it by modifying the code and then proceed with a final validation (a further test) of the code.

5. Release
Finally you are ready to release your application! Publish it in a test folder and run a final test. If everything is OK, proceed with the final release.

Read and download this post on Scribd

Estimating time for Web Projects more accurately: Part 2

In Part 1 of this article I discussed the common reasons why underestimating times for web projects is such a common occurrence. In this part I describe how I personally go about compiling estimates in ways that reduce risk to the project, and your business, and increase the accuracy of the estimate, and thus overall profit.

Estimating projects and the occult

After reading Alyssa Gregory’s recent article on Sitepoint.com, How to Estimate Time for a Project, I couldn’t help but feel that it was a good introduction but lacked a little when compared with the realities and complexities of estimating/quoting for a website or web application.

“The Devil is in the detail: When people say that the devil is in the detail, they mean that small things in plans and schemes that are often overlooked can cause serious problems later on.” Using English.com

I may not believe in the Devil, but I do believe this statement to be as true as it gets when it comes to estimating time for web projects.

Estimating web projects accurately

Some would say this is an oxymoron, and to some extent I would agree, but I do believe by applying a few techniques it’s possible to drastically increase the accuracy of most web project estimates and avoid the feeling of wanting to curl up in the foetal position and whimper helplessly under your duvet.

Confirm a ball park figure

Before you embark on any detailed estimate exercise it’s crucial to immediately confirm with the client a rough budget, or budget range, they feel would be acceptable for an estimate. After all, if you get a new business lead, spend three days estimating and deliver a proposal worth £30,000 only to hear that the client ‘appreciates your response’ but only has £3000 to spend, you probably deserve to be struck with reasonable force in the baby-making department.

When you receive a project brief, throw some figures out there for the client to comment on; one of a few possible things will happen:

  • They will laugh and hang up when they hear your daily rate
  • You will laugh and hang up when you hear they want Facebook for £4000
  • They will say you’re in the right area but probably need to come down or go up a little bit
  • They will refuse to give feedback and say "We are open to suggestions"

For most of these outcomes the next steps need little explanation, but the "we are open to suggestions" answer is always the trickiest. In these cases it really is down to you to decide whether the project is worth the risk of spending time writing a proposal and estimate. It could be that the potential for new work is huge, or the client is extremely high profile, and thus it's often worth it; however, if you feel none of these are true, you should probably ask yourself what this statement says about the client and whether you want to work with them.

It’s understandable a client wants to get their website or web application at the lowest cost, but the best clients are the most professional, experienced and ethical ones and simply know that not providing a rough budget range could waste theirs and your time.

Assuming you have made the decision to pursue the lead, you can now begin the detailed estimating exercise.

Consistent project phase breakdowns

Most web projects can be broken down into the following high-level categories:

  • Research and planning
  • Solution design
  • Functional specification
  • Web design
  • Front-end development
  • Back-end development
  • Content entry

Each individual project will contain unique tasks, but most can be encapsulated in the above phases.

The first step to increasing accuracy of web project estimates is to make sure you always begin with a consistent set of categories and then add as many sub-categories and tasks as you like.

Get granular, then get more granular

Now that your consistent high-level project phases are defined, it's time to get granular and add sub-phases and tasks, e.g.:

  • Research and planning
    • Requirements gathering
    • Project planning
  • Solution design
    • Sitemap
    • Wireframes
    • User workflows
    • Functional specification
  • Design
    • Initial homepage look and feel
    • Content page
    • Master content page template
    • News main page
    • News item
  • Front-end development
    • Template x5 XHTML/CSS
    • Cross-browser fixes
  • Back-end development
    • CMS Setup and configuration
    • News feature
    • Contact us form
  • Content entry

This full list of required project tasks can be created based on the pre-sales research you have conducted with the client. It is imperative to nail down, in as much detail as possible, what exactly the client wants before you submit an estimate or begin work.

The more granular you can get, the more you are forced at this early stage to think through each part of the project, literally designing and building the website or application from beginning to end in your head.
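One benefit of this granularity is that the arithmetic becomes trivial to roll up. As a sketch (the phases, tasks and hours below are hypothetical), a nested breakdown can be summed per phase and overall:

```python
# Hypothetical granular estimate: phases -> tasks -> hours.
ESTIMATE = {
    "Solution design": {"Sitemap": 4, "Wireframes": 8, "Functional specification": 12},
    "Back-end development": {"CMS setup": 6, "News feature": 10, "Contact us form": 4},
}

def phase_totals(estimate):
    """Roll granular task hours up into per-phase totals and an overall total."""
    totals = {phase: sum(tasks.values()) for phase, tasks in estimate.items()}
    totals["TOTAL"] = sum(totals.values())
    return totals

print(phase_totals(ESTIMATE))
# {'Solution design': 24, 'Back-end development': 20, 'TOTAL': 44}
```

Removing a task from the breakdown (to fit a client's budget, say) automatically produces the revised totals.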

Going through the project step-by-step, putting yourself in the shoes of the information architect, the designer and the developers, will often immediately surface issues that you need to clarify before putting in an estimate. Take the News feature, for example. Ideally you break each feature down as far as you can, so the News feature may actually end up looking like the following:

  • News feature
    • Add/edit/delete new item
    • Upload image
    • Attach PDF
    • Auto-archiving
    • RSS

Resisting the temptation to just think "News… ummm… 5 hours" and breaking it down to this level means you're mentally building the feature step-by-step and raising questions as you go.

So the client needs to be able to upload images to their news items, ok, but do they need:

  • Auto-resize capability?
  • Auto-thumbnail generation?
  • Full-screen viewing?
  • Caption addition facility?

I’m sure you can think of many more questions that could be associated with a simple upload image for a news item requirement. This demonstrates the possible scope variations that are contained within even the smallest of features and that could impact your estimates / risk of underestimating.

By getting granular and mentally trying to build the solution you are able to identify and address these issues early on, making sure to cater for them in your final estimate.

“A Web Project Manager knows how to design and develop most of the project on his own, even if with poorer results compared to his team. This allows him to estimate projects with good approximation and to understand his team’s problems and difficulties” Introduction to Web Project Management: Fucina Web

Granularity: Good for the client and good for you

Don’t forget, by getting this granular not only increases your estimations accuracy, but it also gives you the instant ability to remove proposed features quickly if your estimate exceeds the client’s maximum budget! Need to shave 10 hours off the budget? Well, rather than removing the News feature entirely, how about remove the thumbnail and caption adding functionality from News, and a few other small niceties from other features and still allow the client to have the basic versions of all the features they need? Simply remove the lines and hey presto, a new estimate at warp speed and with full visibility to the client of what functionality they’re sacrificing for budget.

Roughly guessing that the News feature will take 5 hours is one thing, but what about when the client sees the initial functional specification and asks where the archive section is, or how they attach PDFs, simply because they assumed the News section they'd be getting would be like the News section on another website they'd seen?

Getting granular will not always allow you to break out a required feature in full, because it may be specific to the client's industry. If you are ever in any doubt about what the client needs, ASK THEM! Take the time to understand their business so that you can fully understand why they want the feature and how it needs to work.

Few clients mind you asking questions, if anything it tends to give them more confidence that you will continue to be as diligent and thorough if they hire you.

Best of all, if you win the work, by getting granular you have:

  • An instant statement of work
  • A defined project scope
  • The timings you need to put together an accurate project schedule with milestones
  • Set your client’s expectations very early
  • Demonstrated your thoroughness and understanding of their business to the client

Being methodical and transparent, and getting granular from the very start, means your client is well informed and you're seriously reducing the chances of friction later down the line.

You won the work w00t! Time to start tracking time

The more projects you complete that are categorised with a consistent set of phases and tasks, the more useful data you can collect on how long you estimated versus how long each phase or task actually took.

Consistency here is the key again. By this stage you should have the final project estimate that was approved by the client, and this estimate should be broken down first into the same high-level phases as your previous projects, and then at a granular level by task and sub-task.

Simply replicate this structure and the time estimates for each in your time tracking tool of choice so that you can:

  • See how long you have to complete each phase
  • View how long you have for each task and sub-task
  • At the end, report on how long everything took

Not only is this a perfect way to track the progress of the project you are working on, but the data you collect over multiple projects, using a consistent pattern, will become more and more valuable when it comes to estimating future projects and also allow you to identify bottlenecks in your project processes.

Analyse project estimates vs. actual time spent

Once you have defined a consistent set of high-level project phases to use when estimating all web projects, and committed to setting up your time tracking tool each time to match, you are ready to reap the rewards.

Tracking the estimated and actual times of all past projects within a consistent framework will give you the average percentage of the project that each phase and task took, and you can use these averages as a good guide when starting a new estimate.

For example, by collecting data from your past five projects you may be able to identify trends like:

  • Solution design took around 10% of the total project time to complete and get approved
  • Web design 30%
  • Front-end development 15%
  • Back-end development 30%

And so on…

Using these averages, and with a client’s preferred budget, you can begin to immediately allocate a rough amount of time to each phase and then break out each in more detail, but with the roughly allowed hours available known beforehand.
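This allocation step can be sketched in a few lines of Python; the phase percentages follow the example above, and the 200-hour budget is hypothetical:

```python
# Historical share of total project time per phase, from past-project tracking
# (the "Other" bucket is my own addition so the shares sum to 100%).
PHASE_SHARE = {
    "Solution design": 0.10,
    "Web design": 0.30,
    "Front-end development": 0.15,
    "Back-end development": 0.30,
    "Other (planning, content entry...)": 0.15,
}

def allocate_hours(total_hours):
    """Split a rough project-hour budget across phases using historical averages."""
    return {phase: round(total_hours * share, 1) for phase, share in PHASE_SHARE.items()}

# A hypothetical 200-hour budget:
for phase, hours in allocate_hours(200).items():
    print(f"{phase}: {hours}h")
```

Each phase's rough allowance then becomes the cap within which you break out the detailed tasks.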

With this initial capped time limit in place, you can shape not only your time estimates but also the most cost-efficient solution you can deliver for the budget, rather than going in with a blind quote that may be way too low or too high.

Of course, this is dependent on knowing the client’s budget beforehand, not always possible, but more possible than most seem to think if you just explain you want to deliver the best solution for the budget given that you could potentially offer a News feature that costs £750 or £7,500.

Get granular with features, again!

Aside from being able to determine the average percentage of time each high-level phase usually takes of each project, you can take this tracking and analysis one step further.

How many websites or web applications tend to need a feature you’ve implemented before? Small to medium business websites invariably demand the following functionalities:

  • News
  • Press Releases
  • Case Studies
  • Events
  • FAQ
  • Contact us form

At the estimating stage you will have identified these requirements and broken them down as far as you can. Not only can you re-use these common breakdowns in multiple quotes; if you track the time for each one over multiple projects, you will also begin to build up average times for implementing or migrating each feature – now that's useful.

In summary

The approach for estimating time for web projects I have described to you in this article works for me. It’s something I’ve developed over several years through trial and error by having to estimate web projects of all shapes and sizes in a small web agency environment, combined with a lot of good old fashioned hours of research.

It's methodical, it's laborious, it isn't a guarantee that a project will not go over budget, and I've no doubt it's something I will continue to develop and evolve as time goes by. But as any web project manager will tell you, the more you plan, the more chance there is your project will be successful, and the same goes for estimating time.

When possible, resisting the temptation to throw some figures onto a proposal and send it off, and instead splitting the project requirements into as much granular detail as possible, can really be a life saver. It not only identifies possible grey areas early on, but forces you to think through everything you're planning to offer the client, and to what extent and scope.

It also presents you, or your agency, to the client as meticulous, diligent and thorough in how you approach things. This is often taken as a sign of how you will approach the rest of the project, and it always gives the client confidence.

Finally, creating and maintaining a consistent pattern of high-level phases and tasks between your project estimates and your time tracking means you can collect and analyse reliable data from multiple projects, helping you further increase the accuracy of, and cut the time needed to create, estimates for web projects.

If you want to read why underestimating web projects is such a common occurrence, please check out Part 1 of this article »

Estimating time for Web Projects more accurately: Part 1

A good foundation

After reading Alyssa Gregory’s recent article on Sitepoint.com, How to Estimate Time for a Project, I couldn’t help but feel that it was a good introduction but lacked a little when compared with the realities and complexities of estimating/quoting for a website or web application.

But before we look at how to create more accurate estimates for web projects, it’s important to look at the reasons why this area is such a difficult one and why web project management forums aren’t exactly littered with posts entitled “How can I stop completing web projects on budget?”

Why underestimating is so common

There are many common reasons why web projects are often underestimated by freelancers and web agencies alike; they include the following:

  • The technologies required have never been used before
  • Large parts of the project are grey areas at the time of estimating
  • The features needed are very specific to the client’s industry and thus bespoke
  • Breaking the project down into enough detail would require almost as much work as a paid-for requirements gathering phase

While almost all web folk will admit that these are common reasons why it's difficult to estimate time for web projects, only a few will admit to the following, which are just as true:

  • No previous project ‘estimated vs. actual’ analysis has been conducted to draw on
  • The client needs an estimate for their large project tomorrow
  • The revenue needs for immediate cash flow now outweigh the effects of no new business now
  • Estimating time for a project is not fun

I am no different. At one point or the other I have cited the valid public reasons but also fallen foul of the not so public ones; this is the reality of working as an insanely busy freelancer, or in an equally busy web agency, just trying to keep your head above the water financially.

However, there are few worse feelings in this industry than reaching a point in a project and realising you've grossly underestimated the time; it's a bitter taste you don't forget in a hurry and should strive to avoid.

So why does it happen time and time again?

The day-to-day reality

The approaches to creating more accurate estimates for web projects that I will describe in Part 2 of this article explain how to combat several of the common reasons for underestimating. I personally use them to track time on all projects and use the data gathered when estimating new ones – the results are very positive indeed.

However, as you can see below, it doesn’t resolve them all:

  • The technologies required have never been used before
  • Large parts of the project are grey areas at the time of estimating
  • The features needed are very specific to the client’s industry and thus bespoke
  • Breaking the project down into enough detail would require almost as much work as a paid-for requirements gathering phase
  • No previous project ‘estimated vs. actual’ analysis has been conducted to draw on
  • The client needs an estimate for their large project tomorrow
  • The revenue needs for immediate cash flow now outweigh the effects of no new business now
  • Estimating time for a project is not fun

So what can you do to combat each of these realities?

Technologies have never been used before

If you are estimating time needed to implement a solution using a technology you have little or no experience of, you will have to conduct some basic research and then just guess! Alternatively, try to negotiate a smaller fee with which you can conduct the first stages of the project in terms of research and putting together some kind of high-level specification for the project.

At best, you learn a new technology, build a rapport with the client and win the remaining work. At worst, you’ve learnt a new technology, you’ve generated revenue and the client has a good specification to update its project tender with.

Detailed estimation exercises take too long

My personal approach takes time, much more than the traditional method of providing an estimate based purely on experience and instinct. What if you spend hours or days putting together this detailed quote, breaking the project down perfectly, and then don't win the work? Is that a lost opportunity for another potential sale? Does the client now have a detailed breakdown for free?

Well yes, but this is just how it goes in any sales situation. Is it wasted work? Possibly, but it’s also possible you have researched a new technology or broken down a feature you haven’t used before. What’s to stop you trying to sell this to existing or new clients if it aligns with their long-term business aims?

Client needs estimate for large project tomorrow

If a client needs an estimate for a large project tomorrow, please refer to the ball park figure section in Part 2 of this article. Try to confirm a ball park figure and then assess the potential gains to your business versus the possibility of underestimating this one project. Always bear in mind, the larger the project, the greater the likelihood you will underestimate and the potential loss you could incur by submitting a low estimate.

Business revenue needs

The revenue a freelancer or small business needs for cash flow is an ever-present issue. A typical dilemma faced is:

  • We need £20k in revenue this month to sustain the business
  • Client is offering us a maximum of £10k now and we don’t have many other leads
  • We feel their project will cost a minimum of £15k
  • We can walk away now and not lose £5k, but also not make £10k
  • We can take the project on and try and do it for £10k
  • We can take the project on; accept the £5k loss, break even and live to fight another day

This quandary is the reality of running a business and just one of the daily tough decisions a freelancer or Managing Director has to make; I don't envy them. What to do in this situation is entirely the business owner's decision; all you can do is give them accurate information so that they can make as informed a decision as possible.

Estimating time for a project is not fun

Never has a truer word been spoken. However much pride you take in it, and however many new ways you test to improve accuracy, it's rare to find a person who loves compiling and delivering project estimates (if you find one, grab hold of them and lock them in your office!).

Let us speak plainly my friends; the following facts are true when it comes to estimating web projects:

  • It’s hard work and takes you outside your comfort zone
  • Forces you to predict the future
  • Usually has to be completed alongside your plans for your already fully booked week
  • Makes you largely responsible for the:
    • Sales success
    • Solution offered
    • Eventual profitability of the project
    • Growth/survival of your business

With all of the above being truths that I believe most people who have to estimate web projects feel, the scary fact of the matter is that getting it wrong just a few times can leave a freelancer or small web agency in real turmoil.

Identify then fix

For all of the reasons stated above, getting people who are primarily from a creative or technical background, and more passionate about actually creating something, to really spend time developing a solid and reliable web project estimating process is quite difficult, and understandably so; it's just not sexy work.

But, especially in these turbulent times, being able to identify any gaps in your entire web project management workflow that (incoming buzzword bingo phrases) minimise project risk and maximise project profit should take a front seat, and the web project estimation phase is the most crucial in achieving these aims.

In Part 2 of this article I will go into detail of how I personally go about estimating time for web projects, both on an individual project and long-term multiple project basis, that combats the remaining common reasons for underestimating.

Project Estimate for a Website Redesign

Whether you bill clients hourly or on a per project basis, a necessary step of all projects is estimating the time it will take. Not only does the client want to have an idea of how much money they will be spending, but they also need to plan around an estimated timeline. And you need to be able to ensure you have the time and resources necessary to complete the project.

Depending on a number of factors – including how much experience you have with the type of work, whether you are using subcontractors, and the information you have from the client – estimating the time for a project can be difficult. Here is the process I use when scoping the time commitment for a new project.

Identify Deliverables

The first step is to identify the main project (i.e. Website Redesign), and then pinpoint the specific deliverables associated with the project. For example, upon completion of the redesign, you will be providing the client with a newly designed website by FTPing the site files and sending the client a CD or USB drive with the working files.

Break It Down

Next, I take the project and break it down into simple tasks separated by component – the more specific the better – that will get us to the deliverables. Here is an example of what the tasks may look like:

Project Planning

  • Initial meeting with client to gauge scope of project
  • Provide client with project information sheet to get more information about what they like/don’t like about their existing site
  • Review/analyze existing site and client form
  • Develop a list of site changes to be made
  • Get approval from client

Design

  • Design site mockup
  • Get approval from client
  • Code pages
  • Create new navigation
  • Reorganize content into new pages
  • Optimize for search engines

Testing

  • Cross-browser testing
  • Validate code
  • Check links
  • Test forms

Add It Up

The next step is to estimate time for each task, rounding up. If you are using subcontractors, you will need to get their time estimates first and work them into your time. Then take the total time for all of the tasks and add in a buffer. The buffer can be anything, although I usually stick with a 10-25% addition. This allows for any unexpected situations or challenges that arise.
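The add-it-up step can be sketched as a short Python helper; the task names and hours below are hypothetical, and the 15% buffer sits inside the 10-25% range mentioned above:

```python
def estimate_with_buffer(task_hours, buffer=0.15):
    """Sum per-task estimates and add a contingency buffer (10-25% is typical)."""
    subtotal = sum(task_hours.values())
    return round(subtotal * (1 + buffer), 1)

# Hypothetical per-task estimates, already rounded up:
tasks = {
    "Design site mockup": 8,
    "Code pages": 20,
    "Cross-browser testing": 6,
}
print(estimate_with_buffer(tasks, buffer=0.15))  # 39.1 hours including buffer
```

Because the buffer is a single parameter, it is easy to re-run the estimate with a more cautious 25% when a project carries more unknowns.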

Things to Keep in Mind

The more time estimates you do, the more accurate you will be. As you create your own formula, some other factors you may want to consider include:

  • Project management time
  • Time to review work of subcontractors
  • Holidays or days off that occur during the project
  • Client turnaround time
  • Debugging

I hope this helps.