Sunday, August 24, 2008

Web Site Optimization: 13 Simple Steps

Earlier this year, Steve Souders from the Yahoo! Performance team published a series of front-end performance optimization "rules" for optimizing a page.

This tutorial takes a practical, example-based approach to implementing those rules. It's targeted towards web developers with a small budget, who are most likely using shared hosting, and working under the various restrictions that come with such a setup. Shared hosts make it harder to play with Apache configuration -- sometimes it's even impossible -- so we'll take a look at what you can do, given certain common restrictions, and assuming your host runs PHP and Apache.

The tutorial is divided into four parts:

basic optimization rules

optimizing assets (images, scripts, and styles)

optimizations specific to scripts

optimizations specific to styles

Credits and Suggested Reading

The article is not going to explain Yahoo!'s performance rules in detail, so you'd do well to read through them on your own for a better understanding of their importance, the reasoning behind the rules, and how they came to be. Here's the list of rules in question:

Make fewer HTTP requests

Use a Content Delivery Network

Add an Expires header

Gzip components

Put CSS at the top

Move scripts to the bottom

Avoid CSS expressions

Make JavaScript and CSS external

Reduce DNS lookups

Minify JavaScript

Avoid redirects

Remove duplicate scripts

Configure ETags

You can read about these rules on the Yahoo! Developer Network site. You can also check out the book "High Performance Web Sites" by Steve Souders, and the performance research articles on the YUI blog by Tenni Theurer.

Basic Optimization Rules

Decrease Download Sizes

Decreasing download sizes isn't even in Yahoo!'s list of rules -- probably because it's so obvious. However, I don't think it hurts to reiterate the point -- let's call it Rule #0.

When we look at a simple web page we see:

some HTML code

different page components (assets) referenced by the HTML

The assets are images, scripts, styles, and perhaps some external media such as Flash movies or Java applets (remember those?). So, when it comes to download sizes, you should aim to have all the assets as lightweight as possible -- advice which also extends to the page's HTML content. Creating lean HTML code often means using better (semantic) markup, which also overlaps with the SEO (search engine optimization) efforts that are a necessary part of the site creation process. As most professional web developers know, a key characteristic of good markup is that it only describes the content, not the presentation of the page (no layout tables!). Any layout or presentational elements should be moved to CSS.

Here's an example of a good approach to HTML markup for a navigation menu:

<ul id="menu">

<li><a href="home.html">Home</a></li>

<li><a href="about.html">About</a></li>

<li><a href="contact.html">Contact</a></li>

</ul>

This sort of markup should provide "hooks" to allow for the effective use of CSS and make the menu look however you want it to -- whether that means adding fancy bullets, borders, or rollovers, or placing the menu items into a horizontal menu. The markup is minimal, which means there are fewer bytes to download; it's semantic, meaning it describes the content (a navigation menu is a list of links); and finally, being minimal, it also gives you an SEO advantage: it's generally agreed that search engines prefer a higher content-to-markup ratio in the pages that they index.

Once you're sure your markup is lightweight and semantic, you should go through your assets and make sure they are also of minimal size. For example, check whether it's possible to compress images more without losing too much quality, or to choose a different file format that gives you better compression. Tools such as PNGOUT and pngcrush are a good place to start.

Make Fewer HTTP Requests

Making fewer HTTP requests turns out to be the most important optimization technique, with the biggest impact. If your time is limited, and you can only complete one optimization task, pick this one. HTTP requests are generally the most "expensive" activity that the browser performs while displaying your page. Therefore, you should ensure that your page makes as few requests as possible.

How can you go about that while maintaining the richness of your pages?

Combine scripts and style sheets: Do you have a few <script> tags in your head? Well, merge the .js files into one and save your visitors some round trips; then do the same with the CSS files.

Use image sprites: This technique allows you to combine several images into one and use CSS to show only the part of the image that's needed. When you combine five or ten images into a single file, already you're making a huge saving in the request/response overhead.

Avoid redirects: a redirect adds another client-server round trip, so instead of processing your page immediately after receiving the initial response, the browser will have to make another request and wait for the second response.

Avoid frames: if you use frames, the browser has to request at least three HTML pages, instead of just one -- those of the frameset as well as each of the frames.

You've got the basics now. In summary, make your page and its assets smaller in size, and use fewer assets by combining them wherever you can. If you concentrate on this aspect of optimization only, you and your visitors will notice a significant improvement.
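
As a rough illustration of combining scripts, here's a minimal sketch of a build-step helper. The function name combineScripts and the file names are made up for this example; a real build step would read the sources from disk.

```javascript
// Hypothetical build-step helper: merge several script sources into one payload.
// `files` is an array of { name, source } objects; the names are illustrative only.
function combineScripts(files) {
  return files
    .map(function (file) {
      // A leading comment records where each chunk came from.
      return '/* ' + file.name + ' */\n' + file.source;
    })
    // The bare semicolon guards against a file that omits its final one.
    .join('\n;\n');
}

var combined = combineScripts([
  { name: 'menu.js', source: 'var menu = 1' },
  { name: 'gallery.js', source: 'var gallery = 2;' }
]);
console.log(combined);
```

The page then needs a single <script> tag pointing at the combined file instead of one tag per source file.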

Now let's explore some of the Yahoo! recommendations in more detail, and see what other optimizations can be made to improve performance.

Optimizing Assets

Use a Content Delivery Network

A Content Delivery Network (CDN) is a network of servers in different geographical locations. Each server has a copy of a site's files. When a visitor to your site requests a file, the file is delivered from the nearest server (or the one that's experiencing the lightest load at the time).

This setup can have a significant impact on your page's overall performance, but unfortunately, using a CDN can be pricey. As such, it's probably not something you'd do for a personal blog, but it may be useful when a client asks you to build a site that's likely to experience high volumes of traffic. Some of the most widely known CDN providers are Akamai and Amazon, through its S3 service.

There are some non-profit CDNs in the market; check the CDN Wikipedia article to see if your project might qualify to use one of them. For example, one free non-profit peer-to-peer CDN is Coral CDN, which is extremely easy to integrate with your site. For this CDN, you take a URL and append ".nyud.net" to the hostname. Here's an example:

http://example.org/logo.png

becomes:

http://example.org.nyud.net/logo.png
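
If you wanted to apply that rewrite in code rather than by hand, a sketch might look like the following. The helper name coralize is an assumption made for this example; only the ".nyud.net" suffix comes from the URLs shown above.

```javascript
// Sketch: rewrite a URL so that it points at the Coral CDN form of the hostname.
// coralize() is a made-up name; the suffix is taken from the example above.
function coralize(url) {
  var parsed = new URL(url);
  parsed.hostname = parsed.hostname + '.nyud.net';
  return parsed.href;
}

var coralized = coralize('http://example.org/logo.png');
console.log(coralized); // http://example.org.nyud.net/logo.png
```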

Host Assets on Different Domains but Reduce DNS Lookups

After your visitor's browser has downloaded the HTML for a page and figured out that a number of components are also needed, it begins downloading those components. Browsers restrict the number of simultaneous downloads that can take place; the HTTP/1.1 specification recommends a limit of two simultaneous connections per hostname.

Because this restriction exists on a per-domain basis, you can use several domains (or simply use subdomains) to host your assets, thus increasing the number of parallel downloads. Most shared hosts will allow you to create subdomains. Even if your host places a limit on the number of subdomains you can create (some restrict you to a maximum of five), it's not that important, as you won't need to utilize too many subdomains to see some noticeable performance improvements.

However, as Rule #9 states, you should also reduce the number of DNS lookups, because these can also be expensive. For every domain or subdomain that hosts a page asset, the browser will need to make a DNS lookup. So the more domains you have, the more your site will be slowed down by DNS lookups. Yahoo!'s research suggests that two to four domains is an optimal number, but you can decide for yourself what's best for your site.
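
To get a feel for how many lookups a page will trigger, you could count the distinct hostnames among its assets. Here's a small sketch; countHostnames and the example URLs are invented for illustration.

```javascript
// Sketch: count the distinct hostnames (and so, roughly, the DNS lookups)
// needed to fetch a list of asset URLs. All URLs here are examples.
function countHostnames(urls) {
  var seen = {};
  urls.forEach(function (url) {
    seen[new URL(url).hostname] = true;
  });
  return Object.keys(seen).length;
}

var lookups = countHostnames([
  'http://www.example.org/index.html',
  'http://i1.example.org/site.css',
  'http://i1.example.org/site.js',
  'http://i2.example.org/logo.png'
]);
console.log(lookups); // 3
```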

As a general guideline, I'd suggest you use one domain to host HTML pages and two other domains for your assets. Here's an example:

www.sitepoint.com - hosts only HTML (and maybe content images)

i1.sitepoint.com - hosts JS, CSS, and some images

i2.sitepoint.com - hosts most of the site's images

Different hosting providers will probably offer different interfaces for creating subdomains, and ideally they should provide you with an option to specify the directory that holds the files for the subdomain. For example, if your canonical domain is www.sitepoint.com, and it points to /home/sitepoint/htdocs, ideally you should be able to create the subdomain i1.sitepoint.com (either via an administration control panel or by creating a symbolic link in the file system) and point it to the same folder, /home/sitepoint/htdocs. This way, you can keep all files in the same location, just as they are in your development environment, but reference them using a subdomain.

However, some hosts may prevent you from creating subdomains, or may restrict your ability to point to particular locations on the file system. In such cases, your only real option is to physically copy the assets to the new location. Don't be tempted to create some kind of redirect in this case -- it will only make things worse, as it creates two requests for each image.

If your hosting provider doesn't allow subdomains at all, you always have the option of buying more domains and using them purely to host assets -- after all, that's what a lot of big sites do. Yahoo! uses the domain yimg.com, Amazon has images-amazon.com, and SitePoint has sitepointstatic.com. If you own several sites, or manage the hosting of your clients' sites, you might consider buying two domains, such as yourdomain-i1.com and yourdomain-i2.com, and using them to host the components for all the sites you maintain.

Place Assets on a Cookie-free Domain

If you set a lot of cookies, the request headers for your pages will increase in size, since those cookies are sent with each request. Additionally, your assets probably don't use the cookies, so all of this information could be repeatedly sent to the server for no reason. Sometimes, those headers may even be bigger than the size of the asset requested -- these are extreme cases of course, but it happens. Consider downloading those small icons or smilies that are less than half a kB, and requesting them with 1kB worth of HTTP headers.

If you use subdomains to host your assets, you need to make sure that the cookies you set are for your canonical domain name (e.g. www.example.org) and not for the parent domain (e.g. example.org). This way, your asset subdomains will be cookie-free. If you're attempting to improve the performance of an existing site, and you've already set your cookies on the parent domain, you could consider the option of hosting assets on new domains, rather than subdomains.
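
The reasoning here rests on how browsers match a cookie's domain against the host being requested. The following is a simplified sketch of that matching rule, not a full cookie implementation; cookieIsSentTo is a name made up for this example.

```javascript
// Simplified sketch of cookie domain matching: a cookie is sent to a host when
// the host equals the cookie's domain or is a subdomain of it.
function cookieIsSentTo(cookieDomain, host) {
  var domain = cookieDomain.replace(/^\./, '').toLowerCase();
  host = host.toLowerCase();
  return host === domain || host.slice(-(domain.length + 1)) === '.' + domain;
}

// A cookie set for example.org travels with every asset request:
console.log(cookieIsSentTo('example.org', 'i1.example.org'));     // true
// A cookie set for www.example.org leaves the asset subdomain cookie-free:
console.log(cookieIsSentTo('www.example.org', 'i1.example.org')); // false
```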

Split the Assets Among Domains

It's completely up to you which assets you decide to host on i1.example.org and which you decide to host on i2.example.org -- there's no clear directive on this point. Just make sure you don't randomize the domain on each request, as this will cause the same assets to be downloaded twice -- once from i1 and once from i2.
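
One way to avoid randomizing is to derive the host from the asset's path, so that the same file always maps to the same domain. Here's a minimal sketch under that assumption; the hash function, the host list, and the name assetUrl are all illustrative choices rather than a recommended scheme.

```javascript
// Sketch: pick an asset host deterministically from the file path, so a given
// asset is always requested from the same domain. Host names are examples.
var ASSET_HOSTS = ['i1.example.org', 'i2.example.org'];

// Small string hash, kept within unsigned 32-bit range.
function hashString(text) {
  var hash = 0;
  for (var i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

function assetUrl(path) {
  var host = ASSET_HOSTS[hashString(path) % ASSET_HOSTS.length];
  return 'http://' + host + path;
}

console.log(assetUrl('/img/logo.png'));
console.log(assetUrl('/img/logo.png')); // same URL as the line above
```

A hash like this spreads files roughly evenly by count, not by file size, so you may still want to adjust the split by hand after experimenting.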

You could aim to split your assets evenly by file size, or by some other criterion that makes sense for your pages. You may also choose to put all content images (those that are included in your HTML with <img /> tags) on i1 and all layout images (those referenced by CSS's background-image:url()) on i2, although in some cases this solution may not be optimal. In such cases, the browser will download and process the CSS files and then, depending on which rules need to be applied, will selectively download only images that are needed by the style sheet. The result is that the images referenced by CSS may not download immediately, so the load on your asset servers may not be balanced.

The best way to decide on splitting assets is by experimentation; you can use Firebug's Net panel to monitor the sequence in which assets download, then decide how you should spread components across domains in order to speed up the download process.

Configure DNS Lookups on Forums and Blogs

Since you should aim to have no more than four DNS lookups per page, it may be tricky to integrate third-party content such as Flickr images or ads that are hosted on a third-party server. Also, hotlinking images (by placing on your page an <img /> tag whose src attribute points to a file on another person's server) not only steals bandwidth from the other site, but also harms your own page's performance, causing an extra DNS lookup.

If your site contains user-generated content (as do forums, for example), you can't easily prevent multiple DNS lookups, since users could potentially post images located anywhere on the Web. You could write a script that copies each image from a user's post to your server, but that approach can get fairly complicated.

Aim for the low-hanging fruit. For example, in the phpBB forum software, you can configure whether users need to hotlink their avatar images or upload them to your server. In this case, uploaded avatars will result in better performance for your site.

Use the Expires Header

For best performance, your static assets should be exactly that: static. This means that there should be no dynamically generated scripts or styles, or <img> tags pointing to scripts that generate dynamic images. If you had such a need -- for example, you wanted to generate a graphic containing your visitor's username -- the dynamic generation could be taken "offline" and the result cached as a static image. In this example, you could generate the image once, when the member signs up. You could then store the image on the file system, and write the path to the image in your database. An alternative approach might involve scheduling an automated process (a cron job, in UNIX) that generates dynamic components and saves them as static files.

Having assets that are entirely static allows you to set the Expires header for those files to a date that is far in the future, so that when an asset is downloaded once, it's cached by the browser and never requested again (or at least not for a very long time, as we'll see in a moment).

Setting the Expires header in Apache is easy: add an .htaccess file that contains the following directives to the root folder of your i1 and i2 subdomains:

ExpiresActive On

ExpiresDefault "modification plus 10 years"

The first of these directives enables the generation of the Expires header. The second sets the expiration date to 10 years after the file's modification date, which translates to 10 years after you copied the file to the server. You could also use the setting "access plus 10 years", which will expire the file 10 years after the user requests the file for the first time.
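
To make the difference between the two base times concrete, here's a sketch of the date arithmetic. It only illustrates the idea; it is not how mod_expires is implemented, and expiresValue and the sample dates are assumptions for this example.

```javascript
// Sketch: compute an Expires value a number of years after a base time.
// "modification" uses the file's modification time as the base;
// "access" uses the time of the request instead.
function expiresValue(baseDate, years) {
  var expires = new Date(baseDate.getTime());
  expires.setUTCFullYear(expires.getUTCFullYear() + years);
  return expires.toUTCString();
}

var modified = new Date(Date.UTC(2007, 3, 18, 17, 36, 33));  // example modification time
var requested = new Date(Date.UTC(2007, 6, 30, 21, 18, 16)); // example request time

console.log(expiresValue(modified, 10));  // "modification plus 10 years"
console.log(expiresValue(requested, 10)); // "access plus 10 years"
```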

If you want, you can even set an expiration date per file type:

ExpiresActive On

ExpiresByType application/x-javascript "modification plus 2 years"

ExpiresByType text/css "modification plus 5 years"

For more information, check the Apache documentation on mod_expires.

Name Assets

The problem with the technique that we just looked at (setting the Expires header to a date that's far into the future) occurs when you want to modify an asset on that page, such as an image. If you just upload the changed image to your web server, new visitors will receive the updated image, but repeat visitors won't. They'll see the old cached version, since you've already instructed their browser never to ask for this image again.

The solution is to modify the asset's name -- but it comes with some maintenance hurdles. For example, if you have a few CSS definitions pointing to img.png, and you modify the image and rename it to img2.png, you'll have to locate all the points in your style sheets at which the file has been referenced, and update those as well. For bigger projects, you might consider writing a tool to do this for you automatically.

You'll need to come up with a naming convention to use when naming your assets. For example, you might:

Append an epoch timestamp to the file name, e.g. img_1185403733.png.

Use the version number from your source control system (cvs or svn for example), e.g. img_1.1.png.

Manually increment a number in the file name (e.g. when you see a file named img1.png, simply save the modified image as img2.png).

There's no one right answer here -- your decision will depend on your personal preference, the specifics of your pages, the size of the project and your team, and so on.
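
Whichever convention you pick, the renaming and the follow-up search through your style sheets are easy to script. The sketch below shows one possible shape for such a tool; versionedName and updateReferences are hypothetical helpers, and the CSS is a made-up sample.

```javascript
// Sketch: insert a version (timestamp, revision, counter) before the extension.
function versionedName(fileName, version) {
  var dot = fileName.lastIndexOf('.');
  if (dot === -1) {
    return fileName + '_' + version;
  }
  return fileName.slice(0, dot) + '_' + version + fileName.slice(dot);
}

// Sketch: point every reference to the old file name at the new one.
// split/join replaces all literal occurrences without any regex escaping.
function updateReferences(css, oldName, newName) {
  return css.split(oldName).join(newName);
}

var newName = versionedName('img.png', 1185403733);
var css = '#a { background: url(img.png); } #b { background: url(img.png); }';

console.log(newName); // img_1185403733.png
console.log(updateReferences(css, 'img.png', newName));
```

Note that a plain text replacement like this would also touch a file called bigimg.png, so a real tool would need to match whole file names.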

If you use CVS, here's a little PHP function that can help you extract the version from a file stored in CVS:

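function getVersion($file) {
    // Escape the file name so it can't break out of the shell command.
    $cmd = sprintf('cvs log -h %s', escapeshellarg($file));
    exec($cmd, $res);
    // Look for the "head:" line instead of relying on a fixed line offset.
    foreach ($res as $line) {
        if (strpos($line, 'head: ') === 0) {
            return trim(substr($line, 6));
        }
    }
    return null; // no revision found
}

// example use
$file = 'img.png';
$new_file = 'img_' . getVersion($file) . '.png';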

Serve gzipped Content

Most modern browsers understand gzipped (compressed) content, so a well-performing page should aim to serve all of its content compressed. Since most images, SWF files and other media files are already compressed, you don't need to worry about compressing them.

You do, however, need to take care of serving compressed HTML, CSS, client-side scripts, and any other type of text content. If you make XMLHttpRequests to services that return XML (or JSON, or plain text), make sure your server gzips this content as well.

If you open the Net panel in Firebug (or use LiveHTTPHeaders or some other packet sniffer), you can verify that the content is compressed by looking for a Content-Encoding header in the response, as shown in the following example:

Example request:

GET /2.2.2/build/utilities/utilities.js HTTP/1.1

Host: yui.yahooapis.com

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5

Accept-Encoding: gzip,deflate

Example response:

HTTP/1.x 200 OK

Last-Modified: Wed, 18 Apr 2007 17:36:33 GMT

Vary: Accept-Encoding

Content-Type: application/x-javascript

Content-Encoding: gzip

Cache-Control: max-age=306470616

Expires: Sun, 16 Apr 2017 00:01:52 GMT

Date: Mon, 30 Jul 2007 21:18:16 GMT

Content-Length: 22657

Connection: keep-alive

In this request, the browser informed the server that it understands gzip and deflate encodings (Accept-Encoding: gzip,deflate) and the server responded with gzip-encoded content (Content-Encoding: gzip).
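
The exchange above can be sketched as a small piece of server-side logic. This is a simplified illustration using Node's zlib module, not what Apache does internally: encodeBody is an invented name, and the sketch ignores quality values such as "gzip;q=0".

```javascript
var zlib = require('zlib');

// Sketch: gzip the body only when the request's Accept-Encoding lists gzip.
function encodeBody(acceptEncoding, body) {
  var accepted = (acceptEncoding || '').toLowerCase().split(',').map(function (token) {
    return token.split(';')[0].trim(); // drop any ";q=..." parameter
  });
  if (accepted.indexOf('gzip') !== -1) {
    return {
      headers: { 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding' },
      body: zlib.gzipSync(body)
    };
  }
  // No gzip support announced: send the content as-is.
  return { headers: { 'Vary': 'Accept-Encoding' }, body: Buffer.from(body) };
}

var response = encodeBody('gzip,deflate', 'body { margin: 0; }');
console.log(response.headers);
```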

There's one gotcha when it comes to serving gzipped content: you must make sure that proxies do not get in your way. If an ISP's proxy caches your gzipped content and serves it to all of its customers, chances are that someone with a browser that doesn't support compression will receive your compressed content.

To avoid this you can use the Vary: Accept-Encoding response header to tell the proxy to cache this response only for clients that send the same Accept-Encoding request header. In the example above, the browser said it supports gzip and deflate, and the server responded with some extra information for any proxy between the server and client, saying that gzip-encoded content is okay for any client that sends the same Accept-Encoding header.
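
One way to picture what Vary asks of a proxy is to think of the cache key it uses. The sketch below is a mental model only, not the behavior of any particular proxy; cacheKey is a made-up helper that expects request header names in lower case.

```javascript
// Sketch: a cache key that includes the request headers named in Vary, so
// responses are reused only for requests that sent the same values.
function cacheKey(url, requestHeaders, varyHeader) {
  var names = varyHeader ? varyHeader.split(',') : [];
  var parts = names.map(function (name) {
    var key = name.trim().toLowerCase();
    return key + '=' + (requestHeaders[key] || '');
  });
  return [url].concat(parts).join('|');
}

var gzipClient = cacheKey('/utilities.js', { 'accept-encoding': 'gzip,deflate' }, 'Accept-Encoding');
var plainClient = cacheKey('/utilities.js', {}, 'Accept-Encoding');

console.log(gzipClient === plainClient); // false: the cached gzip copy is not reused
```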

There is one additional problem here: some browsers (IE 5.5, IE 6 SP 1, for instance) claim they support gzip, but can actually experience problems reading it (as described on the Microsoft downloads site, and the support site). If you care about people using these browsers (they usually account for less than 1% of a site's visitors) you can use a different header -- Cache-Control: Private -- which eliminates proxy caching completely. Another way to prevent proxy caching is to use the header Vary: *.

To gzip or to Deflate?

If you're confused by the two Accept-Encoding values that browsers send, think of deflate as being just another method for encoding content. It's built on the same underlying compression algorithm as gzip, but it's handled less consistently by browsers and servers, so gzip is preferred.

Make Sure You Send gzipped Content

Okay, now let's see what you can do to start serving gzipped content in accordance with what your host allows.

Option 1: mod_gzip for Apache Versions Earlier than 2

If you're using Apache 1.2 or 1.3, the mod_gzip module is available. To verify the Apache version, you can check Firebug's Net panel and look for the Server response header of any request. If you can't see it, check your provider's documentation or create a simple PHP script to echo this information to the browser, like so:

<?php echo apache_get_version(); ?>

In the Server header signature, you might also be able to see the mod_gzip version, if it's installed. It might look something like this:

Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a.....

Okay, so we've established that we want to compress all text content, PHP script output, static HTML pages, JavaScripts and style sheets before sending them to the browser. To implement this with mod_gzip, create an .htaccess file in the root directory of your site that includes the following:

mod_gzip_on Yes

mod_gzip_item_include mime ^application/x-javascript$

mod_gzip_item_include mime ^application/json$

mod_gzip_item_include mime ^text/.*$

mod_gzip_item_include file \.html$

mod_gzip_item_include file \.php$

mod_gzip_item_include file \.js$

mod_gzip_item_include file \.css$

mod_gzip_item_include file \.txt$

mod_gzip_item_include file \.xml$

mod_gzip_item_include file \.json$

Header append Vary Accept-Encoding

The first line enables mod_gzip. The next three lines set compression based on MIME-type. The next section does the same thing, but on the basis of file extension. The last line sets the Vary header to include the Accept-Encoding value.

If you want to send the Vary: * header, use:

Header set Vary *

Note that some hosting providers will not allow you to use the Header directive. If this is the case, you may be able to substitute the last line with this one:

mod_gzip_send_vary On

This will also set the Vary header to Accept-Encoding.

Be aware that there might be a minimum size condition on gzip, so if your files are too small (less than 1kB, for example), they might not be gzipped even though you've configured everything correctly. If this problem occurs, your host has decided that the gzipping process overhead is unnecessary for very small files.
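
Taken together, the MIME-type rules and the minimum-size condition amount to a simple eligibility check. Here's a sketch of that decision; shouldCompress is an invented helper, and the 1024-byte threshold is an assumption, since each host chooses its own value.

```javascript
// Assumed threshold for this example; real hosts pick their own.
var MIN_BYTES = 1024;

// Sketch: compress only text-like content that is large enough to benefit.
function shouldCompress(contentType, byteLength) {
  var type = contentType.split(';')[0].trim().toLowerCase();
  var isText = /^text\//.test(type) ||
    type === 'application/x-javascript' ||
    type === 'application/json';
  return isText && byteLength >= MIN_BYTES;
}

console.log(shouldCompress('text/css', 5000));                 // true
console.log(shouldCompress('image/png', 5000));                // false: already compressed
console.log(shouldCompress('text/html; charset=utf-8', 200));  // false: too small
```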

Option 2: mod_deflate for Apache 2.0

If your host runs Apache 2 you can use mod_deflate. Despite its name, mod_deflate also uses gzip compression. To configure mod_deflate, add the following directives to your .htaccess file:

AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml application/x-javascript application/json

Header append Vary Accept-Encoding

Option 3: php.ini

Ideally we'd like Apache to handle the gzipping of content, but unfortunately some hosting providers might not allow it. If your hosting provider is one of these, it might allow you to use custom php.ini files. If you place a php.ini file in a directory, it overrides PHP configuration settings for this directory and its subdirectories.

If you can't use Apache's mod_gzip or mod_deflate modules, you might still be able to compress your content using PHP. In order for this solution to work, you'll have to configure your web server so that all static HTML, JavaScript and CSS files are processed by PHP. This means more overhead for the server, but depending on your host, it might be your only option.

Add the following directives in your .htaccess file:

AddHandler application/x-httpd-php .css

AddHandler application/x-httpd-php .html

AddHandler application/x-httpd-php .js

This will ensure that PHP will process these (otherwise static) files. If it doesn't work, you can try renaming the files to have a .php extension (like example.js.php, and so on) to achieve the same result.

Now create a php.ini file in the same directory with the following content:

[PHP]

zlib.output_compression = On

zlib.output_compression_level = 6

auto_prepend_file = "pre.php"

short_open_tag = 0

This enables compression and sets the compression level to 6. Values for the compression level range from 0 to 9, where 9 is the best (and slowest) compression. The auto_prepend_file line sets up a file called pre.php to be executed at the beginning of every script, as if you had typed <?php include "pre.php"; ?> at the top of every script. You'll need this file in order to set Content-Type headers, because some browsers might not like it when you send a CSS file that has, for example, a text/html content type header.

The short_open_tag setting is there to disable PHP short tags (<? ... ?>, as compared to <?php ... ?>). This is important because otherwise PHP will attempt to treat the <?xml tag in your HTML as PHP code.

Finally, create the file pre.php with the following content:

<?php

$path = pathinfo($_SERVER['SCRIPT_NAME']);

if ($path['extension'] == 'css') {

header('Content-type: text/css');

}

if ($path['extension'] == 'js') {

header('Content-type: application/x-javascript');

}

?>

This script will be executed before every file that has a .php, .html, .js or .css file extension. For HTML and PHP files, the default Content-Type text/html is okay, but for JavaScript and CSS files, we change it using PHP's header function.

Option 3 (Variant 2): PHP Settings in .htaccess

If your host allows you to set PHP settings in your .htaccess file, then you no longer need to use a php.ini file to configure your compression settings. Instead, set the PHP setting in .htaccess using php_value (and php_flag).

Looking at the modified example from above, we would have the same pre.php file, no php.ini file, and a modified .htaccess that contained the following directives:

AddHandler application/x-httpd-php .css

AddHandler application/x-httpd-php .html

AddHandler application/x-httpd-php .js

php_flag zlib.output_compression on

php_value zlib.output_compression_level 6

php_value auto_prepend_file "pre.php"

php_flag short_open_tag off

Option 4: In-script Compression

If your hosting provider allows neither php_value in your .htaccess file nor a custom php.ini file, your last resort is to modify the scripts to manually include the common pre.php file that will take care of the compression. This is the least-preferred option, but sometimes you may have no other alternative.

If this is your only option, you'll either be using an .htaccess file that contains the directives outlined in Option 3 above, or you'll have had to rename every .js and .css file (and .xml, .html, etc.) to have a .php extension. At the top of every file, add <?php include "pre.php"; ?> and create a file called pre.php that contains the following content:

<?php

ob_start("ob_gzhandler");

$path = pathinfo($_SERVER['SCRIPT_NAME']);

if ($path['extension'] == 'css') {

header('Content-type: text/css');

}

if ($path['extension'] == 'js') {

header('Content-type: application/x-javascript');

}

?>

As I indicated, this is the least favorable option of all -- you should try Option 1 or 2 first, and if they don't work, consider Option 3 or 4, or a combination of both, depending on what your host allows.

Once you've established the degree of freedom your host permits, you can use the technique that you've employed to compress your static files to implement all of your Apache-related settings. For example, earlier I showed you how to set the Expires header. Well, guess what? Some hosts won't allow it. If you find yourself in this situation, you can use PHP's header function to set the Expires header from your PHP script.

To do so, you might add to your pre.php file something like this:

<?php

header("Expires: Mon, 25 Dec 2017 05:00:00 GMT");

?>

Disable ETags

Compared to the potential hassles that can be encountered when implementing the rule above, the application of this rule is very easy. You just need to add the following to your .htaccess file:

FileETag None

Note that this rule applies to sites that are in a server farm. If you're using a shared host, you could skip this step, but I recommend that you do it regardless because:

Hosts change their machines for internal purposes.

You may change hosts.

It's so simple.
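
Why does a server farm matter? As far as I know, in the Apache versions discussed here the default ETag is derived from a file's inode number, size, and modification time, and the inode differs from machine to machine. The sketch below mimics that idea with made-up numbers; inodeBasedEtag is a hypothetical helper, not Apache's actual code.

```javascript
// Sketch: an ETag built from inode, size, and modification time.
// All numbers below are invented for illustration.
function inodeBasedEtag(file) {
  var parts = [file.inode, file.size, file.mtime].map(function (n) {
    return n.toString(16);
  });
  return '"' + parts.join('-') + '"';
}

// The same file copied to two servers: same size and mtime, different inode.
var onServerA = inodeBasedEtag({ inode: 1234567, size: 22657, mtime: 1176917793 });
var onServerB = inodeBasedEtag({ inode: 7654321, size: 22657, mtime: 1176917793 });

console.log(onServerA === onServerB); // false
```

Because the two values differ, a browser that validates its cached copy against the other server gets the whole file again instead of a short "not modified" response.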

Use CSS Sprites

Using a technique known as CSS sprites, you can combine several images into a single image, then use the CSS background-position property to show only the image you need. The technique is not intended for use with content images (those that appear in the HTML in <img /> tags, such as photos in a photo gallery), but is intended for use with ornamental and decorative images. These images will not affect the fundamental usability of the page, and are usually referenced from a style sheet in order to keep the HTML lean (Rule #0).

Let's look at an example. We'll take two images. The first is help.png; the second is rss.png. From these, we'll create a third image, sprite.png, which contains both images.

Combining two image files into a single image

The resulting image is often smaller in size than the sum of the two files' sizes, because the overhead associated with an image file is included only once. To display the first image, we'd use the following CSS rule:

#help {

background-image: url(sprite.png);

background-position: -8px -8px;

width: 16px;

height: 16px;

}

To display the second image, we'd use the following rule:

#rss {

background-image: url(sprite.png);

background-position: -8px -40px;

width: 16px;

height: 16px;

}

At first glance, this technique might look a bit strange, but it's really useful for decreasing the number of HTTP requests. The more images you combine this way, the better, because you're cutting the request overhead dramatically. For an example of this technique in use "in the wild", check out this image, used on Yahoo!'s homepage, or this one from Google's.

In order to produce sprite images quickly, without having to calculate pixel coordinates, feel free to use the CSS Sprites Generator tool that I've developed. And for more information about CSS sprites, be sure to read Dave Shea's article, titled CSS Sprites: Image Slicing's Kiss of Death.
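
If you'd rather see how such rules could be produced programmatically, here's a minimal sketch. It assumes you already know each icon's offset within the sprite; spriteRule and its parameter names are invented for this example and are not part of the generator tool mentioned above.

```javascript
// Convert a pixel offset inside the sprite into a background-position value.
function offset(pixels) {
  return pixels === 0 ? '0' : '-' + pixels + 'px';
}

// Sketch: build the CSS rule that shows one icon out of a sprite image.
function spriteRule(selector, spriteUrl, icon) {
  return selector + ' {\n' +
    '  background-image: url(' + spriteUrl + ');\n' +
    '  background-position: ' + offset(icon.x) + ' ' + offset(icon.y) + ';\n' +
    '  width: ' + icon.width + 'px;\n' +
    '  height: ' + icon.height + 'px;\n' +
    '}';
}

console.log(spriteRule('#help', 'sprite.png', { x: 8, y: 8, width: 16, height: 16 }));
console.log(spriteRule('#rss', 'sprite.png', { x: 8, y: 40, width: 16, height: 16 }));
```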

Use Post-load Pre-loading and Inline Assets

If you're a responsible web developer, you're probably already adhering to the separation of concerns and using HTML for your content, CSS for presentation and JavaScript for behavior. While these distinct parts of a page should be kept in separate files at all times, for performance reasons you might sometimes consider breaking the rule on your index (home) page. The homepage should always be the fastest page on your site -- many first-time visitors may leave your site, no matter what content it contains, if they find the homepage slow to load.

When a visitor arrives at your homepage with an empty cache, the fastest way to deliver the page is to have only one request and no separate components. This means having scripts and styles inline (gasp)! It's actually possible to have inline images as well (although it's not supported in IE) but that's probably taking things too far. Apart from being semantically incorrect, using inline scripts and styles prevents those components from being cached, so a good strategy will be to load components in the background after the home page has loaded -- a technique with the slightly confusing name of post-load preloading. Let's see an example.

Let's suppose that the file containing your homepage is named home.html, that numerous other HTML files containing content are scattered throughout your site, and that all of these content pages use a JavaScript file, mystuff.js, of which only a small part is needed by the homepage.

Your strategy might be to take the part of the JavaScript that's used by the homepage out of mystuff.js and place it inline in home.html. Then, once home.html has completed loading, make a behind-the-scenes request to pre-load mystuff.js. This way, when the user hits one of your content pages, the JavaScript has already been delivered to the browser and cached.

Once again, this technique is used by some of the big boys: both Google and Yahoo! have inline scripts and styles on their homepages, and they also make use of post-load preloading. If you visit Google's homepage, it loads some HTML and one single image -- the logo. Then, once the home page has finished loading, there is a request to get the sprite image, which is not actually needed until the second page loads -- the one displaying the search results.

The Yahoo search page performs conditional pre-loading -- this page doesn't automatically load additional assets, but waits for the user to start typing in the search box. Once you've begun typing, it's almost guaranteed that you'll submit a search query. And when you do, you'll land on a search results page that contains some components that have already been cached for you.

Preloading an image can be done with a simple line of JavaScript:

new Image().src='image.png';

For preloading JavaScript files, use the JavaScript include_DOM technique and create a new <script> tag, like so:

var js = document.createElement('script');

js.src = 'mystuff.js';

document.getElementsByTagName('head')[0].appendChild(js);

Here's the CSS version:

var css  = document.createElement('link');

css.href = 'mystyle.css';

css.rel  = 'stylesheet';

document.getElementsByTagName('head')[0].appendChild(css);

In the first example, the image is requested but never used, so it doesn't affect the current page. In the second example, the script is added to the page, so as well as being downloaded, it will be parsed and executed. The same goes for the CSS -- it, too, will be applied to the page. If this is undesirable, you can still pre-load the assets using XMLHttpRequest.
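
As a rough sketch of the XMLHttpRequest route: the helper below requests a file so that it can land in the browser cache, without the response ever being executed or applied to the page. The name preloadWithXhr and its optional second parameter (there only so the sketch can be exercised outside a browser) are assumptions of this example, not part of any library. Whether the response is actually cached depends on the caching headers your server sends.

```javascript
// Hypothetical helper: fetch a URL in the background so it can be cached,
// without executing or applying the response.
function preloadWithXhr(url, XhrCtor) {
  // Default to the browser's XMLHttpRequest; the parameter exists only
  // so that a stand-in can be supplied when testing outside a browser.
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open('GET', url, true); // true = asynchronous
  xhr.send(null);             // the response body is deliberately ignored
  return xhr;
}

// In a browser, start preloading only once the page has finished loading.
if (typeof window !== 'undefined') {
  window.onload = function () {
    preloadWithXhr('mystuff.js');
    preloadWithXhr('mystyle.css');
  };
}
```

Keep in mind that XMLHttpRequest is limited to your own domain, and that older versions of IE need the ActiveX equivalent instead of the native object.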

JavaScript Optimizations

Before diving into the JavaScript code and micro-optimizing every function and every loop, let's first look at what big-picture items we can tackle easily that might have a significant impact on a site's performance. Here are some guidelines for improving the impact that JavaScript files have on your site's performance:

Merge .js files.

Minify or obfuscate scripts.

Place scripts at the bottom of the page.

Remove duplicates.

Merge .js Files

As per the basic rules, you should aim to load your JavaScript with as few requests as possible; ideally, this means having only one .js file. Merging is as simple as taking all your .js files and placing their contents into a single file.

While a single-file approach is recommended in most cases, sometimes you may derive some benefit from having two scripts -- one for the functionality that's needed as soon as the page loads, and another for the functionality that can wait for the page to load first. Another situation in which two files might be desirable is when your site makes use of a piece of functionality across multiple pages -- the shared scripts could be stored in one file (and thus cached from page to page), and the scripts specific to that one page could be stored in the second file.
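
Here's a minimal sketch of the merging step in JavaScript, assuming the source files have already been read into strings. The helper name mergeScripts and the two tiny sample scripts are made up for illustration. The one detail worth noticing is the semicolon added after each file: without it, a file that ends without a semicolon can run into the first statement of the next file.

```javascript
// Hypothetical helper: concatenate several scripts into one string.
// Each entry is { name: 'file.js', code: '...source...' }.
function mergeScripts(files) {
  var parts = [];
  for (var i = 0; i < files.length; i++) {
    // Label each chunk so the merged file can be traced back to its sources.
    parts.push('/* ' + files[i].name + ' */');
    // A trailing semicolon keeps a file that ends without one from
    // running into the first statement of the next file.
    parts.push(files[i].code + ';');
  }
  return parts.join('\n');
}

var merged = mergeScripts([
  { name: 'a.js', code: 'var a = 1' },
  { name: 'b.js', code: '(function () { a += 1; })()' }
]);
```

Without the added semicolon, the two sample files would be read as a single statement that tries to call 1 as a function.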

Minify or Obfuscate Scripts

Now that you've merged your scripts, you can go ahead and minify or obfuscate them. Minifying means removing everything that's not necessary -- such as comments and whitespace. Obfuscating goes one step further and involves renaming and rearranging functions and variables so that their names are shorter, making the script very difficult to read. Obfuscation is often used as a way of keeping JavaScript source a secret, although if your script is available on the Web, it can never be 100% secret. Read more about minification and obfuscation in Douglas Crockford's helpful article on the topic.

In general, if you gzip the JavaScript, you'll already have made a huge gain in file size, and you'll only obtain a small additional benefit by minifying and/or obfuscating the script. On average, gzipping alone can result in savings of 75-80%, while gzipping and minifying can give you savings of 80-90%. Also, when you're changing your code to minify or obfuscate, there's a risk that you may introduce bugs. If you're not overly worried about someone stealing your code, you can probably forget obfuscation and just merge and minify, or even just merge your scripts only (but always gzip them!).
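
If you'd like to measure what gzip saves on your own scripts rather than rely on averages, here's a small sketch for Node.js (an assumption on my part; the rest of this tutorial sticks to PHP). The helper name sizes and the generated sample text are made up for illustration, so expect different figures for real files.

```javascript
var zlib = require('zlib');

// Hypothetical helper: report the raw and gzipped size of some source text.
function sizes(source) {
  var raw = Buffer.byteLength(source, 'utf8');
  var gzipped = zlib.gzipSync(Buffer.from(source, 'utf8')).length;
  return { raw: raw, gzipped: gzipped, saving: 1 - gzipped / raw };
}

// Stand-in for a real script: repetitive, commented, indented code.
var sample = '';
for (var i = 0; i < 200; i++) {
  sample += '// add item ' + i + ' to the list\n' +
            '    items.push(createItem(' + i + '));\n';
}

var report = sizes(sample);
console.log(report.raw + ' bytes raw, ' + report.gzipped + ' bytes gzipped');
```

Running the same helper on the original and the minified version of a script shows how much minification adds on top of gzip for that particular file.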

An excellent tool for JavaScript minification is JSMin and it also has a PHP port, among others. One obfuscation tool is Packer -- a free online tool that, incidentally, is used by jQuery.

Changing your code in order to merge and minify should become an extra, separate step in the process of developing your site. During development, you should use as many .js files as you see fit, and then when the site is ready to go live, substitute your "normal" scripts with the merged and minified version. You could even develop a tool to do this for you. Below, I've included an example of a small utility that does just this. It's a command-line script that uses the PHP port of JSMin:

<?php

include 'jsmin.php';

array_shift($argv);

foreach ($argv AS $file) {

echo '/* ', $file, ' */';

echo JSMin::minify(file_get_contents($file)), "\n";

}

?>

Really simple, isn't it? You can save it as compress.php and run it as follows:

$ php compress.php source1.js source2.js source3.js > result.js

This will combine and minify the files source1.js, source2.js, and source3.js into one file, called result.js.

The script above is useful when you merge and minify as a step in the site deployment process. Another, lazier option is to do the same on the fly -- check out Ed Eliot's blog post, and this blog post by SitePoint's Paul Annesley for some ideas.

Many third-party JavaScript libraries are provided in their uncompressed form as well as in a minified version. You can therefore download and use the minified versions provided by the library's creator, and then only worry about your own scripts. Something to keep in mind is the licensing of any third-party library that you use. Even though you might have combined and minified all of your scripts, you should still retain the copyright notices of each library alongside the code.

Place Scripts at the Bottom of the Page

The third rule of thumb to follow regarding JavaScript optimization is that the script should be placed at the bottom of the page, as close to the ending </body> tag as possible. The reason? Well, due to the nature of scripts (they could potentially change anything on a page), browsers block all other downloads when they encounter a <script> tag. So until a script is downloaded and parsed, no other downloads will be initiated.

Placing the script at the bottom is a way to avoid this negative blocking effect. Another reason to have as few <script> tags as possible is that the browser initiates its JavaScript parsing engine for every script it encounters. This can be expensive, and therefore parsing should ideally only occur once per page.

Remove Duplicates

Another guideline regarding JavaScript is to avoid including the same script twice. It may sound like strange advice (why would you ever do this?) but it happens: if, for example, a large site used multiple server-side includes that included JavaScript files, it's conceivable that two of these might double up. The duplicate script would cause the browser's parsing engine to be started twice and possibly (in some IE versions) even request the file for the second time. Duplicate scripts might also be an issue when you're using third party libraries. Let's suppose you had a carousel widget and a photo gallery widget that you downloaded from different sites, and they both used jQuery. In this case you'd want to make sure that you didn't include jQuery twice by mistake. Also, if you use YUI, make sure you don't include a library twice by including, for example, the DOM utility (dom-min.js), the Event utility (event-min.js) and the utilities.js library, which contains both DOM and Event.
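
One way to guard against accidental duplicates is to route every script include through a small registry. The sketch below is only an illustration: includeOnce, the requestedScripts registry and the fakeLoad stand-in are hypothetical names, and in a real page the load function would append a <script> element.

```javascript
// Hypothetical registry that remembers which scripts were already requested.
var requestedScripts = {};

// Returns true if the script was loaded now, false if it was a duplicate.
// "load" is whatever actually fetches the file; in a browser it could
// append a <script> element to the page.
function includeOnce(src, load) {
  if (Object.prototype.hasOwnProperty.call(requestedScripts, src)) {
    return false; // already included: skip the second request and parse
  }
  requestedScripts[src] = true;
  load(src);
  return true;
}

// Stand-in loader that just records what was asked for.
var loaded = [];
function fakeLoad(src) { loaded.push(src); }

includeOnce('jquery.js', fakeLoad); // requested by the carousel widget
includeOnce('jquery.js', fakeLoad); // requested by the gallery widget, skipped
```

Note that the registry compares file names exactly, so the same library served under two different URLs would still slip through.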

CSS Optimizations

Merge and Minify

For your CSS files you can follow the guidelines we discussed for JavaScripts: minify and merge all style sheets into a single file to minimize download size and the number of HTTP requests taking place. Merging all files into one is a trivial task, but the job of minification may be a bit harder, especially if you're using CSS hacks to target specific browsers -- since some hacks exploit parsing bugs in the browsers, they might also trick your minifier utility.

You may decide not to go through the hassle of minifying style sheets (and the associated re-testing after minification). After all, if you decide to serve the merged and gzipped style sheet, that's already a pretty good optimization.

If you do decide to minify CSS, apart from the option of minifying manually (simply removing comments and whitespace), you can use some of the available tools, such as CSSTidy, PEAR's HTML_CSS library (http://pear.php.net/package/HTML_CSS/), or SitePoint's own Dust-me Selectors Firefox plugin.

Place Styles at the Top of the Page

Your single, gzipped (and optionally minified) style sheet is best placed at the beginning of the HTML file, in the <head> section -- which is where you'd usually put it anyway. The reason is that most browsers (Opera is an exception) won't render anything on the page until all the style sheets are downloaded and parsed. Additionally, none of the images referenced from the CSS will be downloaded until the CSS parsing is complete. So it's better to include the CSS as early on the page as possible.

You might think about distributing images across different domains, though. Images linked from the CSS won't be downloaded until later, so in the meantime, your page can use the available download window to request content images from the domain that hosts the CSS images and is temporarily "idle".

Ban Expressions

IE allows JavaScript expressions in CSS, like this one:

#content {

left: expression(document.body.offsetWidth)

}

You should avoid JavaScript expressions for a number of reasons. First of all, they're not supported by all browsers. They also harm the "separation of concerns". And, when it comes to performance, expressions are bad because they're recalculated every time the page is rendered or resized, or simply when you roll your mouse over the page. There are ways to make expressions less expensive -- you can cache values after they're initially calculated -- but you're probably better off simply avoiding them.
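
If you do need a value that CSS alone can't provide, one alternative is to set it from a script, and only at the moments when it can actually change. The sketch below mirrors the expression shown earlier; the function name positionContent is made up, and passing the document in as a parameter is just a convenience so the function can be tried outside a browser.

```javascript
// Hypothetical replacement for a CSS expression: set the value from
// script, and only when it can actually change (on load and on resize).
function positionContent(doc) {
  var content = doc.getElementById('content');
  content.style.left = doc.body.offsetWidth + 'px';
}

if (typeof window !== 'undefined') {
  window.onload = function () { positionContent(document); };
  window.onresize = function () { positionContent(document); };
}
```

This way the calculation runs once per load or resize, instead of on every mouse movement.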

Tools for Performance Optimization

A number of tools can help you in your performance optimization quest. Most importantly, you'd want to monitor what's happening when the page is loaded, so that you can make informed decisions. Try these utilities:

Firebug's Net panel for Firefox, at http://www.getfirebug.com

YSlow, Yahoo!'s performance extension to Firebug, at http://developer.yahoo.com/yslow/

LiveHTTP Headers for Firefox, at http://livehttpheaders.mozdev.org/

Fiddler -- for IE, but also a general-purpose packet sniffer, at http://www.fiddlertool.com/fiddler/

HTTPWatch for IE (commercial, free version), at http://www.httpwatch.com/

Web Inspector for Safari, at http://webkit.org/blog/?p=41

Summary

Whew! If you've made it this far, you now know quite a lot about how to approach a site optimization project (and more importantly, how to build your next web site with performance in mind). Remember the general rule of thumb that, when it comes to optimization, you should concentrate on the items with the biggest impact, as opposed to "micro-optimizing".

You may choose not to implement all the recommendations discussed above, but you can still make quite a difference by focusing on the really low-hanging fruit, such as:

making fewer HTTP requests by combining components -- JavaScript files, style sheets and images (by using CSS Sprites)

serving all textual content, including HTML, scripts, styles, XML, JSON, and plain text, in a gzipped format

minifying and placing scripts at the bottom, and style sheets at the top of your files

using separate cookie-free domains for your components

Good luck with your optimization efforts -- it's very rewarding when you see the results!

Configure Web Logs in Apache

One of the many pieces of the Website puzzle is Web logs. Traffic analysis is central to most Websites, and the key to getting the most out of your traffic analysis revolves around how you configure your Web logs.

Apache is one of the most -- if not the most -- powerful open source solutions for Website operations. You will find that Apache's Web logging features are flexible for the single Website or for managing numerous domains requiring Web log analysis.

Author's Note: While most of this piece discusses configuration options for any operating system Apache supports, some of the content will be Unix/Linux (*nix) specific, which now includes Macintosh OS X and its underlying Unix kernel.

For the single site, Apache is pretty much configured for logging in the default install. The initial httpd.conf file (found in /etc/httpd/conf/httpd.conf in most cases) should have a section on logs that looks similar to this (Apache 2.0.x), with descriptive comments for each item. Your default logs folder will be found in /etc/httpd/logs. This location can be changed when dealing with multiple Websites, as we'll see later. For now, let's review this section of log configuration.

ErrorLog logs/error_log

LogLevel warn

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

LogFormat "%h %l %u %t \"%r\" %>s %b" common

LogFormat "%{Referer}i -> %U" referer

LogFormat "%{User-agent}i" agent

CustomLog logs/access_log combined

Error Logs

The error log contains messages sent from Apache for errors encountered during the course of operation. This log is very useful for troubleshooting Apache issues on the server side.

Apache Log Tip: If you are monitoring errors or testing your server, you can use the command line to interactively watch log entries. Open a shell session and type "tail -f /path/to/error_log". This will show you the last few entries in the file and also continue to show new entries as they occur.

There are no real customization options available, other than telling Apache where to establish the file, and what level of error logging you seek to capture. First, let's look at the error log configuration code from httpd.conf.

ErrorLog logs/error_log

You may wish to store all error-related information in one error log. If so, the above is fine, even for multiple domains. However, you can specify an error log file for each individual domain you have. This is done in the <VirtualHost> container with an entry like this:

<VirtualHost 10.0.0.2>

DocumentRoot "/home/sites/domain1/html/"

ServerName domain1.com

ErrorLog /home/sites/domain1/logs/error.log

</VirtualHost>

If you are responsible for reviewing error log files as a server administrator, it is recommended that you maintain a single error log. If you're hosting for clients, and they are responsible for monitoring the error logs, it's more convenient to specify individual error logs they can access at their own convenience.

The setting that controls the level of error logging to capture follows below.

LogLevel warn

Apache's definitions for their error log levels are as follows:

[Table: Apache error log levels (LogLevel values and their meanings)]

Tracking Website Activity

Often by default, Apache will generate three activity logs: access, agent and referer. These track the accesses to your Website, the browsers being used to access the site and the referring URLs that your site visitors have arrived from.

It is commonplace now to utilize Apache's "combined" log format, which compiles all three of these logs into one logfile. This is very convenient when using traffic analysis software as a majority of these third-party programs are easiest to configure and schedule when only dealing with one log file per domain.

Let's break down the code in the combined log format and see what it all means.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

LogFormat starts the line and simply tells Apache you are defining a log file type (or nickname), in this case, combined. Now let's look at the cryptic symbols that make up this log file definition.

[Table: the codes used in the combined log format and their meanings]

To review all of the available configuration codes for generating a custom log, see Apache's docs on mod_log_config, the module that powers log files in Apache.
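
To make the combined format more concrete, here's a sketch that pulls the individual fields out of one log line with a regular expression. The function name parseCombined, the property names and the sample log line are all invented for this example; traffic analysis packages do this job far more robustly.

```javascript
// One capture group per field of the combined format, in order:
// %h %l %u [%t] "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
var COMBINED = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "((?:[^"\\]|\\.)*)" (\d{3}) (\S+) "((?:[^"\\]|\\.)*)" "((?:[^"\\]|\\.)*)"$/;

// Hypothetical helper: returns an object of fields, or null if the
// line does not look like a combined-format entry.
function parseCombined(line) {
  var m = COMBINED.exec(line);
  if (!m) { return null; }
  return {
    host: m[1],                             // %h
    logname: m[2],                          // %l
    user: m[3],                             // %u
    time: m[4],                             // %t, without the brackets
    request: m[5],                          // %r
    status: Number(m[6]),                   // %>s
    bytes: m[7] === '-' ? 0 : Number(m[7]), // %b ("-" when no bytes were sent)
    referer: m[8],                          // %{Referer}i
    userAgent: m[9]                         // %{User-Agent}i
  };
}

// Invented sample entry, for illustration only.
var entry = parseCombined(
  '10.0.0.9 - frank [24/Aug/2008:13:55:36 -0700] ' +
  '"GET /index.html HTTP/1.0" 200 2326 ' +
  '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
);
```

Each capture group lines up with one code in the LogFormat string, which is why a change to the format string means a change to the pattern as well.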

Apache Log Tip: You could capture more from the HTTP header if you so desired. A full listing and definition of data in the header is found at the World Wide Web Consortium.

For a single Website, the default entry would suffice:

CustomLog logs/access_log combined

However, for logging multiple sites, you have a few options. The most common is to identify individual log files for each domain. This is seen in the example below, again using the log directive within the <VirtualHost> container for each domain.

<VirtualHost 10.0.0.2>

DocumentRoot "/home/sites/domain1/html/"

ServerName domain1.com

ErrorLog /home/sites/domain1/logs/error.log

CustomLog /home/sites/domain1/logs/web.log combined

</VirtualHost>

<VirtualHost 10.0.0.3>

DocumentRoot "/home/sites/domain2/html/"

ServerName domain2.com

ErrorLog /home/sites/domain2/logs/error.log

CustomLog /home/sites/domain2/logs/web.log combined

</VirtualHost>

<VirtualHost 10.0.0.4>

DocumentRoot "/home/sites/domain3/html/"

ServerName domain3.com

ErrorLog /home/sites/domain3/logs/error.log

CustomLog /home/sites/domain3/logs/web.log combined

</VirtualHost>

In the above example, we have three domains with three unique Web logs (using the combined format we defined earlier). A traffic analysis package could then be scheduled to process these logs and generate reports for each domain independently.

This method works well for most hosts. However, there may be situations where this could become unmanageable. Apache recommends a special single log file for large virtual host environments and provides a tool for generating individual logs per individual domain.

We will call this log type the cvh format, standing for "common virtual host." Simply by adding a %v (which stands for virtual host) to the beginning of the combined log format defined earlier and giving it a new nickname of cvh, we can compile all domains into one log file, then automatically split them into individual log files for processing by a traffic analysis package.

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" cvh

In this case, we do not make any CustomLog entries in the <VirtualHost> containers and simply have one log file generated by Apache. A program created by Apache called split_logfile is included in the src/support directory of your Apache sources. If you did not compile from source or do not have the sources, you can get the Perl script.

The individual log files created from your master log file will be named for each domain (virtual host) and look like: virtualhost.log.
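
The idea behind splitting a cvh log can be sketched in a few lines of JavaScript. To be clear, this is not the split_logfile script, just an illustration of the principle: the function name splitByVirtualHost, the lowercasing of host names and the sample lines in the usage below are assumptions of this example.

```javascript
// Hypothetical helper: group log lines by their first field (%v).
function splitByVirtualHost(lines) {
  var byHost = {};
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    var space = line.indexOf(' ');
    if (space <= 0) { continue; } // skip blank or malformed lines
    var host = line.slice(0, space).toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(byHost, host)) {
      byHost[host] = [];
    }
    // Store the entry without the leading virtual host field, so what
    // remains is an ordinary combined-format line.
    byHost[host].push(line.slice(space + 1));
  }
  return byHost;
}

// Invented sample lines, shortened for readability.
var groups = splitByVirtualHost([
  'domain1.com 10.0.0.9 - - [24/Aug/2008:13:55:36 -0700] "GET / HTTP/1.0" 200 512 "-" "-"',
  'domain2.com 10.0.0.7 - - [24/Aug/2008:13:55:40 -0700] "GET /a HTTP/1.0" 404 0 "-" "-"',
  'domain1.com 10.0.0.9 - - [24/Aug/2008:13:56:02 -0700] "GET /b HTTP/1.0" 200 128 "-" "-"'
]);
```

Each key of the result corresponds to one per-domain log file, ready to be written out and handed to a traffic analysis package.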

Log Rotation

Finally, we want to address log rotation. High traffic sites will generate very large log files, which will quickly swallow up valuable disk space on your server. You can use log rotation to manage this process.

There are many ways to handle log rotation, and various third-party tools are available as well. However, we're focusing on configurations native to Apache, so we will look at a simple log rotation scheme here. I'll include links to more flexible and sophisticated log rotation options in a moment.

This example uses a rudimentary shell script to shift the existing archives along by one (keeping them for as long as 12 months), move the current Web log aside, restart Apache gracefully with a pause to allow the log files to be switched out, and then compress the old file.

mv web11.tgz web12.tgz

mv web10.tgz web11.tgz

mv web9.tgz  web10.tgz

mv web8.tgz  web9.tgz

mv web7.tgz  web8.tgz

mv web6.tgz  web7.tgz

mv web5.tgz  web6.tgz

mv web4.tgz  web5.tgz

mv web3.tgz  web4.tgz

mv web2.tgz  web3.tgz

mv web1.tgz  web2.tgz

mv web.tgz   web1.tgz

mv web.log   web.old

/usr/sbin/apachectl graceful

sleep 300

tar cvfz web.tgz web.old

This code can be copied into a file called logrotate.sh, and placed inside the folder where your web.log file is stored (or whatever you name your log file, e.g. access_log, etc.). Just be sure to modify for your log file names and also chmod (change permissions on the file) to 755 so it becomes an executable.
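
The order of the mv commands in that script matters: the oldest archive has to move first, otherwise each rename would overwrite the file above it. The sketch below generates the same sequence of renames for any number of archives. The function name rotationPlan and its parameters are made up for illustration; it only builds the list and doesn't touch any files.

```javascript
// Hypothetical helper: list the renames needed to shift web.tgz ... webN.tgz
// up by one, oldest first so that no archive is overwritten before it moves.
function rotationPlan(base, keep) {
  var plan = [];
  for (var n = keep - 1; n >= 1; n--) {
    plan.push([base + n + '.tgz', base + (n + 1) + '.tgz']);
  }
  // Finally, the most recent archive becomes number 1.
  plan.push([base + '.tgz', base + '1.tgz']);
  return plan;
}

var plan = rotationPlan('web', 12);
```

With a base of 'web' and 12 archives, the plan starts with web11.tgz to web12.tgz and ends with web.tgz to web1.tgz, matching the shell script.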

This works fine for a single busy site. If you have more complex requirements for log rotation, be sure to see some of the following sites. In addition, many Linux distributions now come with log rotation included. For example, Red Hat 9 comes with logrotate, a highly configurable log rotation utility that is normally run from cron and reads per-service settings from the /etc/logrotate.d directory. To find out more, on a Linux system with logrotate installed, type man logrotate.

Apache Module mod_log_config

Summary

This module provides for flexible logging of client requests. Logs are written in a customizable format, and may be written directly to a file, or to an external program. Conditional logging is provided so that individual requests may be included or excluded from the logs based on characteristics of the request.

Three directives are provided by this module: TransferLog to create a log file, LogFormat to set a custom format, and CustomLog to define a log file and format in one step. The TransferLog and CustomLog directives can be used multiple times in each server to cause each request to be logged to multiple files.

Directives

BufferedLogs

CookieLog

CustomLog

LogFormat

TransferLog

Topics

Custom Log Formats

Security Considerations

Custom Log Formats

The format argument to the LogFormat and CustomLog directives is a string. This string is used to log each request to the log file. It can contain literal characters copied into the log files and the C-style control characters "\n" and "\t" to represent new-lines and tabs. Literal quotes and back-slashes should be escaped with back-slashes.

The characteristics of the request itself are logged by placing "%" directives in the format string, which are replaced in the log file by the values as follows:

Format String Description

%% The percent sign (Apache 2.0.44 and later)

%...a Remote IP-address

%...A Local IP-address

%...B Size of response in bytes, excluding HTTP headers.

%...b Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a '-' rather than a 0 when no bytes are sent.

%...{Foobar}C The contents of cookie Foobar in the request sent to the server.

%...D The time taken to serve the request, in microseconds.

%...{FOOBAR}e The contents of the environment variable FOOBAR

%...f Filename

%...h Remote host

%...H The request protocol

%...{Foobar}i The contents of Foobar: header line(s) in the request sent to the server.

%...l Remote logname (from identd, if supplied). This will return a dash unless IdentityCheck is set On.

%...m The request method

%...{Foobar}n The contents of note Foobar from another module.

%...{Foobar}o The contents of Foobar: header line(s) in the reply.

%...p The canonical port of the server serving the request

%...P The process ID of the child that serviced the request.

%...{format}P The process ID or thread id of the child that serviced the request. Valid formats are pid and tid. (Apache 2.0.46 and later)

%...q The query string (prepended with a ? if a query string exists, otherwise an empty string)

%...r First line of request

%...s Status. For requests that got internally redirected, this is the status of the *original* request --- %...>s for the last.

%...t Time the request was received (standard english format)

%...{format}t The time, in the form given by format, which should be in strftime(3) format. (potentially localized)

%...T The time taken to serve the request, in seconds.

%...u Remote user (from auth; may be bogus if return status (%s) is 401)

%...U The URL path requested, not including any query string.

%...v The canonical ServerName of the server serving the request.

%...V The server name according to the UseCanonicalName setting.

%...X

Connection status when response is completed:

`X` =	connection aborted before the response completed.
`+` =	connection may be kept alive after the response is sent.
`-` =	connection will be closed after the response is sent.

(This directive was %...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...{var}c syntax.)

%...I Bytes received, including request and headers, cannot be zero. You need to enable mod_logio to use this.

%...O Bytes sent, including headers, cannot be zero. You need to enable mod_logio to use this.

The "..." can be nothing at all (e.g., "%h %u %r %s %b"), or it can indicate conditions for inclusion of the item (which will cause it to be replaced with "-" if the condition is not met). The forms of condition are a list of HTTP status codes, which may or may not be preceded by "!". Thus, "%400,501{User-agent}i" logs User-agent: on 400 errors and 501 errors (Bad Request, Not Implemented) only; "%!200,304,302{Referer}i" logs Referer: on all requests which did not return some sort of normal status.
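
The condition rule can be expressed as a short function. This is only a sketch of the rule as described above, not Apache's implementation; the name conditionMet and the idea of passing the status list as a plain string are assumptions of this example.

```javascript
// Hypothetical helper: decide whether an item with the given condition
// (for example "400,501" or "!200,304,302") is logged for a status code.
function conditionMet(spec, status) {
  if (spec === '') { return true; } // no condition: the item is always logged
  var negate = spec.charAt(0) === '!';
  var codes = (negate ? spec.slice(1) : spec).split(',');
  var listed = codes.indexOf(String(status)) !== -1;
  return negate ? !listed : listed;
}
```

When the function returns false, the item is not dropped from the line; it is written as "-" so that the number of fields stays the same.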

The modifiers "<" and ">" can be used for requests that have been internally redirected to choose whether the original or final (respectively) request should be consulted. By default, the % directives %s, %U, %T, %D, and %r look at the original request while all others look at the final request. So for example, %>s can be used to record the final status of the request and %<u can be used to record the original authenticated user on a request that is internally redirected to an unauthenticated resource.

Note that in httpd 2.0 versions prior to 2.0.46, no escaping was performed on the strings from %...r, %...i and %...o. This was mainly to comply with the requirements of the Common Log Format. This implied that clients could insert control characters into the log, so you had to be quite careful when dealing with raw log files.

For security reasons, starting with 2.0.46, non-printable and other special characters are escaped mostly by using \xhh sequences, where hh stands for the hexadecimal representation of the raw byte. Exceptions from this rule are " and \ which are escaped by prepending a backslash, and all whitespace characters which are written in their C-style notation (\n, \t etc).

Note that in httpd 2.0, unlike 1.3, the %b and %B format strings do not represent the number of bytes sent to the client, but simply the size in bytes of the HTTP response (which will differ, for instance, if the connection is aborted, or if SSL is used). The %O format provided by mod_logio will log the actual number of bytes sent over the network.

Some commonly used log format strings are:

Common Log Format (CLF): "%h %l %u %t \"%r\" %>s %b"
Common Log Format with Virtual Host: "%v %h %l %u %t \"%r\" %>s %b"
NCSA extended/combined log format: "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
Referer log format: "%{Referer}i -> %U"
Agent (Browser) log format: "%{User-agent}i"

Note that the canonical ServerName and Listen of the server serving the request are used for %v and %p respectively. This happens regardless of the UseCanonicalName setting because otherwise log analysis programs would have to duplicate the entire vhost matching algorithm in order to decide what host really served the request.

Security Considerations

See the security tips document for details on why your security could be compromised if the directory where logfiles are stored is writable by anyone other than the user that starts the server.

BufferedLogs Directive

Description:	Buffer log entries in memory before writing to disk
Syntax:	`BufferedLogs On|Off`
Default:	`BufferedLogs Off`
Context:	server config
Status:	Base
Module:	mod_log_config
Compatibility:	Available in versions 2.0.41 and later.

The BufferedLogs directive causes mod_log_config to store several log entries in memory and write them together to disk, rather than writing them after each request. On some systems, this may result in more efficient disk access and hence higher performance. It may be set only once for the entire server; it cannot be configured per virtual-host.

This directive is experimental and should be used with caution.

CookieLog Directive

Description:	Sets filename for the logging of cookies
Syntax:	`CookieLog filename`
Context:	server config, virtual host
Status:	Base
Module:	mod_log_config
Compatibility:	This directive is deprecated.

The CookieLog directive sets the filename for logging of cookies. The filename is relative to the ServerRoot. This directive is included only for compatibility with mod_cookies, and is deprecated.

CustomLog Directive

Description:	Sets filename and format of log file
Syntax:	`CustomLog file|pipe format|nickname [env=[!]environment-variable]`
Context:	server config, virtual host
Status:	Base
Module:	mod_log_config

The CustomLog directive is used to log requests to the server. A log format is specified, and the logging can optionally be made conditional on request characteristics using environment variables.

The first argument, which specifies the location to which the logs will be written, can take one of the following two types of values:

file: A filename, relative to the ServerRoot.
pipe: The pipe character "|", followed by the path to a program to receive the log information on its standard input.
Security:

If a program is used, then it will be run as the user who started httpd. This will be root if the server was started by root; be sure that the program is secure.

Note

When entering a file path on non-Unix platforms, care should be taken to make sure that only forward slashes are used even though the platform may allow the use of back slashes. In general it is a good idea to always use forward slashes throughout the configuration files.

The second argument specifies what will be written to the log file. It can specify either a nickname defined by a previous LogFormat directive, or it can be an explicit format string as described in the log formats section.

For example, the following two sets of directives have exactly the same effect:

# CustomLog with format nickname

LogFormat "%h %l %u %t \"%r\" %>s %b" common

CustomLog logs/access_log common

# CustomLog with explicit format string

CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b"

The third argument is optional and controls whether or not to log a particular request based on the presence or absence of a particular variable in the server environment. If the specified environment variable is set for the request (or is not set, in the case of a 'env=!name' clause), then the request will be logged.

Environment variables can be set on a per-request basis using the mod_setenvif and/or mod_rewrite modules. For example, if you want to record requests for all GIF images on your server in a separate logfile but not in your main log, you can use:

SetEnvIf Request_URI \.gif$ gif-image

CustomLog gif-requests.log common env=gif-image

CustomLog nongif-requests.log common env=!gif-image

Or, to reproduce the behavior of the old RefererIgnore directive, you might use the following:

SetEnvIf Referer example\.com localreferer

CustomLog referer.log referer env=!localreferer

LogFormat Directive

Description:	Describes a format for use in a log file
Syntax:	`LogFormat format|nickname [nickname]`
Default:	`LogFormat "%h %l %u %t \"%r\" %>s %b"`
Context:	server config, virtual host
Status:	Base
Module:	mod_log_config

This directive specifies the format of the access log file.

The LogFormat directive can take one of two forms. In the first form, where only one argument is specified, this directive sets the log format which will be used by logs specified in subsequent TransferLog directives. The single argument can specify an explicit format as discussed in the custom log formats section above. Alternatively, it can use a nickname to refer to a log format defined in a previous LogFormat directive as described below.

The second form of the LogFormat directive associates an explicit format with a nickname. This nickname can then be used in subsequent LogFormat or CustomLog directives rather than repeating the entire format string. A LogFormat directive that defines a nickname does nothing else -- that is, it only defines the nickname; it does not apply the format or make it the default. Therefore, it will not affect subsequent TransferLog directives. In addition, LogFormat cannot use one nickname to define another nickname. Note that the nickname should not contain percent signs (%).

Example

LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common
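
Defining a nickname has no visible effect until a later directive refers to it. As a minimal sketch, the fragment below pairs the definition with a CustomLog directive that uses it; the file name logs/vhost_access_log is a hypothetical path chosen for illustration.

```apache
# Define the nickname. On its own this does not change what is logged.
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common

# Apply the format by referring to the nickname.
# The log path is an assumption made for this sketch.
CustomLog logs/vhost_access_log vhost_common
```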

TransferLog Directive

Description:	Specify location of a log file
Syntax:	`TransferLog file|pipe`
Context:	server config, virtual host
Status:	Base
Module:	mod_log_config

This directive has exactly the same arguments and effect as the CustomLog directive, with the exception that it does not allow the log format to be specified explicitly and does not support conditional logging of requests. Instead, the log format is determined by the most recently specified LogFormat directive which does not define a nickname. Common Log Format is used if no other format has been specified.

Example

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
TransferLog logs/access_log
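
For contrast, here is a sketch of the case the description warns about: a LogFormat that defines a nickname, followed by TransferLog. Going by the rules stated above, the nickname definition should not influence TransferLog, so Common Log Format would be used. The nickname combined is an illustrative choice for this sketch.

```apache
# Defines a nickname only; it does not become the default format.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

# No nickname-less LogFormat precedes this line, so per the rules above
# this log is written in Common Log Format, not in the "combined" format.
TransferLog logs/access_log

# To use the richer format, name it explicitly with CustomLog instead:
# CustomLog logs/access_log combined
```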
