Part 1 of 3: Reducing File Size

In This Tutorial

  1. Introduction
  2. What's YSlow Got to Do With It?
  3. Minimizing File Sizes
  4. Scaling Images
  5. Minify JS
  6. Minify CSS
  7. Favicons
  8. GZip
  9. Entity Tags (ETags)
  10. Conclusion


Anyone who has been in the web design business knows that it is a constant uphill battle. Just learning the markup, stylesheets, and backend isn't enough to make your website compete in today's market of services and applications. Years of experience can provide you with a beautiful, modern website; it can function in all browsers and even pass validation. Satisfied? Unfortunately, you're not done yet. Every aspiring web development professional also needs to be well-versed in optimizing for speed.

In a nutshell, the goal of optimization is to decrease page load times. Some steps may decrease loading times by noticeable seconds, providing for a more comfortable user experience; some steps may only decrease it by milliseconds. Nonetheless, every step is important.

"Why should I care about milliseconds," you ask? It's about more than just the time. The importance of each step will be described within its own section, but to pique your interest, I'll provide a prime example. One goal of webpage optimization is to decrease the number of HTTP requests that your webpage makes. Each request takes approximately 50 milliseconds to establish a connection to the server, excluding the time it takes to download the requested file. Fifty milliseconds alone isn't much; the real goal of decreasing requests is to reduce server stress and improve server performance. More on that in part two.

What's YSlow Got to Do With It?

YSlow is a powerful webpage analyzer made by Yahoo! Inc. Extensions exist for Chrome, Firefox, and many other browsers as well as some servers. The tool will grade you on various optimization checkpoints. This tutorial will help you achieve perfect scores in each.

Reducing File Sizes

This tutorial is split into three sections: Reducing File Size, Reducing Server Calls, and Reducing Parse Time. This article in particular covers something whose advantages we can all easily understand: reduced file sizes. Not only does it decrease download time - and thus loading speeds - for the client, it also saves bandwidth - and thus money - for the provider. This can be accomplished in five steps, in order of ease: HTML image scaling (the dos and don'ts), minifying JavaScript and CSS, making favicons small and cacheable, compressing components with gzip, and configuring entity tags (ETags).

Do Not Scale Images in HTML

Are you resizing images in-browser? Don't do it. It's that simple. Some web programmers, when desiring a thumbnail image, will include the full-sized image and just use the image's height and width attributes (via HTML or CSS) to resize it. This saves time and space, since you don't have to generate or store a thumbnail image. Great idea, right? Not at all. This practice comes with three major drawbacks that need to be considered.

  1. Bandwidth.

    Any server administrator would be appalled to learn that they're using the bandwidth cost of a full-sized image to display a thumbnail. Let's take an extreme example of scaled-image thumbnailing. A wallpaper website displays thumbnails of each wallpaper so that it can offer a selection of wallpapers in a small amount of space. The full wallpaper (1024×768; 200KB) is used and simply scaled down to 133×100. A dedicated 133×100 thumbnail file would only take up approximately 5KB. That's a difference of 195KB per wallpaper per page load. By scaling the wallpaper in-browser, you save a whopping 5KB of server space, but you pay for it in bandwidth. You should be willing to trade 5KB of space for 195KB of bandwidth - especially when multiplied per file - any day. This holds true even for less extreme examples, such as half-sizing an image. The bandwidth cost adds up on high-traffic servers.

  2. Download Time.

    The full image may not be displayed, but the user still has to wait for it to download. Even a user on a dial-up or mobile connection could download a 133×100 thumbnail in a second; but if you scale full-size images, especially multiple images per page, the difference in page loading times is dramatic. A collection of 10 wallpapers with legitimate thumbnails would take only 50KB - a near-instantaneous download for a user with average Internet speeds. The same wallpapers scaled in HTML would require a 2MB download! That's a noticeable difference in page loading.

  3. Parsing Time.

    Though parse time is the subject of the third part of this tutorial trilogy, it is also affected by in-browser scaling. The image's width and height attributes will be covered again there, shown in a different light. When you set them, the browser must recalculate and resize the larger image in order to render the thumbnail. These are precious milliseconds wasted on every image load for every viewer!
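The arithmetic behind these points is easy to sanity-check. Here's a quick sketch (the sizes are the estimates used above, not measurements) totaling the per-page cost both ways:

```javascript
// Per-page bandwidth cost of scaling full wallpapers in HTML
// versus serving dedicated thumbnails. Sizes in KB are the
// article's estimates, not measured values.
const fullSizeKB = 200; // 1024x768 wallpaper
const thumbKB = 5;      // dedicated 133x100 thumbnail
const perPage = 10;     // images shown per page

const scaledCostKB = fullSizeKB * perPage; // cost of in-browser scaling
const thumbCostKB = thumbKB * perPage;     // cost of real thumbnails
const savedKB = scaledCostKB - thumbCostKB;

console.log(`scaled: ${scaledCostKB} KB, thumbnails: ${thumbCostKB} KB, saved: ${savedKB} KB`);
// → scaled: 2000 KB, thumbnails: 50 KB, saved: 1950 KB
```

At ten images per page, in-browser scaling costs nearly 2MB of extra bandwidth on every single page load.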

But the hassle... you'd have to open each wallpaper in your favorite image editor, scale it down, upload it, link to it, etc. Isn't the time and effort saved worth the bandwidth cost? Well, the answer is no. Thanks to modern server-side programming languages, this process can be automated - no image editors, no per-image labor, no uploading, no manual link additions. However, that's another topic for another tutorial. In the meantime, keep in mind the cost of image scaling and make a habit of avoiding it.

Minify JavaScript

A quick and easy way to save bandwidth and decrease download time is to shorten JavaScript and CSS files. JavaScript, especially, is filled with redundancy and unnecessarily long variable names. Why reference myFavoriteVariable 500 times when you can just call it x and save 17 bytes with every reference? Why include 100 tab indents when the script will run just the same without them? Don't get me wrong! I'm not advocating hideous code. For the love of all that is good, use yourFavoriteVariableName and tab indentation. When programming, readability is important. When processing, readability is useless. But how can you program legibly and still send a compressed file to the client? The answer is to use a "minifier"!

Similar to thumbnails, you can automate minification. Also similar to thumbnails, the automation tool will not be included in this tutorial for the sake of brevity. However, you may use the forever-useful Closure Compiler Service by Google Inc. This service allows you to feed your legible, commented, indented code into the compiler, then copy its condensed, comment-free, bandwidth-friendly output into a minified file for use on your webpage. The service offers three compression levels: whitespace only, simple, and advanced.

Whitespace only is self-explanatory: the only compression it does is remove whitespace and comments. Simple is most likely what you will want for your pages. It doesn't rename public variables that may be called from outside the script. For example, you may have a JavaScript file that contains the function expandCollapse. This file is intended to be referenced externally so that said function can be called from within the HTML document itself (e.g. when a link is clicked). The Simple compiler is the one you would want here, since it will not rename the function, and thus won't break references to it from outside the script. The Advanced compiler, on the other hand, treats the script as if it were the entirety of the program. It compresses all variables, so outside references to the script will likely break when the variables are renamed.

In order for you to get a first-hand understanding, here are example compressions for a sample script (line breaks added by me post-compression for word-wrapping's sake):

Uncompressed Script: 301 bytes

var i_heart_you = function(my_custom_code) {
	for (var x = 0; x < 10; x++) {
		document.write("I am going to remind you that ");
		// why did I even use two document.writes?
		document.write("my custom code is " + my_custom_code);
		x += 1;
	}
};
i_heart_you("copyright Charles Stover");

Whitespace only: 212 bytes

var i_heart_you=function(my_custom_code){for(var x=0;x<10;x++){
document.write("I am going to remind you that ");document.write("my custom code is "+my_custom_code);
x+=1}};i_heart_you("copyright Charles Stover");

Note: It removed whitespace and comments. That is all.

Simple: 186 bytes

var i_heart_you=function(b){for(var a=0;10>a;a++)document.write("I am going to remind you that "),
document.write("my custom code is "+b),a+=1};i_heart_you("copyright Charles Stover");

Note: It renamed all the private variables (the ones referenced within the function itself), but left the function name the same so that it can be referenced by other programs.

Advanced: 139 bytes

for(var a=0;10>a;a++)document.write("I am going to remind you that "),
document.write("my custom code is copyright Charles Stover"),a+=1;

Note: It removed the i_heart_you function and my_custom_code variable entirely, since they weren't necessary, having only been called once. I'm surprised it didn't combine both document.writes, but I guess no minifier is perfect.

The imperfection in Advanced brings me to an important point. After minifying any code, verify that it still works. That can't be stressed enough. While I've personally had a 100% success rate with small scripts such as this, compressing thousand-line projects has been a more complicated issue. Don't blindly trust that the compiler did a good job. If Simple and Advanced compression algorithms destroy functionality in your complex project, you should at least be able to rely on ol' Whitespace-only to save you some bandwidth with no errors attached.

Minify CSS

While JavaScript compression will save a ton of bandwidth, let's not forget about CSS! I've yet to find a CSS compression utility that absolutely trumps the rest, but the most reliable I've found is CSS Minifier. The problem with CSS compression is that there is no one method better than the rest. It's really a guess-and-check principle.

It may result in a smaller file size to group CSS declarations by element:

form {
	/* all attributes of the form element */
}

Or by individual attribute:

.that-are-red {
	color: #ff0000;
}

Or by multiple shared attributes:

.red-and-large {
	color: #ff0000;
	font-size: 2em;
}

.red-and-small {
	color: #ff0000;
	font-size: 0.5em;
}

There really is no way to know which minification algorithm is most efficient until you've tried it, as each CSS file will reduce differently from each method of compression.
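Since it's guess-and-check, the most reliable approach is to write the same rules both ways and measure. A sketch using the hypothetical selectors above, minified by hand:

```javascript
// The same two rules, grouped two different ways. Which form is
// smaller depends entirely on how many properties the selectors share.
const byElement =
	'.red-and-large{color:red;font-size:2em}' +
	'.red-and-small{color:red;font-size:.5em}';
const bySharedAttribute =
	'.red-and-large,.red-and-small{color:red}' +
	'.red-and-large{font-size:2em}' +
	'.red-and-small{font-size:.5em}';

// With only one shared property, grouping actually costs bytes here;
// share more properties across more selectors and the result flips.
console.log(byElement.length, bySharedAttribute.length);
```

In this tiny case, grouping by element wins; on a stylesheet where dozens of selectors share the same colors and fonts, grouping by shared attribute usually wins instead.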

Due to the countless combinations of compression settings, you may have to try different options with your CSS compression utility until you find one that shows a noticeable difference. Unlike JavaScript compression, you aren't likely to save as much bandwidth, and it will take longer to find the appropriate setting, but - to put a spin on an old phrase - bandwidth saved is bandwidth earned.

Make Favicons Small and Cacheable

Your favicon is a very important and highly-accessed part of your website. Some browsers render it at 32×32, and many software packages produce a 32×32 favicon by default. Don't let them fool you. Well over 90% of your viewers, if not 99%, are going to be viewing your favicon at 16×16. It is pointless to have the vast majority of your viewers downloading (remember: download time and bandwidth costs) an image four times larger than they'll be able to see. Use a 16×16 favicon to decrease file size, and let the vast minority who view 32×32 favicons see a likely-unnoticeably stretched icon.

Most browsers are good at caching favicons even without server recommendations, since it's a file that's requested on every page load. YSlow insists that you manually make the favicon cacheable anyway, and it wouldn't hurt to listen. On the off-chance there is some browser out there (probably a beta or mobile browser) that doesn't cache favicons by default, it will save you bandwidth and save your clients loading time. Because the majority of browsers cache favicons by default and caching is covered in part two of this tutorial, I'll leave the details out of this section. When you learn to set a permanent cache (or if you already know how), don't forget to go back and set it for your favicon!

Compress Components With GZip

Finally, we get to server-side programming! Ah, the most complicated - but the most fun - area of web programming. gzip, if you aren't aware, is a method of file compression. Unlike the JavaScript and CSS compression tools, it doesn't just remove useless characters; it changes the file format altogether - think ZIP or RAR files. Modern web browsers support receiving gzip files and are capable of automatically extracting and displaying them. This is the case for any file type. There are two ways to gzip a file: one for static files and one for dynamic files.

To determine if your viewer can accept gzipped files, just check the $_SERVER['HTTP_ACCEPT_ENCODING'] variable. An example value for HTTP_ACCEPT_ENCODING is gzip,deflate,sdch.
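The same check works in any server environment. A language-agnostic sketch (in JavaScript for brevity; the acceptEncoding string is a stand-in for whatever your server exposes, using the example header value above):

```javascript
// Parse an Accept-Encoding header value and check for gzip support.
// Each entry may carry a ";q=..." quality suffix, which we strip.
const acceptEncoding = 'gzip,deflate,sdch'; // example header value
const supportsGzip = acceptEncoding
	.split(',')
	.map((enc) => enc.trim().split(';')[0])
	.includes('gzip');
console.log(supportsGzip); // → true
```

Splitting on commas rather than naively searching the string avoids false positives on values like `x-gzip-unrelated`.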

Dynamic Files

For files that change often (such as a home page, topic list, counter, and pretty much any non-archival HTML page), it would be impractical and pointless to create and save a gzipped copy every time the page changes - which could be every few seconds. If only there were a way to gzip a file in real time. But wait! There is! PHP can do this using the lovely ob_start function. After the headers of a page have been sent (assuming you're sending custom headers), just wrap the ob_start function with the ob_gzhandler callback around the content of the page.

For those wondering, the ob in ob_start stands for output buffer. It records all the output and lets you manipulate it however you decide before sending it to the client. In this case, we're going to gzip the output before actually sending it.

// get the headers out of the way
header('Content-Language: en-us');
header('Content-Type: text/html; charset=utf-8');

// gzip the following content
ob_start('ob_gzhandler');

// the content itself; everything in the source code goes here
include 'path/to/header.html';
echo '<p>This is all the content on my homepage!</p>';
include 'path/to/footer.html';

// close the buffer and send the data to the client
ob_end_flush();
Simple, right? Just put a line of code above the content (to tell it to start recording what to compress) and a line of code after the content (to tell it to compress and send the data).

"But, Charles!" you so rudely interrupt, "What about the browsers that don't support gzip? Won't they get errors when receiving the gzipped content?"

No, no, rude reader. PHP's gzip-handling output buffer is kind enough to check HTTP_ACCEPT_ENCODING for us! If the browser supports gzip, it will send the gzipped content. If the browser doesn't support it, it will send unaltered content. Easy, huh? Just use the output buffer for all your dynamic, PHP-generated files, and let PHP do the rest!

Bandwidth saved.

Static Files

To save server resources, if you know a file is not going to change very often, you may want to store a gzipped copy on the server instead of having PHP generate one every time the file is accessed. You can either use the gzip utility (available from the gzip home page) to compress every file manually, or you can use PHP's gzencode function to create any non-existent gzip files on demand. To do this, have users access a PHP file that determines whether or not their browser supports gzip. If it does, send them the gzipped file; if it does not, send them the uncompressed file. I've included an example of this, but feel free to create your own.

// e.g. download.php?file=file.jpg
// This would be a great time to use mod_rewrite. ;)

// Always sanitize your inputs!
// (This whitelist rejects directory separators, preventing path traversal.)
if (!preg_match('/^[\d\w]+\.[\d\w]{2,4}$/', $_GET['file']))
	die('Invalid file name.');

// This is where we will store the gzip copy.
$gzip_file = $_GET['file'] . '.gz';

// Worry about the compressed file only if gzip is part of the browser's accepted encoding.
if (preg_match('/(^|,)\s*gzip\s*(;|,|$)/', $_SERVER['HTTP_ACCEPT_ENCODING'])) {

	// Check to see if a compressed copy exists.
	if (!file_exists($gzip_file))

		// Create the compressed file if it doesn't exist.
		file_put_contents($gzip_file, gzencode(file_get_contents($_GET['file'])));

	// Tell the browser the content is encoded.
	header('Content-Encoding: gzip');

	// Send the encoded content!
	echo file_get_contents($gzip_file);
}

// gzip is not supported (not in the Accept-Encoding header), so just return the uncompressed file.
else
	echo file_get_contents($_GET['file']);

You will want to cache the gzipped file instead of gzencode'ing the contents every time the file is downloaded. Gzipping takes CPU time - albeit less time than the user saves by not downloading as large a file - so even gzipping on every page load is a step up from not gzipping at all. By caching the gzipped copy, the server doesn't have to recalculate it on every page load, saving precious time and CPU usage.

Configure Entity Tags (ETags)

Last but not least, you'll want to set up entity tag headers. There's a very simple analogy that can be used to explain how ETags work.

Two Days Ago

Client: I'd like to read reducing-file-size, please.
Server: Sure thing. Here it is. This is version 12345678.

Today

Client: I was just in here a few days ago, and I picked up a copy of reducing-file-size. The one you gave me was version 12345678. Is there a newer copy?

Scenario 1

Server: Nope, that's the latest.

Scenario 2

Server: Why, yes there is! Here it is. This is version 12345690.

ETags are like a method of caching. Their only downfall compared to caching is that they still require the client to connect to the server in the first place; however, they greatly reduce bandwidth. It's a great idea to combine ETags with cache, but you don't need to worry about cache until we come to that part of the tutorial. For now, you'll get a lot of mileage out of simply using ETags.

The easiest method to calculate an ETag is to just return the file's modification time for static content. When you update the file, the mod time automatically changes; thus the next time the client requests the file, the server will know that the client's copy is outdated. However, if you're using a dynamic PHP file, filemtime won't necessarily change just because the content changes. Take for example a simple file that contains <?php echo time(); ?>. Every time you view the page, the content will be different. However, the filemtime will always be the same, since the file itself hasn't been modified; only its output has changed. In these cases, there are different methods you can use to determine an ETag, and it's ultimately up to you to decide the best method. If you have content on the page and are capable of determining the last time the content was updated (e.g. the time of an article posting), you can use that to generate the ETag. Otherwise, you may have to resort to something such as generating an MD5 hash of the output and using it as the ETag. If the output changes, the MD5 hash changes, thus the ETag changes.

Once you have determined how to calculate your ETag - simply any string that will change whenever the page changes - and have generated the string, you can send it using the ETag header.

header('ETag: "' . filemtime($_GET['file']) . '"');

Simple enough, right? Unfortunately, there's more. Whenever the ETag has not changed since the user last downloaded the file, the server should not send any data, because the user already has the file cached. This behavior is not automatic; it is up to you to prevent the data from being sent. You can do this like so:

// e.g. download.php?file=file.jpg
// This would be a great time to use mod_rewrite. ;)

// Always sanitize your inputs!
if (!preg_match('/^[\d\w]+\.[\d\w]{2,4}$/', $_GET['file']))
	die('Invalid file name.');

// If the file doesn't exist, give it a "404" ETag.
// That way, if it exists in the future, the ETag will change.
// Until then, there's no need to keep resending the 404 error file.
$etag = file_exists($_GET['file']) ? filemtime($_GET['file']) : 404;

// The browser echoes the ETag back in the If-None-Match request header.
// If the client already has a copy and it is the same as the server copy...
if (
	array_key_exists('HTTP_IF_NONE_MATCH', $_SERVER) &&
	trim($_SERVER['HTTP_IF_NONE_MATCH'], '"') == $etag
) {

	// Tell the client that there is no newer version, and don't send any data.
	header('HTTP/1.1 304 Not Modified');
	header('Connection: close');
	exit;
}

// Either the client has no previous copy, or it is not the latest copy.

// Send the entity tag (unique version ID). ETags are quoted strings.
header('ETag: "' . $etag . '"');

// Send the data. (Don't forget to compress it!)

// The file exists:
if (file_exists($_GET['file']))
	echo file_get_contents($_GET['file']);

// 404 error document
else
	include '404.php';


Voilà! The client now has a copy of the file and a version number for future reference. Assuming the browser supports ETags (most, if not all, modern browsers do), you just saved yourself a ton of bandwidth on repeated file accesses.


You should now be well-versed in handling scaled images and minifying your JavaScript and CSS components. This introduction to caching and compressing data already makes you a more valuable and resource-savvy developer, but if you want to delve deeper down the rabbit hole that is micro-optimization, you can check out Part 2: Reducing Server Calls.