Interpreting Log Files
The most widely used metrics for measuring traffic to your site involve counting hits, page views, visits and unique visitors. They all measure different things and all have shortcomings that affect their reliability.
Typical Log File
Here's a typical overview from a Web traffic tool. You can see how hits, pages and visits are the numbers given the most prominence. However, none of these will measure exactly how many people came to your site.
Hits
The biggest number your traffic tool reports is always the number of hits your Web site has received. Unfortunately, it's also the least useful, and it’s misleading. That’s because hits measure every request a browser makes to download every element on a Web page — not just the HTML file itself, but every graphic, every ad banner and every other related file.
These elements can add up quickly. Most sites have a logo, a background image, an image used for spacing content, and sometimes a graphical navigation bar or advertising banner. Each Flash, audio and video file also counts as a hit.
So when someone views a single page on your Web site, their browser actually makes multiple requests, one for every element it must have in order to display the page: the HTML, external style sheets, external JavaScript files, and each unique image.
Even Google's simple, clean, crisp home page registers at least two hits when viewed.
Hits were the first unit of measure used for Web traffic because they are easy to count. They fell out of favor because they're also very easy to manipulate. For instance, if you want to increase the number of hits your site gets, simply add more images to your pages. Or split the images you serve into multiple files — the user won't notice the difference, but the computer counts each file separately as it's served.
It is quite easy to make your hits skyrocket without actually getting more readers or even serving more pages. You will also have higher bandwidth costs and a slower site, but saying you had a million hits last month is fun.
On the other hand, scouring your pages to remove extraneous graphics can save your bandwidth and please your customers with fast-loading pages, but it will lower your hit count.
Hits are sometimes misleadingly called "page hits," but they're not pages.
Pages
Pages, also called page views, are a little harder to quantify but are more meaningful indicators of your site’s value and success. Where hits measure every single file that loads with a Web page, page views only count the single HTML document itself. When you call up the home page of Google, Yahoo, or The Washington Post, what's displayed in the browser is the page or page view. However, page view counts exclude non-HTML file formats such as PDF or Microsoft Word documents.
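To make the difference between hits and page views concrete, here is a minimal Python sketch. It assumes you already have a list of requested paths, and that a "page" is any path ending in .html or .htm or with no file extension at all; real traffic tools let you configure this, so treat the PAGE_EXTENSIONS set and both function names as illustrative assumptions.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: which extensions count as a "page" varies from tool to tool.
PAGE_EXTENSIONS = {"", ".html", ".htm"}


def is_page_view(path):
    """Return True if the requested path looks like an HTML document."""
    clean_path = urlsplit(path).path  # drop any ?query string
    extension = posixpath.splitext(clean_path)[1].lower()
    return extension in PAGE_EXTENSIONS


def count_hits_and_pages(paths):
    """Every request is a hit; only HTML documents are page views."""
    hits = len(paths)
    pages = sum(1 for path in paths if is_page_view(path))
    return hits, pages


# One visitor loading one page, plus a PDF download (made-up paths).
requests = [
    "/index.html",
    "/style.css",
    "/logo.gif",
    "/spacer.gif",
    "/nav.js",
    "/report.pdf",
]
hits, pages = count_hits_and_pages(requests)
print(hits, "hits,", pages, "page view")  # 6 hits, 1 page view
```

Removing the spacer image would lower the hit count to five without changing the page view count, which is why page views are the steadier number.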
Page views are useful information. You probably don't care how many times your logo was viewed, but you do care how many times the “About Us” page was called up. When the number of page views is high and growing, it signals a measure of popularity with readers.
However, page views can also be misleading. For example, HTML frames can inflate the page count by several hundred percent, because a framed layout may require visitors to load three Web pages at once in different sections of a single browser window. So, if you're using frames, your Web stats will need to be interpreted more carefully than normal.
Nevertheless, page views are one of the more popular metrics — your advertisers, competitors, readers and staff will want to know what the page views are.
Visits
Despite the preference for page view statistics, we recommend treating visits and unique visitors as more important measures of your site's reach and success. A visit is what it sounds like: one person coming to your site and looking at some pages. It might be a repeat visit or a first-time visit. Counting unique visitors means you can tell whether the person looking at your site right now is different from the one who came by this morning.
A “visit” is usually defined as one or more page views from a single user, separated by at least an hour from that user's other page views. This isn't a perfect system. If your biggest fan checks your site every 30 minutes from work, that shows up as a single, 8-hour visit. If someone else starts reading your site, wanders off to have dinner and then comes back to finish, that's two visits from one unique user.
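The one-hour rule can be sketched in a few lines of Python. This is only an illustration of the definition above, assuming you already know which page views belong to the same user; the function name count_visits and the sample times are made up.

```python
from datetime import datetime, timedelta

# The gap that separates one visit from the next (one hour, as described above).
VISIT_GAP = timedelta(hours=1)


def count_visits(page_view_times, gap=VISIT_GAP):
    """Count visits for ONE user, given the times of that user's page views."""
    visits = 0
    previous = None
    for current in sorted(page_view_times):
        # A new visit starts with the first page view, or after a long enough pause.
        if previous is None or current - previous >= gap:
            visits += 1
        previous = current
    return visits


# The fan who checks in every 30 minutes from 9:00 to 17:00.
fan = [datetime(2005, 3, 17, 9, 0) + timedelta(minutes=30 * i) for i in range(17)]

# The reader who breaks for dinner and comes back 80 minutes later.
diner = [
    datetime(2005, 3, 17, 18, 0),
    datetime(2005, 3, 17, 18, 10),
    datetime(2005, 3, 17, 19, 30),
    datetime(2005, 3, 17, 19, 40),
]

print(count_visits(fan))    # 1
print(count_visits(diner))  # 2
```

The fan's 17 page views collapse into one visit because no pause reaches an hour, while the diner's four page views split into two.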
A visit is usually measured by the Web server setting a cookie when your site is first accessed. It then tracks that cookie as the reader browses to different pages on your site. Another reason it's not foolproof? Some people disable cookies. With cookies turned off, the same visitor looking at five different pages looks like five visits rather than one.
Visit and unique visitor counts can also be skewed by networks, such as those in schools and libraries, that present the same identifying information for every computer. A classroom of 30 students all looking at your site at the same time, for instance, might all appear to come from the same Internet connection, and that's reported as a single visit instead of 30 visits.
So your unique visitor count can be both underreported and overreported. You can take steps to reduce this uncertainty by requesting that visitors register or you can require that visitors allow cookies, but such demands tend to drive away readers.
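Both kinds of error are easy to reproduce with a simplified model. The Python sketch below assumes each page view is recorded as a pair of IP address and cookie ID (None when cookies are disabled). The two counting functions are illustrative assumptions, not how any particular traffic tool works, and the addresses are reserved example values.

```python
def unique_by_ip(page_views):
    """Count visitors by distinct IP address."""
    return len({ip for ip, _cookie in page_views})


def unique_by_cookie(page_views):
    """Count visitors by distinct cookie; a view with no cookie looks brand new."""
    known = {cookie for _ip, cookie in page_views if cookie is not None}
    anonymous = sum(1 for _ip, cookie in page_views if cookie is None)
    return len(known) + anonymous


# 30 students behind one school connection, each with their own cookie.
classroom = [("192.0.2.10", "student-%d" % i) for i in range(30)]

# One reader with cookies turned off, looking at five pages.
no_cookies = [("198.51.100.7", None)] * 5

print(unique_by_ip(classroom))       # 1  (underreported: really 30 people)
print(unique_by_cookie(classroom))   # 30
print(unique_by_ip(no_cookies))      # 1
print(unique_by_cookie(no_cookies))  # 5  (overreported: really 1 person)
```

Neither key is right in every case, which is why the unique visitor figure is best read as an estimate.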
Reading Your Log Files
Most Web hosts will give you access to your site’s log files. A log file is simply a record of every hit registered by the Web server. It’s the raw, unfiltered base that most site traffic tools use to serve up traffic reports.
Few people ever look through their raw log files. However, knowing what information they contain can help you know what you can get out of your traffic tools.
For now, take a look at this sample entry from a typical log file.
70.109.205.246 - - [17/Mar/2005:14:46:27 -0600] "GET /2004votetracker.jpg HTTP/1.1" 200 24134 "http://www.j-lab.org/coolstuff.html" "Mozilla/4.0 (compatible; MS IE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
Here are the elements of this entry:
- IP address. The first set of four numbers separated by periods is the visitor's IP address, the number that identifies a computer, or the network it sits behind, on the Internet. Traffic tools can look up the host name registered for that number to find out whether the visitor is on a .com, .edu or internationally based network. The two hyphens that follow are placeholders for user identity fields that are usually left empty.
- Date. Inside the square brackets is the date and time of the request, followed by the server's time zone offset (here, six hours behind Greenwich Mean Time). You can see what day of the week, or what hour of the day, your traffic happens. You can use this information to make strategic decisions about posting articles when your site is busiest and running site modifications when things are slowest.
- Page URL. Between the first set of quotation marks you find the path of the file being requested, between the terms GET and HTTP. The traffic tool uses it to determine whether the request was for a page or just a supporting file such as an image. In this sample the request is for a JPEG image, so it counts as a hit but not as a page view.
- Server response code. Next comes a number indicating the server's response to the request. A 200 means, "I gave back the data properly." A 404 means, "I couldn't find the data to give back," which is a serious error. Any 404 response means someone tried to follow a link or request a page that wasn't there. You should try to minimize 404 responses by fixing broken links on your own site and asking other sites to update or fix incorrect links to you. If you see a page that's frequently requested but doesn't exist, consider making one with that name.
- File size. The next number is the size, in bytes, of the file that was returned (about 24 KB in this sample). Smaller files download more quickly.
- Referer URL. In the second set of quotation marks you find what may be the most interesting bit of data in your log files: the page from which the request originated, called the “referer.” And yes, it’s supposed to be misspelled. The referer might be another page on your site, or another site that linked to you, or a search engine that returned your page in its results. By analyzing what Web heads call your “referers,” you can tell who is sending you traffic and what search terms are resulting in the most clicks to your site. You can't tell, though, what sites simply have links to you or what search terms make your site appear at the top. You only find that out when someone clicks a link.
- User agent. After the referer comes the user agent, which is the name of the program requesting the page. This information is handy because it tells you which browsers people are using to view your site. You can focus your testing and features on these browsers, but watch that this statistic doesn't become a self-fulfilling prophecy. If you don't see many Mac users with Safari on your site, it could be that the site doesn't work properly for them. If you spend the time to fix your site, you might see an increase in use.
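If you want to pull these elements out yourself, a regular expression is enough for a first pass. The Python sketch below assumes every entry follows the same layout as the sample above; the pattern, the field names and the parse_line function are illustrative, and entries that don't fit the layout are simply skipped. Note that parsing the month abbreviation assumes an English locale.

```python
import re
from datetime import datetime

# One named group per element described in the list above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Some servers log "-" when no body was sent; treat that as zero bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = (
    '70.109.205.246 - - [17/Mar/2005:14:46:27 -0600] '
    '"GET /2004votetracker.jpg HTTP/1.1" 200 24134 '
    '"http://www.j-lab.org/coolstuff.html" '
    '"Mozilla/4.0 (compatible; MS IE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)

entry = parse_line(SAMPLE)
print(entry["ip"])       # 70.109.205.246
print(entry["path"])     # /2004votetracker.jpg
print(entry["status"])   # 200
print(entry["size"])     # 24134
print(entry["referer"])  # http://www.j-lab.org/coolstuff.html
```

Converting the date into a datetime object makes it easy to group requests later by hour or by day of the week.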
Checking into your log files periodically may be an eye-opener. You may find uncommon browsers used by more people than you’d expect, page requests for old content or high traffic numbers from a country where your products aren't sold. The Internet is wide and diverse. Design and test your site to take advantage of that.
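A periodic check like this can be partly automated. The Python sketch below tallies missing pages, outside referrers and busy hours from entries that have already been split into fields. The dictionary keys, the summarize function and all of the sample data are hypothetical, chosen only to show the idea.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize(entries, own_host):
    """Tally 404 paths, referring hosts other than your own, and hits per hour."""
    missing = Counter()
    referrers = Counter()
    by_hour = Counter()
    for entry in entries:
        by_hour[entry["hour"]] += 1
        if entry["status"] == 404:
            missing[entry["path"]] += 1
        # A referer of "-" or "" has no host name and is skipped.
        host = urlsplit(entry["referer"]).hostname
        if host and host != own_host:
            referrers[host] += 1
    return missing, referrers, by_hour


# Made-up entries for a site hosted at www.example.org.
entries = [
    {"hour": 14, "path": "/index.html", "status": 200,
     "referer": "http://www.example.com/links.html"},
    {"hour": 14, "path": "/old-page.html", "status": 404,
     "referer": "http://www.example.com/links.html"},
    {"hour": 15, "path": "/old-page.html", "status": 404, "referer": "-"},
    {"hour": 15, "path": "/about.html", "status": 200,
     "referer": "http://www.example.org/index.html"},
]

missing, referrers, by_hour = summarize(entries, own_host="www.example.org")
print(missing.most_common(1))    # [('/old-page.html', 2)]
print(referrers.most_common(1))  # [('www.example.com', 2)]
```

A path that keeps turning up in the missing tally is a candidate for a fixed link or a new page, and the referrer tally shows which outside sites actually send readers your way.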

