Surviving a Digging and the Futility of Statistics

The Numbers

According to Mint, slash7 (all pages included) received 16,010 page views on Monday, when I released the Scriptaculous cheat sheet.

According to Webalizer, slash7 received 58,397.

And by now, 914 Diggs. I made number five on the front page for a while. I'm tickled pink to have finally graced the front page.



When Ruby Goes Rogue

My friend Davey and I spent a fair amount of time trying desperately to shore up the server's defenses against such an onslaught of traffic. Unfortunately, nothing really worked. Ruby and Apache conspired like fiends to dominate the CPUs; they ate up CPU cycles like Takeru Kobayashi at a hot dog-eating contest.

In the end, I had to resort to putting up a static HTML file describing the mayhem: no images, no CSS, just the static HTML and the Mint JavaScript file. We redirected the download link to a CORAL cache to offload the 70K-a-pop hit on the bandwidth.

By the evening, traffic had died down enough to restore Typo. It was a thrilling ride, but I'd have enjoyed it more if the rollercoaster had been equipped with seatbelts.

Why Ruby Goes Rogue

Now, I know that a Rails application can hold up to this kind of load, because we see it every day. But my Typo install didn't. I'm not going to point the finger at Typo alone, because I suspect it was a combination of things, although Typo's "feature completeness" may well have contributed to the problem.

For one, the server's a measly dual PIII. I don't complain, because I get free hosting with root access, provided by an old and dear friend (thanks, Davey!). But a dual PIII is hardly the height of technology; we could get a 1U rack server for $1200 that'd be well more than twice as fast.

Secondly, I can't seem to use page caching with Typo. Yes, I hear you heckling me back there. But when I turn on page caching, people can't leave comments. I get a bizarre error that I haven't been able to track down: an error about permission being denied. Sounds simple, right? But the file Rails allegedly can't access is a CSS stylesheet. The permissions on which are, shall we say, quite liberal.

And so we come to the point I can't avoid. Typo is a bit bloated. The bloat has made Typo run slower, and it's also made it harder to track down bugs. We all know and love Typo, but maybe it's time to stage an intervention.

Lessons Learned

Boy, have I learned my lesson. You might learn from my pain, so here's what it taught me:

First, fix any errors that might cause your server to fall to its knees and beg for mercy before you submit a link to Digg.

Secondly, if your aspirations include regular Digging, you'd better invest in a real server instead of taking hand-outs, however tasty free lunches may be.

Thirdly, definitely suck up to your friend who helps you survive a Digging with all sorts of arcane Apache wrangling. He deserves it.

Fourthly... don't use Apache. I'm going to see if I can't jury rig Lighttpd up before I try any such thing again.

Fifthly, it may be time to roll your (my) own. It's not as if there isn't proof that you can develop a reasonably functional blog in 15 minutes, or somethin'.

The Saga Continues...

The whole thing just keeps on going—the Scriptaculous cheat sheet post has gotten 23,476 page views according to Mint; Webalizer says 26,475 (a closer comparison, this time). Mint doesn't track files, but Webalizer says the cheat sheet itself was downloaded 11,373 times—and I have no way of tracking the downloads from the various mirrors that popped up, nor the HTTP redirection I had to set up to save the server from (more) spontaneous combustion.

All in all, I did 1.9GB of traffic in one day. Yesterday, I did 2GB of traffic.

The Futility of Statistics

But let's compare those numbers again, shall we?

                          Mint      Webalizer
Page Views (Monday)       16,010    58,397
Page Views (C.S. post)    23,476    26,475

Not a whole lot of synchronicity going on here.
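For the arithmetically inclined, here's a quick sketch of how far apart those two rows really are. The figures are copied straight from the comparison above; the compare helper and its choice of Mint as the baseline for the percentage are my own assumptions, not anything either stats package reports.

```python
# Page view counts copied from the Mint vs. Webalizer comparison above.
mint_monday, webalizer_monday = 16_010, 58_397
mint_post, webalizer_post = 23_476, 26_475


def compare(mint, webalizer):
    """Return (ratio, absolute gap, gap as a percentage of the Mint count).

    Using Mint as the baseline is an arbitrary choice for illustration.
    """
    gap = webalizer - mint
    return webalizer / mint, gap, 100 * gap / mint


monday = compare(mint_monday, webalizer_monday)
post = compare(mint_post, webalizer_post)

print(f"Monday, all pages: {monday[0]:.2f}x, gap of {monday[1]:,} views")
print(f"Cheat sheet post:  {post[0]:.2f}x, gap of {post[1]:,} views "
      f"({post[2]:.1f}% of the Mint count)")
```

The Monday totals differ by a factor of between three and four, while the single-post counts differ by only about three thousand views. Whatever is inflating one number (or deflating the other) clearly isn't hitting both rows equally.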

Webalizer works like a typical 1990s-era traffic analyzer: it parses log files. It was also obviously designed by nerds. It's butt ugly and, frankly, it's clear nobody gave a moment's thought to good information design. If you want the wham, bam, thank you ma'am of web stats, then maybe Webalizer's for you.

Mint, on the other hand, is much more nuanced. For one, it works via a JavaScript include file (and PHP). The JavaScript fetches all sorts of information about the user's browser, and Mint presents it in a much more useful manner: indexed by page title instead of just URL, and showing, on a limited basis, which referrers led to which file. But it's got shortcomings in its UI, too, in my opinion.

But Mint's main failing is that what makes it good also makes it bad: the JavaScript. The file must be loaded by the user's browser to record the hit at all, so RSS feeds and file downloads are out. And in non-ideal circumstances (when your server is begging for its mommy, say) server load may be so severe that the file never loads. And of course, some people browse with JavaScript turned off, but that seems to be a small percentage of the population.

Look, I Can Do Basic Math!

Why the difference?

Well, there's the load problem that I described above. There's also the questionable nature of Webalizer's information. Like many old-school stats packages, it has some funny accounting. Webalizer does track hits separately from page views, but what does it consider a page?
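To see how much hangs on that question, here's a minimal sketch of a log-file counter. It assumes Apache-style Common Log Format lines; the sample log, the paths in it, and both "page" definitions are invented for illustration. I don't know Webalizer's actual rules, so treat this as a guess at the kind of thing that could be going on, not a description of what it does.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Pulls the requested path out of a Common Log Format request field.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')

# Made-up sample log: one visitor loads a post plus its assets,
# another loads the home page and downloads a file.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Jan/2006:10:00:00 +0000] "GET /articles/cheat-sheet HTTP/1.1" 200 5120
10.0.0.1 - - [01/Jan/2006:10:00:01 +0000] "GET /stylesheets/base.css HTTP/1.1" 200 2048
10.0.0.1 - - [01/Jan/2006:10:00:01 +0000] "GET /javascripts/stats.js HTTP/1.1" 200 1024
10.0.0.1 - - [01/Jan/2006:10:00:02 +0000] "GET /images/logo.png HTTP/1.1" 200 4096
10.0.0.2 - - [01/Jan/2006:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 3072
10.0.0.2 - - [01/Jan/2006:10:00:06 +0000] "GET /files/cheat-sheet.pdf HTTP/1.1" 200 71680
"""

# Two hypothetical definitions of "page", keyed by file extension.
NARROW = {"", ".html", ".htm"}      # extensionless URLs and HTML only
BROAD = NARROW | {".css", ".js"}    # also counts stylesheets and scripts


def count(log_text, page_extensions):
    """Return (hits, pages): every request is a hit, some are pages."""
    hits = pages = 0
    for line in log_text.splitlines():
        match = REQUEST.search(line)
        if not match:
            continue  # skip lines that don't look like requests
        hits += 1
        path = urlsplit(match.group(1)).path
        extension = posixpath.splitext(path)[1].lower()
        if extension in page_extensions:
            pages += 1
    return hits, pages


print(count(SAMPLE_LOG, NARROW))  # (6, 2)
print(count(SAMPLE_LOG, BROAD))   # (6, 4)
```

Same six hits, but the page count doubles just by letting CSS and JavaScript files through the door. A looser definition of "page" inflates the total without a single extra visitor.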

I think it might be counting CSS or JavaScript files as "pages," seeing as it reports nearly four times as many page views (all pages) as Mint does for the Day of Reckoning. When you look at the specific page URL in question, though, the difference is only about 3,000, or roughly 12%. The bigger gap might be explained if Webalizer is counting non-HTML responses as pages.

Another option is that the 12% stems from the time when the server was too overloaded to send all the little files associated with the page, like Mint's Javascript file. But that doesn't explain the quadrupling of overall page views for the day.

Maybe Mint was just letting me down. I don't know. Can't know, really. Those of us who are statistics mongers by nature, well, we just have to learn to live with the uncertainty. It's kind of zen, really.

As An Aside...

If you're out there looking for a good web app idea to execute in Ruby, allow me to suggest a log file parser that doesn't suck. You can't get more reliable than parsing log files, but the fact is every package I've seen that does it (including the $800-a-head Urchin) just plain bites it. That's why Mint ($30) is taking the world by storm, even though it's PHP, and relies on Javascript, and certainly has failings of its own.
