Undercrank

Despamming Shortstat (Part 2)

Further to my earlier (and popular – thanks Shaun) post about Despamming Shortstat, I’ve made a small update to the code that has a few improvements on the original:

  1. It now only looks at entries added to the MT-Blacklist database in the past 72 hours (you can change the window if you want). The domains used for comment spam are consistently very similar to those used for referer spam, so checking only recent additions cuts the execution time dramatically and (for me) seems to be just as effective.
  2. The script chucks out a brief summary of what spam domains it’s removed.
  3. It also uses a couple of variables from your ShortStat configuration file, so it really should just drop in and play nicely.
  4. You could probably set this up as a cron job now, changing the 72-hour window to match how often the job runs.
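To give a rough idea of what the 72-hour window looks like in practice, here is a minimal sketch of building the time-limited query. It is not the updated script itself. The date column name (`ext_bl_item_added`) is a hypothetical placeholder, and the sketch assumes that column holds a MySQL DATETIME, so check your own MT-Blacklist table before borrowing any of it.

```php
<?php
// Hypothetical column name: check your own MT-Blacklist table for the
// real "date added" column before using this.
define('BL_ADDED_COLUMN', 'ext_bl_item_added');

// Build a SELECT limited to blacklist entries added in the last $hours.
// $now is a Unix timestamp, passed in so the function is easy to test.
function recent_blacklist_query($hours, $now)
{
    // Assumes the column is a DATETIME compared against local server time.
    $cutoff = date('Y-m-d H:i:s', $now - $hours * 3600);
    return "SELECT ext_bl_item_text FROM mt_ext_bl_item"
         . " WHERE " . BL_ADDED_COLUMN . " >= '" . $cutoff . "'";
}

// Print the query for the default 72-hour window.
echo recent_blacklist_query(72, time()), "\n";
```

If you do run it from cron, the window wants to be at least as long as the gap between runs (a daily job could pass 24, say), otherwise entries added between runs would be skipped.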

It’s now a few more lines than the original (well, it’s not that many, but I was rapidly getting toward a homepage full of scripts) so you can view the code right here instead:

Despamming Shortstat

I’ve been using Shaun Inman’s Shortstat package for a short while now as my main source of web statistics. However, as with most other blog-related things these days, it’s fairly susceptible to the innovation known as referer spam.

Note: This script has been updated somewhat now, but the basic story here remains the same.

Anyway, this got me thinking. To cut a long story short, it occurred to me that I already had a great blacklist of spam domains supplied by Jay Allen’s MT-Blacklist, and it shouldn’t be too hard to use that list as a basis for removing the referer spam from the Shortstat database. So here are the results of about 13 minutes of investigation:


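<?php
// Reuse Shortstat's own settings and helper functions.
include_once("configuration.php");
include_once("functions.php");

if ($shortstat) {
    SI_pconnect(); // open the database connection Shortstat already uses

    // Fetch every blacklisted pattern from the MT-Blacklist table.
    $urlpatterns = mysql_query("SELECT ext_bl_item_text FROM mt_ext_bl_item");
    while ($row = mysql_fetch_array($urlpatterns, MYSQL_NUM)) {
        // Escape the pattern, then delete hits whose domain ends with it.
        $pattern = mysql_real_escape_string($row[0]);
        $query = "DELETE FROM si_shortstat WHERE domain LIKE \"%$pattern\"";
        @mysql_query($query);
    }
}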

I’ve called this “_despam.php”. Installed in the Shortstat installation directory, it will use your existing database connection settings. Note – the script assumes that MT-Blacklist has been set up to use the same MySQL database that Shortstat uses; that said, I imagine that’s most setups.

It certainly could do with a few more features (actually reporting back what it’s done/doing would be a start), but the basic functionality is there and lovely shiny clean reports are the result.
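As a rough sketch of what that reporting could look like, the function below turns a list of per-pattern deletion counts into a short summary. The function name and the sample domains are made up for illustration. In a real run, each count could come from calling mysql_affected_rows() straight after the DELETE for that pattern.

```php
<?php
// Sketch only: format a summary from per-pattern deletion counts.
// $deleted maps a blacklist pattern to the number of rows it removed.
function despam_summary(array $deleted)
{
    $lines = array();
    $total = 0;
    foreach ($deleted as $pattern => $count) {
        if ($count > 0) { // skip patterns that matched nothing
            $lines[] = sprintf("%s: %d row(s) removed", $pattern, $count);
            $total += $count;
        }
    }
    $lines[] = sprintf("Total: %d row(s) removed", $total);
    return implode("\n", $lines);
}

// Hypothetical counts, purely for illustration.
echo despam_summary(array('spam-example.test' => 3, 'clean-example.test' => 0)), "\n";
```

Keeping the formatting separate from the database calls means the summary can be echoed to the browser or left for cron to mail out, whichever suits.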

Update: Tony at juju.org has taken things a step further with a Perl script that ‘de-spams’ your server log files using MT-Blacklist too.

Disclaimer

The opinions expressed on Undercrank.com are personal and do not represent Skype or any other organisation I belong to.
