<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Undercrank &#187; statistics</title>
	<atom:link href="http://undercrank.com/posts/tagged/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://undercrank.com</link>
	<description></description>
	<lastBuildDate>Tue, 06 Jul 2010 18:05:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Despamming Shortstat</title>
		<link>http://undercrank.com/posts/2005/01/despamming-shortstat/</link>
		<comments>http://undercrank.com/posts/2005/01/despamming-shortstat/#comments</comments>
		<pubDate>Sat, 22 Jan 2005 06:28:45 +0000</pubDate>
		<dc:creator>Mark</dc:creator>
				<category><![CDATA[Scripting]]></category>
		<category><![CDATA[blacklist]]></category>
		<category><![CDATA[de-spamming]]></category>
		<category><![CDATA[despam]]></category>
		<category><![CDATA[Jay Allen]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[referer spam]]></category>
		<category><![CDATA[Shaun Inman]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://undercrank.com/?p=86</guid>
		<description><![CDATA[I've been using Shaun Inman's Shortstat package for a short while now as my main source of web statistics.  However, it's fairly susceptible to the, er, 'innovation' known as referer spam - so here's some code that uses Jay Allen's MT-Blacklist master list to clean it up.
 <a href="http://undercrank.com/posts/2005/01/despamming-shortstat/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using <cite>Shaun Inman</cite>&#8217;s <a href="http://shortstat.shauninman.com/free/">Shortstat</a> package for a short while now as my main source of web statistics.  However, as with most other blog-related things these days, it&#8217;s fairly susceptible to the <q>innovation</q> known as referer spam.</p>
<p><strong>Note:</strong> This script has been <a href="http://undercrank.com/posts/05/02/despamming-shortstat-part-2">updated somewhat</a> now, but the basic story here remains the same.</p>
<p>Anyway, this got me thinking. To cut a long story short, it occurred to me that I already had a great blacklist of spam domains supplied by <cite>Jay Allen</cite>&#8217;s <a href="http://www.jayallen.org/projects/mt-blacklist/">MT-Blacklist</a>, and that it shouldn&#8217;t be hard to use this list as a basis for removing the referer spam from the Shortstat database. So here are the results of about 13 minutes of investigation:</p>
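<p><code language="php"><br />
// Reuse Shortstat's own configuration and helper functions.<br />
include_once("configuration.php");<br />
include_once("functions.php");<br />
if ($shortstat) {<br />
// Open the database connection using Shortstat's settings.<br />
SI_pconnect();<br />
// Fetch every blacklist entry from the MT-Blacklist table.<br />
$urlpatterns = mysql_query("SELECT ext_bl_item_text FROM mt_ext_bl_item");<br />
while ($row = mysql_fetch_array($urlpatterns, MYSQL_NUM)) {<br />
// Escape the entry so that quotes in it cannot break the query.<br />
$pattern = mysql_real_escape_string($row[0]);<br />
// Delete every row whose domain ends with the blacklist entry.<br />
$query = "DELETE FROM si_shortstat WHERE domain LIKE '%$pattern'";<br />
@mysql_query($query);<br />
}<br />
}<br />
</code></p>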
<p>I&#8217;ve called this &#8220;_despam.php&#8221;. Installed in the Shortstat installation directory, it will use your existing database connection settings.  Note that the script assumes MT-Blacklist has been set up to use the same MySQL database as Shortstat, though I imagine that&#8217;s the case for most setups.</p>
<p>It certainly could do with a few more features (reporting back what it&#8217;s doing, or has done, would be a start), but the basic functionality is there, and lovely, shiny, clean reports are the result.</p>
<p><strong>Update:</strong> Tony at <a href="http://www.juju.org/archives/2005/01/21/derefspam">juju.org</a> has taken things a step further with a Perl script that &#8216;de-spams&#8217; your server log files using MT-Blacklist too.</p>
]]></content:encoded>
			<wfw:commentRss>http://undercrank.com/posts/2005/01/despamming-shortstat/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
