<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Undercrank &#187; despam</title>
	<atom:link href="http://undercrank.com/posts/tagged/despam/feed/" rel="self" type="application/rss+xml" />
	<link>http://undercrank.com</link>
	<description></description>
	<lastBuildDate>Tue, 06 Jul 2010 18:05:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Despamming Shortstat (Part 2)</title>
		<link>http://undercrank.com/posts/2005/02/despamming-shortstat-part-2/</link>
		<comments>http://undercrank.com/posts/2005/02/despamming-shortstat-part-2/#comments</comments>
		<pubDate>Fri, 04 Feb 2005 21:57:49 +0000</pubDate>
		<dc:creator>Mark</dc:creator>
				<category><![CDATA[Scripting]]></category>
		<category><![CDATA[blacklist]]></category>
		<category><![CDATA[despam]]></category>
		<category><![CDATA[Jay Allen]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[referer spam]]></category>
		<category><![CDATA[Shaun Inman]]></category>
		<category><![CDATA[shortstat]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://undercrank.com/?p=88</guid>
		<description><![CDATA[Further to my earlier post about Despamming Shortstat, I've made a small update to the code that has a few improvements on the original.
 <a href="http://undercrank.com/posts/2005/02/despamming-shortstat-part-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Further to my earlier (and popular &#8211; <a href="http://www.shauninman.com/mentary/past/nofollow_the_leader.php">thanks Shaun</a>) post about <a href="http://undercrank.com/posts/05/01/despamming-shortstat">Despamming Shortstat</a>, I&#8217;ve made a small update to the code that has a few improvements on the original:</p>
<ol>
<li>It now only looks at entries added to the MT-Blacklist database in the past 72 hours (you can change this if you want).  The domains used for comment spam are consistently very similar to those used for referer spam, so checking only recent additions reduces the execution time dramatically and (for me) seems to be just as effective.</li>
<li>The script chucks out a brief summary of what spam domains it&#8217;s removed.</li>
<li>It also uses a couple of variables from your ShortStat configuration file, so it really should just drop in and play nicely.</li>
<li>You could probably run this as a <tt>cron</tt> job now, changing the 72 hours to match the interval of your job.</li>
</ol>
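<p>As a rough sketch of the 72-hour idea (this is not the v0.2 code itself, which is linked further down), the filtering and the summary might look something like the following. Everything here is an assumption made for illustration: the function names <tt>recent_patterns</tt> and <tt>summary</tt>, the <tt>text</tt> and <tt>added</tt> field names, and the example domains are all hypothetical, and the blacklist rows are plain arrays rather than MySQL results so that the snippet runs on its own.</p>

```php
<?php
// Hypothetical sketch, not the v0.2 script. Rows are plain arrays here so it
// runs without MySQL; 'text' and 'added' are made-up field names.

/** Keep only blacklist patterns added within the last $hours hours. */
function recent_patterns(array $entries, int $hours, int $now): array
{
    $cutoff = $now - $hours * 3600;
    $recent = [];
    foreach ($entries as $entry) {
        if ($entry['added'] >= $cutoff) {
            $recent[] = $entry['text'];
        }
    }
    return $recent;
}

/** Build a one-line summary of the patterns that were checked. */
function summary(array $patterns): string
{
    if (count($patterns) === 0) {
        return "No recent blacklist entries.";
    }
    return "Checked " . count($patterns) . " pattern(s): " . implode(", ", $patterns);
}

$now = 1107554269; // fixed timestamp so the example is repeatable
$entries = [
    ['text' => 'spam-pills.example', 'added' => $now - 10 * 3600],  // 10 hours old
    ['text' => 'old-casino.example', 'added' => $now - 200 * 3600], // 200 hours old
];
$recent = recent_patterns($entries, 72, $now);
echo summary($recent), "\n";
```

<p>Passing the window in as a parameter means a <tt>cron</tt> job that runs daily could use 24 instead of 72 without touching the rest of the logic.</p>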
<p>It&#8217;s now a few more lines than the original (well, it&#8217;s not that many, but I was rapidly getting toward a homepage full of scripts) so you can view the code right here instead:</p>
<ul>
<li>View <a href="http://undercrank.com/code/_despam.phps">Shortstat De-spammer v0.2</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://undercrank.com/posts/2005/02/despamming-shortstat-part-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Despamming Shortstat</title>
		<link>http://undercrank.com/posts/2005/01/despamming-shortstat/</link>
		<comments>http://undercrank.com/posts/2005/01/despamming-shortstat/#comments</comments>
		<pubDate>Sat, 22 Jan 2005 06:28:45 +0000</pubDate>
		<dc:creator>Mark</dc:creator>
				<category><![CDATA[Scripting]]></category>
		<category><![CDATA[blacklist]]></category>
		<category><![CDATA[de-spamming]]></category>
		<category><![CDATA[despam]]></category>
		<category><![CDATA[Jay Allen]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[referer spam]]></category>
		<category><![CDATA[Shaun Inman]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://undercrank.com/?p=86</guid>
		<description><![CDATA[I've been using Shaun Inman's Shortstat package for a short while now as my main source of web statistics.  However, it's fairly susceptible to the, er, 'innovation' known as referer spam - so here's some code that uses Jay Allen's MT-Blacklist master list to clean it up.
 <a href="http://undercrank.com/posts/2005/01/despamming-shortstat/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using <cite>Shaun Inman</cite>&#8217;s <a href="http://shortstat.shauninman.com/free/">Shortstat</a> package for a short while now as my main source of web statistics.  However, as with most other blog-related things these days, it&#8217;s fairly susceptible to the <q>innovation</q> known as referer spam.</p>
<p><strong>Note:</strong> This script has been <a href="http://undercrank.com/posts/05/02/despamming-shortstat-part-2">updated somewhat</a> now, but the basic story here remains the same.</p>
<p>Anyway, this got me thinking, and to cut a long story short, it occurred to me that I already had a great blacklist of spam domains supplied by <cite>Jay Allen</cite>&#8217;s <a href="http://www.jayallen.org/projects/mt-blacklist/">MT-Blacklist</a>, and that it shouldn&#8217;t be so hard to use this list as a basis for removing the referer spam from the Shortstat database &#8211; and so, here are the results of about 13 minutes of investigation:</p>
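<p><code language="php"><br />
// Reuse Shortstat's own configuration and database helpers.<br />
include_once("configuration.php");<br />
include_once("functions.php");<br />
if ($shortstat) {<br />
SI_pconnect();<br />
// Fetch every blacklisted pattern from the MT-Blacklist table.<br />
$urlpatterns = mysql_query("SELECT ext_bl_item_text FROM mt_ext_bl_item");<br />
while ($row = mysql_fetch_array($urlpatterns, MYSQL_NUM)) {<br />
// Escape the pattern before embedding it in the DELETE statement.<br />
$pattern = mysql_real_escape_string($row[0]);<br />
$query = "DELETE FROM si_shortstat WHERE domain LIKE '%$pattern'";<br />
@mysql_query($query);<br />
}<br />
}<br />
</code></p>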
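<p>To illustrate which rows a <tt>LIKE</tt> with a leading <tt>%</tt> would catch, here is a small standalone approximation in plain PHP. It is only a sketch: the helper name <tt>ends_with_pattern</tt> and the example domains are hypothetical, it ignores any <tt>%</tt> or <tt>_</tt> wildcards inside the pattern itself, and it assumes a case-insensitive comparison, which is how MySQL behaves with its usual non-binary collations.</p>

```php
<?php
// Hypothetical helper for illustration only; the script above lets MySQL do this.

/** True when $domain ends with $pattern, ignoring case (roughly LIKE '%pattern'). */
function ends_with_pattern(string $domain, string $pattern): bool
{
    if ($pattern === '') {
        // Unlike LIKE '%', treat an empty pattern as no match so that
        // a blank blacklist entry cannot wipe every row.
        return false;
    }
    $domain = strtolower($domain);
    $pattern = strtolower($pattern);
    return substr($domain, -strlen($pattern)) === $pattern;
}

$domains = ['www.spam-pills.example', 'undercrank.com', 'Spam-Pills.example'];
$matches = array_values(array_filter($domains, function ($d) {
    return ends_with_pattern($d, 'spam-pills.example');
}));
print_r($matches);
```

<p>Because the match is anchored only at the end of the domain, subdomains of a blacklisted domain are caught too, while unrelated domains are left alone.</p>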
<p>I&#8217;ve called this &#8220;_despam.php&#8221;; installed in the Shortstat installation directory, it will use your existing database connection settings.  Note &#8211; the script assumes that MT-Blacklist has been set up to use the same MySQL database that Shortstat uses, but I imagine that&#8217;s true of most setups.</p>
<p>It certainly could do with a few more features (actually reporting back what it&#8217;s done/doing would be a start), but the basic functionality is there and lovely shiny clean reports are the result.</p>
<p><strong>Update:</strong> Tony at <a href="http://www.juju.org/archives/2005/01/21/derefspam">juju.org</a> has taken things a step further with a Perl script that &#8216;de-spams&#8217; your server log files using MT-Blacklist too.</p>
]]></content:encoded>
			<wfw:commentRss>http://undercrank.com/posts/2005/01/despamming-shortstat/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
