Well, like a dork, I didn’t actually realise that Yahoo! News already offered up the whole RSS feeds for news keywords thing when I started. However, the fun thing of course with the API that Yahoo! are offering is that it doesn’t take a whole lot of time to create another script that dishes out the same thing in another ‘rival’ syndication format:
Atom enabled Yahoo! web feeds examples
The syntax of the URL is almost identical to that of the RSS feed: http://undercrank.com/yahoo/atom/news/«keywords» (the equivalent RSS version would be http://undercrank.com/yahoo/rss/news/«keywords»). This is a slight change to the previous syntax posted, but the old format is still perfectly valid and won’t break for those already using it.
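As a rough illustration of the URL scheme above, here is a minimal sketch of a helper that builds either flavour of feed URL. The function name feed_url() and the use of rawurlencode() for the keywords are my own assumptions; the post doesn't say how multi-word keywords should be encoded.

```php
<?php
// Base path taken from the URLs in the post.
const FEED_BASE = 'http://undercrank.com/yahoo';

/**
 * Build a feed URL for a keyword search (hypothetical helper).
 * $format is 'atom' or 'rss', matching the two paths named in the post.
 */
function feed_url(string $format, string $keywords): string
{
    if (!in_array($format, ['atom', 'rss'], true)) {
        throw new InvalidArgumentException("Unknown feed format: $format");
    }
    // Assumption: keywords are percent-encoded so they stay in one path segment.
    return FEED_BASE . '/' . $format . '/news/' . rawurlencode($keywords);
}

echo feed_url('atom', 'mars rover'), "\n";
echo feed_url('rss', 'mars rover'), "\n";
```

Swapping 'atom' for 'rss' is the only difference between the two formats, which is the point of keeping the syntax so close.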
As before, the feed validates nicely and has good, friendly HTTP caching facilities built in, and as a special bonus it conforms to the Tag URI recommendations pretty well too.
Use is made of the extended Author tags for each article, and support for multiple languages for each entry is available – although it appears that Yahoo! returns “en” as the language for everything, even when it’s not.
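For anyone who hasn't met Tag URIs, they follow the pattern tag:authority,date:specific (RFC 4151). The sketch below shows how an entry ID could be assembled in that shape. The tag_uri() helper, the date and the path are all invented for illustration and are not the IDs the feed actually emits.

```php
<?php
/**
 * Build an ID following the "tag" URI scheme (RFC 4151):
 *   tag:<authority>,<date>:<specific>
 * Hypothetical helper, not taken from the feed's source.
 */
function tag_uri(string $authority, string $date, string $specific): string
{
    // RFC 4151 allows YYYY, YYYY-MM or YYYY-MM-DD for the date part.
    if (!preg_match('/^\d{4}(-\d{2}(-\d{2})?)?$/', $date)) {
        throw new InvalidArgumentException("Bad tag date: $date");
    }
    return 'tag:' . $authority . ',' . $date . ':' . $specific;
}

// The date and path here are made-up example values.
echo tag_uri('undercrank.com', '2005-03-01', '/yahoo/atom/news/mars'), "\n";
```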
Finally, I’ll post up the source once I’m convinced I’m doing everything the ‘right’ way.
Excited by the prospect of a reasonably usable search API being made available by Yahoo!, I’ve put together a small project that will create a dynamic RSS 2.0 feed based on the Yahoo! News Search hooks.
The feed validates nicely, and also deals well with the tests created by Unpolluted; it has GZip handling, HTTP Expires, Last-Modified, ETag and If-None-Match support built-in.
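To give a flavour of what the ETag, If-None-Match and Last-Modified support involves, here is a minimal sketch of the decision a feed script makes before sending a body. It is not the code behind the feed; is_not_modified() and the sample values are my own, and it only does a simple exact comparison of ETags (no weak "W/" handling).

```php
<?php
/**
 * Decide whether a request can be answered with "304 Not Modified".
 * $requestHeaders maps header names to values (hypothetical helper).
 */
function is_not_modified(array $requestHeaders, string $etag, int $lastModified): bool
{
    $headers = array_change_key_case($requestHeaders, CASE_LOWER);

    // When If-None-Match is present it takes precedence over If-Modified-Since.
    if (isset($headers['if-none-match'])) {
        $candidates = array_map('trim', explode(',', $headers['if-none-match']));
        return in_array('*', $candidates, true) || in_array($etag, $candidates, true);
    }
    if (isset($headers['if-modified-since'])) {
        $since = strtotime($headers['if-modified-since']);
        return $since !== false && $lastModified <= $since;
    }
    return false;
}

// Made-up values standing in for a generated feed.
$etag = '"' . md5('feed body goes here') . '"';
$lastModified = gmmktime(12, 0, 0, 3, 1, 2005);

var_dump(is_not_modified(['If-None-Match' => $etag], $etag, $lastModified));
var_dump(is_not_modified(['If-None-Match' => '"stale"'], $etag, $lastModified));
```

When the function returns true the script can send a bare 304 status and skip regenerating the feed, which is where the bandwidth saving for aggregators comes from.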
The usage is very simple – point your aggregator of choice at http://undercrank.com/yahoo/rss/news/KEYWORD, replacing KEYWORD with your appropriate search term. Some examples are given below:
I’ll release the source when I’ve tidied it up a bit so you can host it on your own boxes, but feel free to try it here for the time being.
Further to my earlier (and popular – thanks Shaun) post about Despamming Shortstat, I’ve made a small update to the code that has a few improvements on the original:
- It now only looks for entries added to the MT-Blacklist database in the past 72 hours (you can change it if you want). The domains recently blacklisted for comment spam are consistently very similar to those doing the referer spamming, so only checking recent additions reduces the execution time dramatically and (for me) seems to be just as effective.
- The script chucks out a brief summary of which spam domains it’s removed.
- It also uses a couple of variables from your ShortStat configuration file, so it really should just drop in and play nicely.
- You could probably set this as a cron job now, changing the 72 hours to fit the time of your job.
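The logic behind the 72-hour window and the summary can be sketched without touching a database at all. This is not the updated script itself, just an illustration under my own assumptions: the function names, the array layout and the sample domains are all hypothetical, and the suffix match only roughly mirrors a SQL LIKE '%pattern'.

```php
<?php
/**
 * Keep only blacklist patterns added within the last $hours hours.
 * Each entry is ['pattern' => string, 'added' => unix timestamp].
 */
function recent_patterns(array $entries, int $now, int $hours = 72): array
{
    $cutoff = $now - $hours * 3600;
    $recent = [];
    foreach ($entries as $entry) {
        if ($entry['added'] >= $cutoff) {
            $recent[] = $entry['pattern'];
        }
    }
    return $recent;
}

/**
 * Return the referer domains that end with any of the given patterns.
 */
function spam_domains(array $domains, array $patterns): array
{
    $hits = [];
    foreach ($domains as $domain) {
        foreach ($patterns as $pattern) {
            if ($pattern !== '' && substr($domain, -strlen($pattern)) === $pattern) {
                $hits[] = $domain;
                break; // one match is enough for this domain
            }
        }
    }
    return $hits;
}

// Made-up sample data.
$now = 1000000;
$entries = [
    ['pattern' => 'cheap-pills.example', 'added' => $now - 3600],
    ['pattern' => 'old-casino.example',  'added' => $now - 100 * 3600],
];
$domains = ['www.cheap-pills.example', 'old-casino.example', 'friendly-blog.example'];

$removed = spam_domains($domains, recent_patterns($entries, $now));
echo 'Removed: ', implode(', ', $removed), "\n";
```

If the job runs from cron once a day, passing 24 (plus a little slack) as $hours would match the window to the schedule.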
It’s now a few more lines than the original (well, it’s not that many, but I was rapidly getting toward a homepage full of scripts) so you can view the code right here instead:
Whilst experimenting with the new features in PHP5, I recalled that one of my current projects had a very weak Class file that was using a CSV file to look up some data, and it struck me that it would benefit greatly from being switched to use the in-built SQLite support in PHP5. In fact, backing up my theory, the SQLite site has a page entitled Situations Where SQLite Works Well, which enthused that “Raw data can be imported from CSV files”, and topped off the excitement with the claim that “SQLite is different from most other SQL database engines in that its primary design goal is to be simple”.
And I always like simple. But of course, it’s anything but.
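For a sense of what the CSV-to-SQLite switch looks like in the simplest case, here is a minimal sketch. It uses PDO's SQLite driver with an in-memory database rather than the PHP5 sqlite_* functions the post refers to, and the table, columns and sample rows are invented for illustration.

```php
<?php
// Hypothetical CSV data; a real script would read it from a file with fgetcsv().
$csv = "code,name\nGB,United Kingdom\nFR,France\n";

// Assumption: the pdo_sqlite extension is available.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)');

$lines = explode("\n", trim($csv));
array_shift($lines); // drop the header row

// One prepared statement inside a transaction keeps the import quick.
$insert = $db->prepare('INSERT INTO countries (code, name) VALUES (?, ?)');
$db->beginTransaction();
foreach ($lines as $line) {
    $insert->execute(str_getcsv($line, ',', '"', '\\'));
}
$db->commit();

// The lookup that used to scan the CSV becomes a single indexed query.
$lookup = $db->prepare('SELECT name FROM countries WHERE code = ?');
$lookup->execute(['FR']);
echo $lookup->fetchColumn(), "\n";
```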
I’ve been using Shaun Inman’s Shortstat package for a short while now as my main source of web statistics. However, as with most other blog-related things these days, it’s fairly susceptible to the innovation known as referer spam.
Note: This script has been updated somewhat now, but the basic story here remains the same.
Anyway, this got me thinking, and cutting a long story short it occurred to me that I already had a great blacklist of spam domains supplied by Jay Allen’s MT-Blacklist, and it shouldn’t be so hard to use this list as a basis to remove the referer spam from the Shortstat database – and so, here are the results of about 13 minutes of investigation:
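<?php
// Uses the connection settings and helpers that ship with ShortStat.
include_once("configuration.php");
include_once("functions.php");

if ($shortstat) {
    SI_pconnect();
    // Fetch every blacklist pattern stored by MT-Blacklist.
    $urlpatterns = mysql_query("SELECT ext_bl_item_text FROM mt_ext_bl_item");
    while ($row = mysql_fetch_array($urlpatterns, MYSQL_NUM)) {
        // Escape the pattern before embedding it in the query.
        $pattern = mysql_real_escape_string($row[0]);
        // Delete every hit whose referring domain ends with the pattern.
        $query = "DELETE FROM si_shortstat WHERE domain LIKE '%$pattern'";
        @mysql_query($query);
    }
}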
I’ve called this “_despam.php”; installed in the Shortstat installation directory, it will use your existing database connection settings. Note – the script makes the assumption that MT-Blacklist has been set up to use the same MySQL database that Shortstat uses, but that said, I imagine that’s most setups.
It certainly could do with a few more features (actually reporting back what it’s done/doing would be a start), but the basic functionality is there and lovely shiny clean reports are the result.
Update: Tony at juju.org has taken things a step further with a Perl script that ‘de-spams’ your server log files using MT-Blacklist too.