Similar Entries

I just added a ‘Similar Entries’ section to each blog post. This is highly experimental. It finds five blog posts that are similar or relevant to the current post. It does this by automatically extracting key terms from the post using Yahoo!’s Web Services. These are then stored as “tags” (taxonomy terms) in the database. The blog then finds other posts with the same keywords and orders them by relevance.
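As a rough illustration of the matching step, here is a minimal sketch that assumes each post's tags are already loaded into a PHP array. The real blog reads them from the database, and its actual scoring may well differ. Every other post is scored by the number of tags it shares with the current one, and the five best are kept. The function name `similarEntries` and the array layout are hypothetical, not part of Geneone.

```php
<?php
// Hypothetical sketch, not Geneone's real code: rank other posts by how many
// tags they share with the current post. ASSUMPTION: $postTags maps a post ID
// to an array of tag strings (the real blog keeps these in the database).

/**
 * Return up to $limit entries similar to $currentId as postId => score,
 * best first, where score = number of distinct tags the two posts share.
 */
function similarEntries(array $postTags, int $currentId, int $limit = 5): array
{
    $currentTags = array_unique($postTags[$currentId] ?? []);
    $scores = [];

    foreach ($postTags as $postId => $tags) {
        if ($postId === $currentId) {
            continue; // never suggest the post itself
        }
        // $currentTags has no duplicates, so this counts distinct shared tags.
        $shared = count(array_intersect($currentTags, $tags));
        if ($shared > 0) {
            $scores[$postId] = $shared;
        }
    }

    arsort($scores); // highest score first
    return array_slice($scores, 0, $limit, true); // true keeps post IDs as keys
}

// Made-up sample data for illustration only.
$postTags = [
    1 => ['opera', 'browser', 'free'],
    2 => ['opera', 'free', 'release'],
    3 => ['opera', 'upgrade'],
    4 => ['php', 'mysql'],
];

print_r(similarEntries($postTags, 1)); // post 2 (score 2), then post 3 (score 1)
```

Counting shared tags is the simplest possible notion of "relevance"; weighting rare tags more heavily than common ones would be an obvious next step.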

As it’s experimental, I’ve left quite a lot of debug/test information in: the generated keywords and the “score” of each related entry. It’s working quite well in some cases. For example, on the post “Opera to become free?” it links to a later post about Opera actually becoming free, a post on upgrading to Opera 8.0, and one on web statistics. Some of the keywords Yahoo! returns don’t make much sense, though; for “Opera to become free?” it gives key terms such as “tries” and “competes”. The automatic key term extraction probably needs some work. Perhaps it should store only the five or so most important terms as tags for each object.
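One possible way to pick “the five or so most important terms” is to rank the extracted terms by how often they appear in the post and keep only the top few. The sketch assumes that frequency in the post is a fair proxy for importance. That is a guess, not how Yahoo! ranks its terms, and `topTerms` is a hypothetical helper rather than part of Geneone.

```php
<?php
// Hypothetical sketch: keep only the $limit most important key terms.
// ASSUMPTION: "importance" = number of whole-word, case-insensitive
// occurrences of the term in the post text.

function topTerms(array $terms, string $text, int $limit = 5): array
{
    $counts = [];
    foreach (array_unique($terms) as $term) {
        // \b...\b matches whole words only, so "tries" won't match "countries".
        $pattern = '/\b' . preg_quote($term, '/') . '\b/iu';
        $counts[$term] = (int) preg_match_all($pattern, $text);
    }

    arsort($counts); // most frequent first; ties keep input order on PHP 8+
    return array_slice(array_keys($counts), 0, $limit);
}

// Made-up sample data for illustration only.
$terms = ['opera', 'browser', 'tries', 'competes', 'free', 'licence'];
$text  = 'Opera may become free. If Opera is free, the browser competes well. Opera tries hard.';

print_r(topTerms($terms, $text, 2)); // opera (3 hits), then free (2 hits)
```

With a limit of two, one-off words such as “tries” and “competes” are dropped in favour of terms the post keeps coming back to.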

The key terms are generated when the blog post is created. For the older blog posts, I wrote a PHP script to generate them.

<?php
/* Load configuration file */
require ("path-to-config-file.php");

/* Load required core libraries */
define ("IN_GENE", TRUE);
require ("loader.php");

$blog = Gene::getObject(1);      // the blog object
$childs = $blog->getChilds();    // its child objects: the posts

foreach ($childs as $i) {
    $object = Gene::getObject($i['object_id']);
    $taxonomy =& $object->getTaxonomy(1);
    $taxonomy->add($taxonomy->findTerms()); // extract key terms, store as tags
    echo 'Done ' . $i['object_id'] . "\n";
}
?>

It’s quite a nice script at 18 lines and shows how easy it is to integrate your scripts with Geneone.

It’d be interesting to see how scripts such as vBulletin find similar topics. If anyone knows, please share it! Any insight would be gratefully received!
