Emergic: Rajesh Jain's Blog


BlogStreet: Quarter Update

July 1st, 2002

When we began, we wanted to launch a blog directory and search engine within a month (by May). We started classifying blogs by hand and did about 200 before we realised that this was not the right approach. Blogs are not as easy to classify as web pages, and a manual approach would not scale. So we stepped back and thought through what we wanted to do.

We still want to build a Blog Directory, but this time around the focus is on (a) identifying the hubs, (b) identifying blog clusters, and (c) doing the categorisation automatically. The two building blocks for this are the blogroll (with its links to other blogs) and the blog posts. So we are first doing the blog neighbourhood analysis, using the blogrolls and the links on the top page of each blog. We have built a system using proglets that identifies blogrolls reasonably accurately.
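One standard way to score "hubs" from blogroll links is Kleinberg's HITS (hubs and authorities) algorithm. The post does not say how BlogStreet scores hubs, so HITS is swapped in here purely for illustration. A minimal sketch, assuming the blogrolls have already been extracted into a dict mapping each blog to the blogs its blogroll links to; the function name `hits`, the iteration count and the blog names are hypothetical:

```python
from math import sqrt


def hits(links: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Kleinberg's HITS: return (hub, authority) scores for a directed link graph."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the blogs whose blogroll lists this blog.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in set(targets):
                auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the blogs this blog's blogroll lists.
        hub = {n: sum(auth[d] for d in set(links.get(n, []))) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical blogrolls: each blog maps to the blogs its blogroll links to.
blogrolls = {
    "a.blog": ["x.blog", "y.blog", "z.blog"],
    "b.blog": ["x.blog", "y.blog"],
    "c.blog": ["x.blog"],
}
hub, auth = hits(blogrolls)
print(max(hub, key=hub.get))    # a.blog
print(max(auth, key=auth.get))  # x.blog
```

In HITS terms, a good hub is a blog whose blogroll points to many good authorities, so the top-scoring hubs would be natural candidates for the "hub blogs" used to seed automatic classification.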

In this quarter, the plan is to do the following:
– Given a blog, identify its neighbourhood using what we have called the “Commoner” method: gather the blogrolls of all the blog's friends (the blogs in its own blogroll), and list the blogs that appear most often across those blogrolls as related, in addition to the friends themselves. That is, a blog that appears the highest number of times across all the friend blogs' blogrolls is treated as related.
– Rank each of the blogs, so that we can identify the top 100 blogs
– Provide a keyword-based search engine over the top pages of all the blogs we bot (crawl)
– Link to each blog's RSS feed (so it can be fed into an RSS aggregator)
– Automatically classify blogs by identifying the “hub blogs”
– Look within blogs to identify individual blog posts
– Bot the archives so we can build a search engine with the granularity of a blog post (i.e., a keyword search can return the actual blog post in which the keyword appears, rather than just the page, as current search engines do)
– Later: build a live map of the blog network (along the lines of what Barabási discusses in his book “Linked”)
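The “Commoner” method can be sketched in a few lines of Python. This is a minimal illustration, not BlogStreet's code: it assumes the blogrolls have already been extracted into a dict mapping each blog to the blogs it lists, treats a blog's "friends" as the entries of its own blogroll, counts each blog at most once per blogroll, and excludes the blog itself and its friends from the counted list because they are already related. The function name `commoner`, the `top_n` cutoff and the blog names are hypothetical.

```python
from collections import Counter


def commoner(blog: str, blogrolls: dict[str, list[str]], top_n: int = 10) -> list[tuple[str, int]]:
    """Hypothetical "Commoner" sketch: blogs listed most often in the friends' blogrolls."""
    friends = set(blogrolls.get(blog, []))
    counts = Counter()
    for friend in friends:
        # A blog counts at most once per friend's blogroll.
        counts.update(set(blogrolls.get(friend, [])))
    # Drop the blog itself and its friends: they are already "related".
    for b in friends | {blog}:
        counts.pop(b, None)
    # Most frequent first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical blogrolls for a small neighbourhood.
blogrolls = {
    "me.blog": ["f1.blog", "f2.blog", "f3.blog"],
    "f1.blog": ["p.blog", "q.blog", "me.blog"],
    "f2.blog": ["p.blog", "r.blog", "f1.blog"],
    "f3.blog": ["p.blog", "q.blog"],
}
commoners = commoner("me.blog", blogrolls, top_n=2)
# Related blogs = the friends themselves plus the most common blogs.
related = sorted(set(blogrolls["me.blog"])) + [b for b, _ in commoners]
print(commoners)  # [('p.blog', 3), ('q.blog', 2)]
print(related)    # ['f1.blog', 'f2.blog', 'f3.blog', 'p.blog', 'q.blog']
```

Breaking ties alphabetically is an arbitrary choice made only to keep the output stable; a real system might instead break ties using each candidate blog's own rank.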

The “hidden agenda” is to take the work being done on the external world of blogs and apply it inside organisations, mapping out relationships within enterprises as their employees start blogging. (Getting them to blog is the goal of the Digital Dashboard project.)

Tags: BlogStreet
