Jon Udell writes: “There’s been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run at this recently and, although my experiments haven’t been wildly successful, I want to report them because I think the idea may have merit…We know that autocategorization succeeds in the narrow domain of spam filtering. Whether it can succeed more generally — for example, by helping blog authors and readers manage flows of items — is yet unclear. The raw tools are available, but until they’re well integrated into authoring and reading software, it will be hard to get a good sense of what’s possible.”
Some additional thoughts from Udell:
First, from the perspective of a blog author who already categorizes content (as many do), the question is: can effort that’s already being invested pay more dividends? An automated review of things that have already been categorized can help you sharpen your sense of the structure you are building. A prediction about how to categorize a newly-written item can be interesting and helpful too. As I worked through the exercise, I could (at times) imagine the software to be acting like a person you’d bounce an idea off of. “I can see why you choose that category,” we can imagine it saying, “but for what it’s worth, it has a lot in common with these items in this other category.”
The second and even more speculative idea would be to create subscribable filters. Consider the set of items that I write myself, and categorize under, say, web_services. Some other set of items out there in the blogosphere, written by other folks, will tend to cluster with mine. Could we say that those other items have some affinity for “Jon’s take on Web services”? And if so, by subscribing to my text-frequency database for that category could you use it to create one view of your own inbound feeds, or to suggest ones you’re not reading?
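To make the two ideas above a little more concrete, here is a minimal sketch of a word-frequency categorizer whose per-category tables can be exported and imported by someone else. It is not Udell's code. The class name CategoryFilter, the export and subscribe methods, the sample texts, and the use of naive Bayes with add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_']+", text.lower())


class CategoryFilter:
    """Hypothetical naive Bayes filter built on per-category word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.item_counts = Counter()             # category -> items trained

    def train(self, category, text):
        self.word_counts[category].update(tokenize(text))
        self.item_counts[category] += 1

    def export(self, category):
        """Return one category's table as a plain dict others could load."""
        return {"words": dict(self.word_counts[category]),
                "items": self.item_counts[category]}

    def subscribe(self, category, exported):
        """Load someone else's exported table under a local category name."""
        self.word_counts[category] = Counter(exported["words"])
        self.item_counts[category] = exported["items"]

    def scores(self, text):
        """Log-probability per category, using add-one smoothing."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_items = sum(self.item_counts.values())
        words = tokenize(text)
        result = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.item_counts[category] / total_items)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            result[category] = score
        return result

    def predict(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)


# The author trains on items he has already categorized (made-up sample text).
jon = CategoryFilter()
jon.train("web_services", "SOAP and WSDL describe web services endpoints")
jon.train("web_services", "XML web services with SOAP messaging")

# A reader subscribes to that table and adds a category of their own.
reader = CategoryFilter()
reader.subscribe("jon_web_services", jon.export("web_services"))
reader.train("not_interested", "celebrity gossip and sports scores")

print(reader.predict("a new SOAP toolkit for web services"))  # jon_web_services
```

The same scores method could back the "bounce an idea off" behavior: rather than returning only the top category, a tool might show the runner-up so the author can see which other category an item resembles. Whether this works well beyond toy data is exactly the open question Udell raises.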
Matt Mower follows it up with an interesting thought: “What might be interesting is if people could ‘share’ and ‘subscribe to’ preference maps. As a new user of the system you might not really know who is relevant on any particular topic. But imagine you worked with David Weinberger, Phil Wolff, or Dan Gillmor. If you knew them and trusted their judgement you could pick one of their preference maps as a starting point and immediately gain a useful insight into the data as it is structured by topic. You might even switch between personalities to get more perspective!”