SpamBayes and RSSBayes

InfoWorld writes about how Bayes algorithms are being used to combat spam:

Several e-mail programs, including the Mail program bundled with Mac OS X, use Bayesian techniques to enable users to train their systems to distinguish between spam and nonspam (aka ham). Experts debate how the term Bayesian is relevant to this game of classification, but the core ideas in Paul Graham’s influential 2002 paper, “A Plan for Spam,” make sense intuitively. Every message bears evidence both for and against the hypothesis that it is spam. Your disposition of every message tests both hypotheses and systematically improves the filter’s ability to separate spam from ham.

As Graham pointed out, the judgments involved are highly individual. For example, the commercial e-mail that I want to receive (or reject) will differ from the ones you want (and don’t want) according to our interests and tastes. A filter that works on behalf of a large group, such as SpamAssassin, which checks and often rewrites my infoworld.com mail, or CloudMark’s SpamNet (formerly Vipul’s Razor), which collaboratively builds a database of spam signatures, will typically agree with SpamBayes on what I call the Supreme Court definition of spam: You know it when you see it. What sets SpamBayes apart is its ability to learn, by observing your behavior, which messages you do want to see, and the ones you don’t.
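To make the "evidence for and against" idea concrete, here is a minimal naive Bayes sketch in Python. It is not Graham's exact scoring scheme and not the SpamBayes implementation; the class name `TinyBayesFilter`, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration. Each token in a message shifts the log-odds toward spam or toward ham, depending on how often it appeared in the messages this particular user marked as each.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase tokens in a message."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class TinyBayesFilter:
    """Hypothetical per-user filter: learns only from that user's decisions."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one user decision; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Combine per-token evidence into a probability that text is spam."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from the smoothed prior odds of spam versus ham.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for token in tokenize(text):
            # Smoothed fraction of spam (ham) messages containing the token.
            p_spam = (self.token_counts["spam"][token] + 1) / (n_spam + 2)
            p_ham = (self.token_counts["ham"][token] + 1) / (n_ham + 2)
            # Positive terms are evidence for spam, negative terms for ham.
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch meeting tomorrow", "ham")
print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("meeting tomorrow"))
```

Because the counts come only from one user's own training calls, two people feeding the same filter different decisions would end up with different scores for the same message, which is the individuality Graham describes.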

SpamBayes is an open-source project and is currently available only as an Outlook plugin. Additional discussion of SpamBayes is on Jon’s weblog.

A related idea is to apply the same Bayesian techniques to build a content recommendation engine – what Matt Griffith calls RSSBayes: “My problem is information overload. I’m much more interested in seeing the same thing for RSS. Instead of blocking stuff I don’t want I want it to highlight the stuff I might want.”

Adds Les Orchard: “Using a Bayesian approach, or some other form of machine learning, as applied to my aggregator and my viewing patterns is something I’ve been wanting for awhile now…I’d like to get back to some machine learning research and build an enhanced aggregator that learns what makes me click.”
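The RSSBayes idea can be sketched the same way, with "clicked" and "skipped" standing in for ham and spam. The following Python is only an illustration under assumptions: the function name `rank_items`, the use of item titles as the only input, and the smoothing are all hypothetical, and neither Griffith nor Orchard describes a specific implementation. Instead of blocking anything, it reorders new items so that those resembling previously clicked ones come first.

```python
import math
import re
from collections import Counter


def words(text):
    """Return the set of distinct lowercase words in a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_items(clicked, skipped, new_items):
    """Order new_items so titles resembling clicked ones come first."""
    # Count how many clicked (skipped) titles contain each word.
    clicked_counts = Counter(w for title in clicked for w in words(title))
    skipped_counts = Counter(w for title in skipped for w in words(title))

    def interest_score(title):
        # Log-odds that the reader would click, with add-one smoothing.
        score = math.log((len(clicked) + 1) / (len(skipped) + 1))
        for w in words(title):
            p_click = (clicked_counts[w] + 1) / (len(clicked) + 2)
            p_skip = (skipped_counts[w] + 1) / (len(skipped) + 2)
            score += math.log(p_click / p_skip)
        return score

    return sorted(new_items, key=interest_score, reverse=True)


clicked = ["Python machine learning tutorial", "Bayesian filtering in Python"]
skipped = ["Celebrity gossip roundup", "Football scores roundup"]
new_items = ["Weekend football roundup", "Machine learning for feed readers"]
for title in rank_items(clicked, skipped, new_items):
    print(title)
```

Sorting rather than filtering matches Griffith's request to highlight what he might want: nothing is hidden, so a wrong guess costs only position in the list.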


Published by

Rajesh Jain

An entrepreneur based in Mumbai, India.