Spam traffic has grown from 8 percent of Internet e-mail in 2001 to as much as 40 percent in 2002, according to Brightmail Inc., which provides filtering products for several major Internet service providers.
Spam is costly for everybody. It costs about $250 to send a million spam messages, but deleting those million messages costs about $2,800 in lost wages at the federal minimum wage, Yerazunis estimates.
Altogether, spam costs U.S. businesses $8.9 billion and European businesses $2.5 billion annually, according to a study released this month by San Francisco-based Ferris Research.
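To see what Yerazunis's per-million estimate implies for a single message, here is a rough back-of-the-envelope check. It assumes the federal minimum wage of $5.15 per hour that applied at the time; the article does not state the wage figure, and the function and variable names are only illustrative.

```python
# Back-of-the-envelope check of the deletion-cost estimate quoted above.
SPAMS = 1_000_000      # messages in the estimate
SEND_COST = 250.0      # dollars to send them (from the article)
DELETE_COST = 2800.0   # dollars of lost wages to delete them (from the article)
MIN_WAGE = 5.15        # dollars per hour; ASSUMPTION, not given in the article


def seconds_per_spam(total_cost, hourly_wage, messages):
    """Convert a total wage cost into seconds of attention per message."""
    hours = total_cost / hourly_wage
    return hours * 3600 / messages


secs = seconds_per_spam(DELETE_COST, MIN_WAGE, SPAMS)
ratio = DELETE_COST / SEND_COST

print(f"about {secs:.2f} seconds of attention per message")
print(f"recipients pay about {ratio:.1f} times what the sender pays")
```

Under that wage assumption, the estimate works out to roughly two seconds spent on each message, with recipients collectively paying about eleven times what the sender does.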
The article says that “William Yerazunis’ presentation on his CRM114 Discriminator language was a centerpiece of the conference. His filtering technique hashes the messages, matching short phrases from the incoming text with phrases that the user previously supplied as example text, catching spam that might not exactly match standard spam text. He claims that the system has higher than 99.9 percent effectiveness; it can be downloaded for free.”
More on CRM114:
CRM114 is a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to whatever the user desires. Criteria for categorization of data can be by satisfaction of regexes, by sparse binary polynomial matching with a Bayesian Chain Rule evaluator, or by other means. Accuracy of the SBPH/BCR classifier has been seen in excess of 99 per cent, for 1/4 megabyte of learning text. In other words, CRM114 learns, and it learns fast.
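To make the SBPH/BCR idea a little more concrete, here is a toy sketch in Python. It is not CRM114's code or language. The five-token window, the gap marker, the CRC32 hash, the smoothing constants, and all the names below are assumptions chosen for illustration. The sketch slides a window over the text, turns each new word plus every subset of the preceding words into a hashed "sparse phrase", and then folds the evidence from each known phrase into a running spam probability with Bayes' rule.

```python
import itertools
import zlib
from collections import defaultdict

WINDOW = 5        # tokens per sliding window (assumed for this sketch)
GAP = "<skip>"    # placeholder for a skipped token (assumed marker)


def sbph_features(text, window=WINDOW):
    """Yield hashed sparse phrases: each token combined with every
    subset of the (window - 1) tokens that precede it."""
    tokens = text.lower().split()
    for i, newest in enumerate(tokens):
        prev = tokens[max(0, i - (window - 1)):i]
        for mask in itertools.product((False, True), repeat=len(prev)):
            phrase = [t if keep else GAP for t, keep in zip(prev, mask)]
            # Drop leading gaps so a phrase looks the same wherever it occurs.
            while phrase and phrase[0] == GAP:
                phrase.pop(0)
            phrase.append(newest)
            yield zlib.crc32(" ".join(phrase).encode("utf-8"))


class ToyClassifier:
    """Counts hashed phrases per class and chains Bayes' rule over them."""

    def __init__(self):
        self.counts = {"spam": defaultdict(int), "good": defaultdict(int)}

    def learn(self, label, text):
        for feature in sbph_features(text):
            self.counts[label][feature] += 1

    def spam_probability(self, text):
        p_spam = 0.5  # neutral prior
        for feature in sbph_features(text):
            s = self.counts["spam"].get(feature, 0)
            g = self.counts["good"].get(feature, 0)
            if s + g == 0:
                continue  # phrase never seen in training: no evidence
            # Crude smoothed per-phrase estimate, kept strictly between 0 and 1.
            like_spam = (s + 0.5) / (s + g + 1.0)
            like_good = 1.0 - like_spam
            # Chain rule: the posterior becomes the prior for the next phrase.
            p_spam = (like_spam * p_spam) / (
                like_spam * p_spam + like_good * (1.0 - p_spam)
            )
        return p_spam


clf = ToyClassifier()
clf.learn("spam", "buy cheap pills now")
clf.learn("good", "meeting notes attached for review")

print(clf.spam_probability("cheap pills available now"))
print(clf.spam_probability("please review the meeting notes"))
```

Because phrases with gaps are stored alongside exact ones, a message can still match learned text when a word or two has been changed, which is the property the article describes as catching spam that does not exactly match standard spam text.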
CRM114 is compatible with SpamAssassin or other spam-flagging software; it can also be pipelined in front of or behind procmail. CRM114 is also useful as a syslog or firewall log filter, to alert you to important events but ignore the ones that aren’t meaningful.
I think we should check it out. It's free to use under the GPL.