Data Mining Email

Phil Wolff writes:

Email has juice. Only telephones are used more.

40% of a company’s knowledge is stored in its email boxes, hidden from intranet search engines, locked away on desktops. Email is rich with:

  • social information (who is asked about what, who redistributes information to whom),
  • time signatures (sent, received, read, forwarded, printed),
  • threading and propagation clues (A sent it to B who replied while copying it to C who forwarded it to…),
  • URLs pointing to the web,
  • enclosures passed along, and
  • entry points, from mobile devices to robots to business software.
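Several of these clues sit in plain message headers, so a short script can surface them. The following is a minimal sketch in Python using only the standard library; the two sample messages, the addresses, and the names RAW_MESSAGES, URL_PATTERN, and extract_clues are made up for illustration and do not come from Phil's post. It pulls out who wrote to whom, when each message was sent, which message a reply answers, and any URLs in the body.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Two invented messages, used only to illustrate the idea.
RAW_MESSAGES = [
    """\
From: a@example.com
To: b@example.com
Date: Mon, 05 Apr 2004 09:00:00 +0000
Message-ID: <1@example.com>
Subject: Pricing question

See http://example.com/pricing for details.
""",
    """\
From: b@example.com
To: a@example.com
Cc: c@example.com
Date: Mon, 05 Apr 2004 09:30:00 +0000
Message-ID: <2@example.com>
In-Reply-To: <1@example.com>
Subject: Re: Pricing question

Copying C, who knows more.
""",
]

# Deliberately simple pattern; real-world URL matching is messier.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def extract_clues(raw):
    """Return the social, timing, threading, and URL clues of one message."""
    msg = message_from_string(raw)
    sender = getaddresses(msg.get_all("From", []))[0][1]
    to_and_cc = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses(to_and_cc)]
    return {
        "sender": sender,                              # social information
        "recipients": recipients,
        "sent": parsedate_to_datetime(msg["Date"]),    # time signature
        "message_id": msg["Message-ID"],               # threading clues
        "in_reply_to": msg["In-Reply-To"],             # None if not a reply
        "urls": URL_PATTERN.findall(msg.get_payload()),
    }


clues = [extract_clues(raw) for raw in RAW_MESSAGES]

# Count sender -> recipient pairs: a crude "who talks to whom" graph.
edges = Counter(
    (c["sender"], recipient) for c in clues for recipient in c["recipients"]
)

for c in clues:
    print(c["sender"], "->", c["recipients"], "reply to:", c["in_reply_to"])
```

Read, forwarded, and printed times are usually not recorded in the message itself, so this sketch covers only the sent time; the other signals would have to come from the mail client or server.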

Phil also differentiates between Yahoo and Google:

    Where Yahoo sells communication, Google sells context. Where Yahoo brings integration, Google leads with relevance. Where Yahoo! lets you type up a “buddy list”, watch Google tweak your orkut social network with clues from your mailing behavior, and vice versa.

    Where Yahoo uses their toolbar to access their many services/properties, Google’s toolbar will observe your browser experiences. And that includes now sending and reading email, surfing, news watching, reading and writing weblogs, following and posting to usenet, and shopping. With email, orkut and your toolbar, they now can create a compound profile of your interests.

    Context, relevance, experience. Tough to beat.

On a related note, O’Reilly Network has an article on mining email that resides in a Mozilla mailbox.

Published by Rajesh Jain, an entrepreneur based in Mumbai, India.