Emergic: Rajesh Jain's Blog


Nutch for Search

August 19th, 2003

News.com reports on an open-source search engine project:

The project is developing open-source software for locating documents online. But unlike major search providers, it won’t cloak its formulas for matching relevant results to visitors’ queries. Rather, it will provide an open window into its calculations with links to explanations on how it determined each result, according to lead architect Doug Cutting.

Nutch has already taken the wraps off its downloadable software for research, which is suitable for testing by other developers but likely too arcane for the average Web surfer. It is aiming to have a public site by October that will allow people to search 100 million documents to be used as a measure against indexes such as Google.

For example, a Web surfer could pull up search results from Nutch with transparency to its mathematical calculations and compare them with those from Google, which does not publicize its formula for calculating search results.

The engine is written in Java and is based on Lucene, a software library that developers can use to add search capabilities to technologies such as e-mail. Nutch builds upon Lucene, also developed in part by Cutting, and uses the technology as its intersearch library and indexing tool.

Tags: Software
