Nutch for Search

News.com reports on an open-source search engine project:

The project is developing open-source software for locating documents online. But unlike major search providers, it won’t cloak its formulas for matching relevant results to visitors’ queries. Rather, it will provide an open window into its calculations with links to explanations on how it determined each result, according to lead architect Doug Cutting.

Nutch has already taken the wraps off its downloadable software for research, which is suitable for testing by other developers but likely too arcane for the average Web surfer. It is aiming to have a public site by October that will allow people to search 100 million documents to be used as a measure against indexes such as Google.

For example, a Web surfer could pull up search results from Nutch with transparency to its mathematical calculations and compare them with those from Google, which does not publicize its formula for calculating search results.

The engine is written in Java and is based on Lucene, a software library that developers can use to add search capabilities to technologies such as e-mail. Nutch builds upon Lucene, also developed in part by Cutting, and uses the technology as its intersearch library and indexing tool.
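To make the idea of "transparent" ranking concrete, here is a minimal sketch of a toy inverted index whose results carry a per-term explanation of their score. It is not Nutch's or Lucene's actual formula: the TF-IDF weighting, the function names `build_index` and `search`, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return [(doc_id, score, explanation_lines)], best match first."""
    scores = defaultdict(float)
    explanations = defaultdict(list)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Illustrative weighting only: rarer terms count for more.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            part = tf * idf
            scores[doc_id] += part
            # Record how this term contributed, so the ranking can be audited.
            explanations[doc_id].append(
                f"{term}: tf={tf} * idf={idf:.3f} = {part:.3f}"
            )
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [(d, scores[d], explanations[d]) for d in ranked]


# Hypothetical sample documents.
docs = {
    "a": "open source search engine",
    "b": "search engine with secret ranking formula",
    "c": "open source software library",
}
index = build_index(docs)
results = search(index, len(docs), "open search")

for doc_id, score, why in results:
    print(doc_id, round(score, 3), why)
```

Each result comes back with the arithmetic behind its score, which is the kind of "open window" the article describes. A real engine would use a far richer formula over a far larger index.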

