Imagine searching through your own research notes, Google, and a set of your favorite weblogs all at the same time, the results coming back to you ranked in a meaningful order. Imagine getting relevant results to a search even when there is no keyword match, or being able to refine your search by selecting a set of good results and asking for more.
The presenters, who first met and discussed the concept at last year’s ETCon, have been working on just such a project: an open-source latent semantic search engine that lives on the desktop and lets users navigate and search their own writing – notes, articles, or weblog entries. Because it examines patterns of word use across many documents, the tool offers significantly improved search results, and can accept long natural-language search queries, including entire documents. By allowing documents to organize themselves into topic clusters, the tool also offers a macro view of the user’s data, in useful digest form.
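To make "patterns of word use across many documents" a little more concrete, here is a minimal sketch of latent semantic indexing in plain Python. It is not the presenters' implementation: the function names, the toy documents, and the choice of raw word counts are all assumptions made for illustration, and the singular vectors are found by simple power iteration, which is only reasonable for tiny examples like this one.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def term_doc_matrix(docs):
    """Rows are terms, columns are documents, entries are raw word counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w]][j] += 1.0
    return A, index

def top_singular_triplets(A, k, iters=200, seed=0):
    """Top-k (sigma, u, v) of A via power iteration on A^T A with deflation."""
    rng = random.Random(seed)
    n_terms, n_docs = len(A), len(A[0])
    cols = [[A[t][j] for t in range(n_terms)] for j in range(n_docs)]
    G = [[dot(cols[i], cols[j]) for j in range(n_docs)] for i in range(n_docs)]
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_docs)]
        for _ in range(iters):
            w = [dot(row, v) for row in G]
            length = math.sqrt(dot(w, w))
            v = [x / length for x in w]
        lam = dot(v, [dot(row, v) for row in G])  # eigenvalue of A^T A
        sigma = math.sqrt(lam)
        u = [dot(A[t], v) / sigma for t in range(n_terms)]
        triplets.append((sigma, u, v))
        # Deflate so the next pass converges to the next singular vector.
        for i in range(n_docs):
            for j in range(n_docs):
                G[i][j] -= lam * v[i] * v[j]
    return triplets

def lsi_scores(query, A, index, k=2):
    """Cosine similarity between the query and each document in k latent dimensions."""
    triplets = top_singular_triplets(A, k)
    q = [0.0] * len(A)
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1.0
    q_k = [dot(u, q) / sigma for sigma, u, _ in triplets]  # fold the query in
    scores = []
    for j in range(len(A[0])):
        d_k = [v[j] for _, _, v in triplets]
        denom = math.sqrt(dot(q_k, q_k)) * math.sqrt(dot(d_k, d_k))
        scores.append(dot(q_k, d_k) / denom if denom else 0.0)
    return scores

docs = [
    "car engine wheel",
    "automobile engine wheel",
    "flower petal garden",
    "garden flower seed",
]
A, index = term_doc_matrix(docs)
scores = lsi_scores("car", A, index)
for doc, score in zip(docs, scores):
    print(f"{score:6.3f}  {doc}")
```

The second document never mentions "car", yet it scores as high as the first because "automobile" appears alongside the same words, while the gardening documents score near zero. A real engine would presumably weight terms and use a proper SVD routine rather than this toy one.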
Apart from its utility as a standalone desktop program, the prototype is designed to work as a web service, creating the potential for a distributed peer-to-peer network of individual search engines. This kind of network, which is the project’s ultimate goal, would allow users to send queries out over the Internet, decide where those queries should look, and receive collated results from a variety of different, complementary sources. Unlike in other search aggregators, the ability to interleave results in a meaningful way is an organic part of the search algorithm’s design. Whether searching weblogs, research notes, article archives, or personal notes, users would have full control over what they searched, and an unprecedented ability to make their own work accessible to others.
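As a rough illustration of what collating results from several sources could look like, here is a small Python sketch. It assumes that each peer returns `(score, source, title)` tuples and that the scores are directly comparable across peers (for example, similarities computed in the same way). The abstract does not say how the real system achieves this; the `interleave` function and the sample data are hypothetical.

```python
import heapq
from itertools import islice

def interleave(result_lists, limit=10):
    """Merge per-source result lists into one ranking, highest score first."""
    # Sort each list defensively; heapq.merge expects already-ordered inputs.
    ordered = [sorted(rs, key=lambda r: r[0], reverse=True) for rs in result_lists]
    merged = heapq.merge(*ordered, key=lambda r: r[0], reverse=True)
    return list(islice(merged, limit))

# Hypothetical results from two sources: (score, source, title)
notes = [(0.91, "notes", "LSI reading list"), (0.40, "notes", "Conference travel")]
weblog = [(0.78, "weblog", "Search on the desktop"), (0.55, "weblog", "Memex ideas")]

top = interleave([notes, weblog], limit=3)
for score, source, title in top:
    print(f"{score:.2f}  [{source}] {title}")
```

The merge only makes sense if the scores mean the same thing everywhere, which is presumably why the abstract stresses that interleaving is part of the algorithm's design rather than something bolted on afterwards.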
Maciej points to a note on Semantic Indexing for more info.
This is a topic I began this week in my Tech Talk and will be writing about for the next few weeks. We also hope to take the base infrastructure we have built in BlogStreet and use it to build a prototype of a personal Memex system.