Current search engines return poorly relevant results for India-specific content because they cannot identify India-centric websites. Since most Indian sites have a .com suffix, are in English and are typically hosted on US servers, none of the three standard parameters for identifying India-centric content (domain, language and hosting location) works well. As such, there is an opportunity to build an India-centric search engine, provided it starts from the right basis set.
To build the basis set, one approach would be as follows:
Identify about 10,000 India-centric URLs. These would be identified from current search engines and links from known Indian sites. They would be vetted by human editors prior to crawling. This would result in a few million pages. This process would take 5 content editors, along with a couple of software programmers (to write software that automatically picks URLs from specified pages), about 2 months. [1 content editor should be able to identify/vet 40 URLs a day. Thus in a month, one person can vet about 1,000, and five editors can vet about 10,000 in two months.]
Next, crawl these sites. That is the initial basis set.
From these sites, work on identifying outgoing links and incoming links. Some heuristics should be used to identify India-relevance of these URLs. These would then be submitted to the editorial team for vetting.
In parallel, webmasters would be invited to submit India-specific sites, which would also be vetted by the editorial team.
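The link-harvesting and heuristic steps above could be sketched roughly as follows. This is only an illustration in Python: the term list, the point weights, the threshold and the function names (outgoing_links, india_relevance_score, queue_for_editors) are all assumptions made for the example, not a tested recipe. The idea is simply to rank candidate URLs so that editors see the most promising ones first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical signal list; in practice the editors would tune this.
INDIA_TERMS = ("india", "mumbai", "delhi", "bangalore", "chennai", "kolkata", "rupee")


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def outgoing_links(html, base_url):
    """Return the outgoing links found in one crawled page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def india_relevance_score(url, page_text, seed_inlinks):
    """Return a rough score from 0 to 10; the weights are illustrative guesses."""
    host = urlparse(url).hostname or ""
    text = page_text.lower()
    score = 0
    if host.endswith(".in"):
        score += 4                       # country-code domain: rare but strong signal
    hits = sum(1 for term in INDIA_TERMS if term in text)
    score += min(hits, 3)                # up to 3 points for India-related vocabulary
    score += min(seed_inlinks, 3)        # up to 3 points for links from vetted seed sites
    return score


def queue_for_editors(candidates, threshold=3):
    """Keep (url, text, seed_inlinks) candidates at or above threshold, best first."""
    scored = [(india_relevance_score(url, text, inlinks), url)
              for url, text, inlinks in candidates]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

A score like this would only order the editorial queue; the human vetting described above still makes the final call on every URL.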
Our goal should be to reach 90% coverage of Indian sites in about 3 months after launch and 100% in about 6 months.
There should be a total of three offerings each on the web and mobile platform:
A directory of the best Indian sites, organised hierarchically
A Reference Web search engine based on the sites
An Incremental Web search engine based on RSS feeds
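To make the Incremental Web idea a little more concrete, here is a minimal sketch, assuming the feeds are plain RSS 2.0 and that each item can be identified by its guid (or its link when no guid is present). The function name new_feed_items and the in-memory set of seen identifiers are assumptions for illustration; a real service would persist that state and handle other feed formats.

```python
import xml.etree.ElementTree as ET


def new_feed_items(rss_xml, seen_ids):
    """Return (title, link) pairs for RSS 2.0 items not yet indexed.

    seen_ids is a set of item identifiers (guid, or link when guid is absent).
    It is updated in place, so repeated polls only return fresh items.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        item_id = (item.findtext("guid") or link).strip()
        if not item_id or item_id in seen_ids:
            continue                      # skip unidentifiable or already indexed items
        seen_ids.add(item_id)
        title = (item.findtext("title") or "").strip()
        fresh.append((title, link))
    return fresh
```

Polling each feed on a schedule and passing only the fresh items to the indexer would keep this index current between full crawls of the Reference Web.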
Points to Ponder:
How can we build a more scalable model for soliciting India-specific content? e.g. tagging
What additional differentiation can we get from the likes of Google and Yahoo?
How can we get local content?
What about maps?
How would search work in the context of mobiles?
How do we support local languages?
When I created khoj.com in 1997, the focus was on building an India-specific search engine. It is now time to rethink khoj.com, building on a lot of new search-related ideas and leveraging the community.
Interested in leading or being part of this venture? Email me at rajesh-at-netcore.co.in or fill out this feedback form with a brief profile of yourself, your thoughts on the ideas presented, and your thinking about the role that you’d like to play in the venture.
Tomorrow: Computing Grid
TECH TALK Build Business+T