Bus. Std: Searching the Net — New Technologies

My latest column in Business Standard:

One of the defining battles of the mid-1990s was between Netscape and Microsoft over control of the desktop. Netscape threatened Microsoft’s Windows lock with its web browser. Microsoft fought back with a vengeance and finally won, as a marginalized Netscape was bought by AOL. Now, another battle is shaping up that could be equally defining for the future of computing.

This time, the attacker is Google. Over the past few years, Google has become the search engine of choice. As its dominance has soared, so have its ambitions. Over the past couple of years, Google has extended itself beyond search into other areas, both organically and via acquisitions. In recent times, the wheel has come full circle, with speculation rife that Google may be planning to launch its own browser.

There are two parts to the story we see unfolding as Google, Yahoo and Microsoft, along with a host of others, work to define tomorrow’s interface to the information web. The two parallel threads are building better search engines and creating richer interfaces. The search engines are the backend that solves the information overload problem, while the interfaces are the doorways to the world of content and applications.

We will first discuss advances in search technologies. Later, we will look at how we will access this emerging world of service-based computing.

The problem of search is one of plenty. There is a lot of data on the web that needs to be converted into useful information. Search is one of the solutions to the proliferation of data that has taken place with the growth of the Internet. As John Battelle of Searchblog put it recently: “Search is our response to the extraordinary info-abundance in which we’re all awash.”

Google’s PageRank technology helped it separate the wheat from the chaff. In a recent article on Google’s history, the Economist (Technology Quarterly, Sep 16, 2004) explained how the algorithm works: “PageRank works by analysing the structure of the web itself. Each of its billions of pages can link to other pages, and can also, in turn, be linked to. [Google’s founders] Mr Brin and Mr Page reasoned that if a page was linked to [by] many other pages, it was likely to be important. Furthermore, if the pages that linked to a page were important, then that page was even more likely to be important. There is, of course, an inherent circularity to this formula: the importance of one page depends on the importance of pages that link to it, the importance of which depends in turn on the importance of pages that link to them. But using some mathematical tricks, this circularity can be resolved, and each page can be given a score that reflects its importance.”
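
One common way to resolve the circularity the Economist describes is to start every page with an equal score and keep redistributing scores along the links until they stop changing. The Python sketch below illustrates that idea on a made-up four-page web. It is not Google’s actual implementation: the function name, the toy pages, and the damping value of 0.85 (the figure usually quoted for the original algorithm) are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Score pages by repeated redistribution (power iteration).

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance

    for _ in range(max_iter):
        # Score held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its current score on, split equally among its links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # scores have settled, so the circularity is resolved
            break
    return rank


# A hypothetical four-page web: nobody links to "blog".
toy_web = {
    "home": ["news", "shop"],
    "news": ["home"],
    "shop": ["home", "news"],
    "blog": ["home"],
}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "home" ends up with the highest score because every other page links to it, while "blog", which nothing links to, ends up with the lowest.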

Today’s search can be considered to be in its C-prompt era, and needs an upgrade. So, what will be the Windows of the search era? In an interview with ACM Ubiquity, Ramesh Jain, professor of computer science at Georgia Institute of Technology, explains what needs to be done: “Current search engines like Google do not give me a steering wheel for searching the Internet. The search engines get faster and faster, but they’re not giving me any control mechanism. The only control mechanism, which is also a stateless control mechanism, asks the searcher to put in keywords, and if I put in keywords I get this huge monstrous list. I have no idea how to refine this list. The only way is to come up with a completely new keyword list. I also don’t know what to do with the 8 million results that Google threw at me. So when I am trying to come up with those keywords, I don’t know really where I am. That means I cannot control that list very easily because I don’t have a holistic picture of that list. That’s very important. When I get these results, how do I get some kind of holistic representation of what these results are, how they are distributed among different dimensions... Two common dimensions that I find very useful in many general applications are time and space. If I can be shown how the items are distributed in time and space, I can start controlling what I want to see over this time period or what I want to see in that space.”
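
To make the “steering wheel” idea concrete, here is a minimal Python sketch of what summarising results along time and space might look like. It assumes every result already carries a year and a region, which a real engine would first have to extract; the records, field names and both functions are hypothetical and are not taken from the interview.

```python
from collections import Counter

# Hypothetical search results, already tagged with a time and a place.
results = [
    {"title": "Result A", "year": 2004, "region": "Asia"},
    {"title": "Result B", "year": 2003, "region": "Asia"},
    {"title": "Result C", "year": 2004, "region": "Europe"},
    {"title": "Result D", "year": 2004, "region": "Asia"},
]


def distribution(results, dimension):
    """Count how the results are spread along one dimension."""
    return Counter(r[dimension] for r in results)


def refine(results, **wanted):
    """Keep only the results matching every requested dimension value."""
    return [r for r in results if all(r.get(k) == v for k, v in wanted.items())]


by_year = distribution(results, "year")      # the overview along time
by_region = distribution(results, "region")  # the overview along space

# Steer: narrow the same result list instead of inventing new keywords.
narrowed = refine(results, year=2004, region="Asia")
print(by_year, by_region, [r["title"] for r in narrowed])
```

The point of the sketch is that the searcher first sees the overall shape of the list and then narrows it by picking a period or a place, instead of starting over with a new set of keywords.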

One glimpse of search innovation comes from Amazon with its A9 search engine, which is built around Google’s search results and also integrates Amazon’s own book search results. John Battelle explained A9’s approach in a column for Business 2.0: “A9 has broken search into its two most basic parts. Recovery is everywhere you’ve been before (and might want to go again); discovery is all that you may wish to find but have yet to encounter. A9 attacks recovery through its original Search History feature and its integrated toolbar, which tracks every site you visit. But new to this version of the site is a feature A9 calls Discover, which finds sites you might be interested in based on your click stream and, here’s the neat part, the click streams of others... A9 is more of a Web information management interface, with search as its principal navigational tool. [It is] betting that over time, Web users will come to recognize, then demand, that their search service not only find sites based on queries but also remember where they have been and what they have clicked on.”
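
Battelle does not say how Discover combines click streams, so the Python sketch below shows only one simple possibility: suggest sites that people with overlapping click streams have visited but you have not. The scoring rule, the function name and the sample users and sites are all assumptions made for illustration, not A9’s method.

```python
from collections import Counter

# Hypothetical click streams: the set of sites each user has visited.
click_streams = {
    "asha": {"news.example", "cricket.example", "books.example"},
    "ravi": {"news.example", "cricket.example", "movies.example"},
    "meera": {"books.example", "travel.example"},
    "john": {"cooking.example"},
}


def discover(user, click_streams, top_n=3):
    """Suggest unvisited sites, weighted by overlap with other users."""
    mine = click_streams[user]
    scores = Counter()
    for other, theirs in click_streams.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # how similar this person's browsing is
        if overlap == 0:
            continue                  # nothing in common, so ignore them
        for site in theirs - mine:    # sites they know that the user does not
            scores[site] += overlap
    return [site for site, _ in scores.most_common(top_n)]


suggestions = discover("asha", click_streams)
print(suggestions)
```

Recovery, by contrast, needs nothing more than the user’s own stored history, which is the `click_streams[user]` set in this sketch.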

A few years ago, it seemed the search game was over. Results were inaccurate and portals were the thing to do. Google’s cutting-edge technology of link analysis resurrected an industry. Yet innovation in search is far from over. The game has just begun. In the next column, we will look at some key ideas that will define tomorrow’s search.

Published by

Rajesh Jain

An Entrepreneur based in Mumbai, India.