TECH TALK: Constructing the Memex: Google's Domination

Google has spent barely any money on advertising. It has focused on search: providing the best results fast and free of clutter. It belongs to a rare breed of companies that put technology above everything else. It launched when the search category was seemingly stagnant. Wired (October 2001) takes up the story:

Everyone loves Google, and therein lies its dilemma. The phenomenally popular search engine – it now performs more than 100 million searches a day – achieved much of its early success by being resolutely uncommercial. As other search engines were selling banner ads and turning into portals to make a buck off what had become a commodity service, Google just did search. Its stripped-down interface (only three elements: a text-entry box, a Search button, and an “I’m Feeling Lucky” link that takes you straight to the top-ranking result) trades looks for speed. And it does search brilliantly, using a unique technique that ranks pages by how many other pages link to them.

To understand Google's success, it is important to first understand its PageRank technology. Its site has an explanation:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.”

Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don’t match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page’s content (and the content of the pages linking to it) to determine if it’s a good match for your query.
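The voting idea in Google's explanation can be illustrated with a small sketch. This is not Google's implementation: it is a minimal power-iteration version of link-based ranking, and the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the toy link graph are all assumptions made for illustration. It covers only the "importance" half of the description, not the text matching.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure (illustrative sketch only).

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not taken from the article.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "hub" is linked to by three pages, "obscure" by none.
links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["x"],
    "obscure": ["y"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.4f}")
```

In this toy graph, "x" and "y" each receive exactly one link, yet "x" ends up ranked higher because its single vote comes from the well-linked "hub" page. That is the point of the passage above: the importance of the page casting the vote matters, not just the number of votes.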

Adds the New York Times:

Google’s rise initially flowed from a single software innovation: a formula to retrieve pages ordered by their relevance to a Web surfer’s request.

The basic idea, known as “link analysis,” was not new. But in 1996, Sergey Brin and Larry Page, then graduate students in computer science at Stanford, began applying it to the global links that connect Web pages. Their idea was to exploit existing human intelligence by tracking the popularity of billions of different Web pages. Two years later, the two men would found Google.

Applied to the explosively growing thicket of electronic pointers that make up the World Wide Web, the approach, simultaneously being explored at an IBM research laboratory in San Jose, created a technical breakthrough.

Google now employs 800 people, yet it handles 200 million searches of the Web each day, a staggering one-third of the estimated daily total. To keep up with that torrent, Google has essentially built a home-brew supercomputer that is distributed across eight data centers.

The result of Google's innovative algorithms was that the most relevant pages, as judged by analysing the link structure of the web, started showing up at the top when we did searches. We suddenly found sense in search: even though the results still showed a huge number of matching pages, the information we were looking for was usually to be found in the first few pages of results. This focus on relevance (along with the simplicity of its design) has helped Google occupy centre stage in our lives. Search has come back into fashion. Perhaps too much so.

Tomorrow: Google's Domination (continued)



Published by

Rajesh Jain

An Entrepreneur based in Mumbai, India.