Faster PageRank Calculations

PageRank is at the heart of Google’s calculations. Now, Stanford University researchers have developed a technique to speed up the calculations.

To speed up PageRank, the Stanford team developed a trio of techniques in numerical linear algebra. First, in the WWW2003 paper, they describe so-called “extrapolation” methods, which make some assumptions about the Web’s link structure that aren’t true, but permit a quick and easy computation of PageRank. Because the assumptions aren’t true, the PageRank isn’t exactly correct, but it’s close and can be refined using the original PageRank algorithm. The Stanford researchers have shown that their extrapolation techniques can speed up PageRank by 50 percent in realistic conditions and by up to 300 percent under less realistic conditions.
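
The extrapolate-then-refine idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not say which formula the Stanford team uses, so Aitken's delta-squared formula stands in here, and the damping factor, the six-page `LINKS` graph and the names `step`, `aitken` and `extrapolate_at` are all hypothetical. The estimate pretends each page's rank converges along a single geometric sequence, which is not true, so the ordinary iteration is run afterwards to refine it.

```python
DAMPING = 0.85  # conventional damping factor; an assumption, not from the article

# Hypothetical six-page web: each page maps to the pages it links to.
LINKS = {
    "a.com/1": ["a.com/2", "a.com/3"],
    "a.com/2": ["a.com/1", "a.com/3"],
    "a.com/3": ["a.com/1", "b.com/1"],
    "b.com/1": ["b.com/2"],
    "b.com/2": ["b.com/1", "b.com/3"],
    "b.com/3": ["b.com/1", "a.com/1"],
}


def step(links, rank):
    """One pass of the original PageRank iteration."""
    n = len(links)
    # Rank held by pages with no out-links is spread evenly over all pages.
    dangling = sum(rank[p] for p in links if not links[p])
    new = {p: (1 - DAMPING) / n + DAMPING * dangling / n for p in links}
    for p, targets in links.items():
        for q in targets:
            new[q] += DAMPING * rank[p] / len(targets)
    return new


def aitken(x0, x1, x2):
    """Per-page Aitken delta-squared guess at the limit of x0, x1, x2."""
    out = {}
    for p in x2:
        denom = x2[p] - 2 * x1[p] + x0[p]
        guess = x2[p] if abs(denom) < 1e-15 else x0[p] - (x1[p] - x0[p]) ** 2 / denom
        # The guess rests on an assumption that does not hold exactly, so
        # fall back to the plain iterate whenever it is not a sensible rank.
        out[p] = guess if 0.0 <= guess <= 1.0 else x2[p]
    return out


def pagerank(links, extrapolate_at=None, tol=1e-10, max_iter=1000):
    """Power iteration, optionally extrapolating once at the given iteration."""
    n = len(links)
    prev, rank = None, {p: 1.0 / n for p in links}
    for iteration in range(1, max_iter + 1):
        new = step(links, rank)
        if sum(abs(new[p] - rank[p]) for p in links) < tol:
            return new, iteration
        if iteration == extrapolate_at and prev is not None:
            new = aitken(prev, rank, new)  # jump ahead, then keep refining
        prev, rank = rank, new
    return rank, max_iter
```

Calling `pagerank(LINKS)` and `pagerank(LINKS, extrapolate_at=10)` should give the same ranks to within the tolerance; the second value returned is the number of iterations each run took, which is the figure to compare on a larger graph.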

A second paper describes an enhancement, called “BlockRank,” which relies on a feature of the Web’s link structure – a feature that the Stanford team is among the first to investigate and exploit. Namely, they show that approximately 80 percent of the pages on any given Web site point to other pages on the same site. As a result, they can compute many single-site PageRanks, glue them together in an appropriate manner and use that as a starting point for the original PageRank algorithm. With this technique, they can realistically speed up the PageRank computation by 300 percent.
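
The "compute per-site ranks, glue them together, then refine" recipe could look roughly like the sketch below. It is a simplification, not the Stanford implementation: the "appropriate manner" of gluing is assumed here to be a plain site-level PageRank over an unweighted site graph, and the damping factor, the `LINKS` graph and the helpers `site_of` and `blockrank_start` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption, not from the article

# Hypothetical six-page web spread over two sites.
LINKS = {
    "a.com/1": ["a.com/2", "a.com/3"],
    "a.com/2": ["a.com/1", "a.com/3"],
    "a.com/3": ["a.com/1", "b.com/1"],
    "b.com/1": ["b.com/2"],
    "b.com/2": ["b.com/1", "b.com/3"],
    "b.com/3": ["b.com/1", "a.com/1"],
}


def site_of(page):
    """Hypothetical helper: the site is the part of the name before the slash."""
    return page.split("/")[0]


def pagerank(links, start=None, tol=1e-10, max_iter=1000):
    """Plain power iteration; returns (ranks, iterations used)."""
    pages = sorted(links)
    n = len(pages)
    rank = dict(start) if start else {p: 1.0 / n for p in pages}
    for iteration in range(1, max_iter + 1):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1 - DAMPING) / n + DAMPING * dangling / n for p in pages}
        for p in pages:
            for q in links[p]:
                new[q] += DAMPING * rank[p] / len(links[p])
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank, iteration


def blockrank_start(links, site_of):
    """Starting vector built from single-site PageRanks and a site-level rank."""
    sites = defaultdict(set)
    for page in links:
        sites[site_of(page)].add(page)

    # 1. Local PageRank of each site, ignoring links that leave the site.
    local = {}
    for members in sites.values():
        inside = {p: [q for q in links[p] if q in members] for p in members}
        local.update(pagerank(inside)[0])

    # 2. Rank the sites themselves on a graph with one node per site.
    site_links = {
        s: sorted({site_of(q) for p in members for q in links[p]} - {s})
        for s, members in sites.items()
    }
    site_rank = pagerank(site_links)[0]

    # 3. Glue together: each local rank is scaled by the rank of its site.
    return {p: local[p] * site_rank[site_of(p)] for p in links}
```

Passing the result as `start` to `pagerank(LINKS, start=blockrank_start(LINKS, site_of))` runs the original algorithm from the glued-together vector. The final ranks should match a run from the uniform start; any saving shows up in the iteration count, and a six-page toy graph is too small to say much about that.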

Finally, the team notes in a third paper that the rankings for some pages are calculated early in the PageRank process, while the rankings of many highly rated pages take much longer to compute. In a method called “Adaptive PageRank,” they eliminate redundant computations associated with those pages whose PageRanks finish early. This speeds up the PageRank computation by up to 50 percent.
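
A minimal sketch of the adaptive idea follows, assuming a simple rule the article does not spell out: a page is "finished" once its rank moves by less than `page_tol` in one pass, and it is then skipped in later passes. Because a page can look finished too early, this sketch also re-tests every page on every `recheck`-th pass and only stops after such a full pass. The thresholds, the damping factor, the `LINKS` graph and the function name are all assumptions made for illustration.

```python
DAMPING = 0.85  # conventional damping factor; an assumption, not from the article

# Hypothetical six-page web: each page maps to the pages it links to.
LINKS = {
    "a.com/1": ["a.com/2", "a.com/3"],
    "a.com/2": ["a.com/1", "a.com/3"],
    "a.com/3": ["a.com/1", "b.com/1"],
    "b.com/1": ["b.com/2"],
    "b.com/2": ["b.com/1", "b.com/3"],
    "b.com/3": ["b.com/1", "a.com/1"],
}


def adaptive_pagerank(links, tol=1e-10, page_tol=1e-12, recheck=10, max_iter=1000):
    """PageRank that skips pages whose rank has stopped moving.

    Returns (ranks, number of per-page updates performed).
    With page_tol=0.0 no page is ever skipped, which gives the plain algorithm.
    """
    pages = sorted(links)
    n = len(pages)
    incoming = {p: [] for p in pages}
    for p in pages:
        for q in links[p]:
            incoming[q].append(p)

    rank = {p: 1.0 / n for p in pages}
    active = set(pages)
    updates = 0
    for iteration in range(max_iter):
        full_sweep = iteration % recheck == 0
        if full_sweep:
            active = set(pages)  # periodically re-test pages frozen earlier
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        base = (1 - DAMPING) / n + DAMPING * dangling / n
        new = dict(rank)  # frozen pages simply keep their current rank
        for p in active:
            new[p] = base + DAMPING * sum(rank[q] / len(links[q]) for q in incoming[p])
        updates += len(active)
        changes = {p: abs(new[p] - rank[p]) for p in active}
        rank = new
        # Only trust the stopping test when every page was recomputed.
        if full_sweep and sum(changes.values()) < tol:
            break
        active = {p for p in active if changes[p] >= page_tol}
    return rank, updates
```

Comparing `adaptive_pagerank(LINKS)` with `adaptive_pagerank(LINKS, page_tol=0.0)` shows the trade-off: the ranks should agree to within the tolerance, while the second return value counts how many per-page updates each variant performed.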

“Further speed-ups are possible when we use all these methods,” Kamvar said. “Our preliminary experiments show that combining the methods will make the computation of PageRank up to a factor of five faster. However, there are still several issues to be solved. We’re closer to a topic-based PageRank than to a personalized ranking.”

Thanks also to Rahul Dave, who told me about these ideas a little while earlier.

This could be interesting for BlogStreet, as we look at creating searches within a neighbourhood. It is something I talk about in the Memex series.

New Memes

We all need new things to keep talking about. Think of these as memes. Right now, the hottest meme going around the blogosphere is social software. These memes get reinforced because we all link to others talking about them, and then chip in with our own opinions. My point, though I don’t have a scientific study to prove it, is that there is roughly one new meme every month. This sustains life and chatter. Not that this is bad. But one has to separate the hype from what is real.

For example, almost all the business publications are suddenly writing about Barry Diller emerging as the smartest Net investor. A few months ago, it was all about panning AOL Time Warner. And we in the blog world follow the memes of the moment. It would be good to do an analysis to see what is hot now and what was hot some time ago, and then come back and see where these ideas are a few months later. Some become ingrained and thus invisible, while others simply die away. But it all makes for good discussion and thought.

Blogs Exponential

Jarrett House North: “We could be on the cusp of an exponential explosion in weblog activity, driven by the virtuous cycle of blogging: publish – subscribe – read – comment – publish.” There are a couple of charts to reinforce Jarrett’s contention that “there really are some reinforcing loops driving the growth of the blogosphere.”

Andrew Grumet has an insightful article on the wider perspective on weblogs, stating “we seem to be at a tipping point with weblogs.”

Our New World

Two opinions on the world that is emerging around us.

Ray Ozzie: “What’s incredibly exciting to me is that a confluence of factors e.g. ubiquitous computing, networking, web and RAD technologies, the state of the job market – in essence, loosely coupled systems and loosely coupled minds – have created what amounts to a petri dish for experimentation in systems for social network formation, management and interpersonal interaction. An exciting time to be exploring what may happen to social structures, to organizations and to society when the friction between our minds can be reduced to zero … to the point where we can truly have superconductive relationships.”

Kevin Werbach: “As distribution has gotten even cheaper, the same trends have allowed Japanese cultural artifacts such as Pokemon to dominate America. The Net gives local and independent content creators the ability to compete against the dominant corporate media, not by building walls but by leveling the playing field. And just around the corner is the greater leveler of all: ubiquitous unlicensed wireless communications.”

One thing that is happening is that developments in email, IM, blogs, SMS and social software are allowing for non-linear relationships, both in our personal lives and in business. This translates into a 10x increase in the transactions that we now do daily as compared to a decade ago. This is the opportunity for software: helping us manage this exponential increase in what we individually have to track and manage. Think of it as Moore’s Law applied to our personal lives.

Ideas and Execution

Ideas are aplenty in our lives; the question is how many we actually execute on. I have felt this often in my life: the ideas get far ahead of our capability to make them reality. Thinking and dreaming up new worlds is perhaps the easier part. Executing on these ideas, and doing so at the right time, is the challenge.

I have been feeling this as I look back over the past couple of years and the many ideas that I’ve had (and written about in the Tech Talk columns and on the blog). At times, I find myself going back to something I had thought of a long time ago. Maybe then, the time for execution was not right. Something was missing. It is hard to say if now is the right time, but that’s a gut feel one has to rely on.

One such idea I am contemplating is the Linux Desktop. We have had limited success so far with our thin client-thick server solution. I wonder if I made a mistake by not going ahead with an innovative Linux desktop built around the dashboard and RSS aggregator. I am thinking about it again. This time, there is a wider context to the idea.

I should have made a bigger bet on Linux and related services. I remember thinking many years ago about setting up a Linux Development Centre in India. Support is one of the biggest constraints in the adoption of Linux, and we could have addressed that problem by offering it from India.

There are many other examples. Thinking is easy: it’s only our imagination that needs to be exercised. Execution is the hard part: it requires us to make a detailed plan and have faith that what we are doing is right. At times, that’s the leap we don’t make, and the idea slowly slides away. The one nice thing about a blog is that at least one can read about all the ideas one had and didn’t implement!


TECH TALK: Constructing the Memex: Memex Objectives (Part 2)

Learning and Recommending: The Memex needs to learn from all that we do: the types of searches we run, the links we click on. This learning can make it more efficient in its recommendations. Today, we are seeing Bayesian analysis being used to detect and filter spam from our inbox. We also see recommendations of books at Amazon based on our prior history. This needs to apply much more to the information that we access.

Making Connections: Linking us to people, ideas and information is one of the most important aspects of the Memex. As the sources of information and its quantum increase, we will increasingly rely on experts, specialists whom we trust to make the right judgments. The Memex will help in identifying these experts and connecting us to their ideas. Think of these as shortcuts that we are building in the information network.

Alerting: This makes the difference between Push and Pull. Today, we are used to pulling in information from all kinds of sources. What would be good is to have a system pushing relevant information our way, and alerting us to items of relevance to us. The last mile to the user has been bridged with always-on wireless devices like cellphones.

Personalising: The Memex needs to take into account our context, and thus provide a custom view of the information space. In its efforts to maintain consistency, Google has forsaken the individualised view. By being able to remember the trails one took and the information gathered, it should be able to create distinctive views of the information space.

Visualising: The Memex needs to use the new developments in presentation, especially in visualization, to present richer views of the information space. Like video games, it needs to provide an integrated query-and-response space.

In a sense, the Memex is more about assimilation than aggregation, more value-added integration than scanning silos, more amplification than just presentation. It needs to work silently in the background, rather than making us change dramatically the way we do our normal activities. (Of course, some change in the way we interact with our information sources will be inevitable.) It should attempt to augment, not try and replace, our memory. It should be able to widen the information net that we are able to access, and yet, specialise it to just what we need.

It may appear that what is being attempted is the Holy Grail of Information (and Knowledge) Management. It may seem like a Mission Impossible. Far from it! As we shall see, the tools and technologies to build the Memex are now at hand. The interesting thing is that, as individuals, if we do our information-related activities just a little differently, we can be active participants in an emergent system which will help build out our very own Memex.

Tomorrow: Building Blocks: Blogs
