My latest column from Business Standard:
[Part 1 of the “Constructing the Memex” series]
In the past, the information available to us was limited by what we could remember, file on our own, ask others in a close circle, or access in a library. The Internet changed all that by making available a vast library of interlinked digital information. Yet finding the information of value to us when we need it is still a challenge. In the beginning, we had Yahoo's editorial selection of the best sites on the Web. As the Web expanded, its editors updated the site listings in response to incoming requests for additions. Next came the first generation of search engines, which moved beyond the directory approach by crawling millions of pages and indexing the text on them.
Google, representing the second generation of search engines, used information about the links between pages as part of its PageRank technology to deliver a superior way of searching. And yet the problem of finding the right information among billions of web pages remains. Whether in our personal or our work lives, there seems to be no end to the information we have to sift through. It is time to look at a new approach.
Imagine if we could bridge directories and search engines, customising them around our likes and the trails we leave as we surf the Internet, while also taking into account everything we write in emails, blogs and elsewhere. Such a system would use our own memory and knowledge as its starting point. We would begin by outlining our interest areas: the topics that form the knowledge-sphere of our lives. This is akin to a directory of topics, only far more personally relevant. In my case, for example, the main categories would be something like: Affordable Computing, ICT for Development, Emerging Markets, Enterprise Software, Information Management, Small and Medium Enterprises, New Technologies and India.
Each topic then needs to be drilled down into a hierarchy of subtopics that defines our interests in greater detail. For example, my outline for Affordable Computing could look like this: Hardware (Thin Clients, Refurbished PCs, Set-Top Boxes, PDAs), Software (Terminal Services, Linux, Applications, Language Computing), and Communications (Ethernet, Wireless, Satellite, Broadband).
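An outline like this is a tree, and OPML, one of the XML standards this column comes back to, is a natural way to write it down. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and nested dicts and lists to hold the hierarchy, the Affordable Computing outline could be serialised as follows. The helper names `add_outlines` and `to_opml` and the document title are illustrative only, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# The Affordable Computing outline as nested dicts (branches) and lists (leaves).
AFFORDABLE_COMPUTING = {
    "Affordable Computing": {
        "Hardware": ["Thin Clients", "Refurbished PCs", "Set-Top Boxes", "PDAs"],
        "Software": ["Terminal Services", "Linux", "Applications", "Language Computing"],
        "Communications": ["Ethernet", "Wireless", "Satellite", "Broadband"],
    }
}


def add_outlines(parent, node):
    """Recursively add <outline text="..."> elements under parent (hypothetical helper)."""
    if isinstance(node, dict):
        for title, children in node.items():
            child = ET.SubElement(parent, "outline", text=title)
            add_outlines(child, children)
    else:
        for title in node:
            ET.SubElement(parent, "outline", text=title)


def to_opml(title, tree):
    """Return an OPML 2.0 document (as a string) for the given topic tree."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    add_outlines(body, tree)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    print(to_opml("My topics", AFFORDABLE_COMPUTING))
```

Each `<outline>` carries its label in the `text` attribute, and nesting mirrors the drill-down from topic to subtopic, so the same file could in principle be opened in an outliner or merged with someone else's directory.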
This hierarchy of topics maps our interests. It gives a unique lens and context to the information that we browse on the Web, write in emails and receive as attachments. The topics will evolve as our interests change and as we come across experts who may have done a better job of building out a particular part of the information ecosystem.
This is an evolving information base built not by a centralised organisation but in a distributed manner by each of us. Each of us would have a microcosm of the information space, created and updated continuously by what we do. It would ensure that our ideas have a context, that we never forget something, and that we can leverage similar work done by millions of others like us. This is the real two-way Web, linking not just documents but people, ideas and information.
In the previous column, we discussed Vannevar Bush's vision of a memex (memory extender): a machine that would help us deal with information overload by drawing on the human mind's capacity for association. The platform to construct the memex exists today, built around millions of personal directories and weblogs.
The weblog becomes a personal journal that lets us store links to, and abstracts of, articles we find interesting on the Web, along with our own ruminations. A personal directory provides a table of contents, a glimpse into the way our mind catalogues information. Two existing XML-based standards would help us share directories and syndicate the blog posts we create: OPML (Outline Processor Markup Language) and RSS (Rich Site Summary).
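To make the syndication side concrete, here is a rough sketch, again assuming Python's `xml.etree.ElementTree`, of a minimal RSS 2.0 feed for a weblog. The blog title, the example.com URLs and the `make_feed` helper are hypothetical. One detail ties the two standards together: RSS 2.0 allows an item's `<category>` to be a slash-separated path into a hierarchy, so each post can point back to a node in the personal outline.

```python
import xml.etree.ElementTree as ET


def make_feed(blog_title, blog_url, description, posts):
    """Build a minimal RSS 2.0 feed (hypothetical helper).

    Each post is a dict with "title", "link", "summary" and "topic",
    where topic is a path such as ["Affordable Computing", "Hardware"].
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = blog_title
    ET.SubElement(channel, "link").text = blog_url
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        ET.SubElement(item, "description").text = post["summary"]
        # A slash-separated category links the post to a node in the outline.
        ET.SubElement(item, "category").text = "/".join(post["topic"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed = make_feed(
        "My weblog",
        "http://example.com/blog",
        "Links, abstracts and ruminations",
        [{
            "title": "Notes on thin clients",
            "link": "http://example.com/blog/thin-clients",
            "summary": "Abstract of an article, plus my own comments.",
            "topic": ["Affordable Computing", "Hardware", "Thin Clients"],
        }],
    )
    print(feed)
```

A reader subscribed to the feed could then file each incoming post under the matching branch of their own OPML outline, which is one plausible way the personal directory and the weblog might reinforce each other.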
In a sense, the memex is more about assimilation than aggregation, more value-added integration than scanning silos, more amplification than mere presentation. The building blocks already exist in the form of the millions of blogs created by people like us. What is missing is the outline: a map of each of our minds. Once we start creating these outlines in a distributed manner, we will see the emergent effects of a whole much bigger than the sum of its parts.
In the past few years, technologies such as weblogs, RSS, OPML, web services, Google's API, visualisation tools and social software, combined with a deeper understanding of the science of networks, have laid the foundation for leveraging individual intellect to put together an information framework far beyond the capacity of any single person or organisation. The time for constructing the Memex has now come.