Folksonomy

McGee points to a post by Clay Shirky, which according to Boing Boing is about “why crappy, cheap, user-generated, uncontrolled metadata will win out over expensive, controlled, useful, professionally generated metadata.”

Furthermore, users pollute controlled vocabularies, either because they misapply the words, or stretch them to uses the designers never imagined, or because the designers say “Oh, let’s throw in an ‘Other’ category, as a fail-safe” which then balloons so far out of control that most of what gets filed gets filed in the junk drawer. Usenet blew up in exactly this fashion, where the 7 top-level controlled categories were extended to include an 8th, the ‘alt.’ hierarchy, which exploded and came to dwarf the entire, sanctioned corpus of groups.

The cost of finding your way through 60K photos tagged ‘summer’, when you can use other latent characteristics like ‘who posted it?’ and ‘when did they post it?’, is nothing compared to the cost of trying to design a controlled vocabulary and then force users to apply it evenly and universally.

This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture.
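As a rough sketch of the latent characteristics Shirky is talking about, consider narrowing a broad tag like ‘summer’ by who posted a photo and when. The data and field names below are made up for illustration (nothing Flickr-specific is assumed):

```python
from datetime import date

# Hypothetical photo records: free-form tags plus metadata the system
# already has for free (who uploaded the photo, and when).
photos = [
    {"id": 1, "tags": ["summer", "beach"], "user": "alice", "posted": date(2004, 7, 14)},
    {"id": 2, "tags": ["summer"],          "user": "bob",   "posted": date(2004, 1, 3)},
    {"id": 3, "tags": ["summer", "lake"],  "user": "alice", "posted": date(2004, 8, 2)},
]

def find(photos, tag, user=None, posted_after=None):
    """Narrow a broad tag search using latent metadata rather than a controlled vocabulary."""
    hits = [p for p in photos if tag in p["tags"]]
    if user is not None:
        hits = [p for p in hits if p["user"] == user]
    if posted_after is not None:
        hits = [p for p in hits if p["posted"] >= posted_after]
    return hits

# 'summer' photos posted by alice from July 2004 onwards
print(find(photos, "summer", user="alice", posted_after=date(2004, 7, 1)))
```

The uploader and timestamp come along with every photo anyway, so they cost nothing to apply evenly, unlike a controlled vocabulary.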

Blackberry as Productivity Tool

Paul Allen writes about his experience:

I already use my Blackberry for Google phonebook searches, using the integrated web browser. (I also use Google SMS). But the thought of location aware Google Local or Yahoo Local searches on my Blackberry really gets me excited.

I’ve gone through 4 RIM Blackberries since they were introduced and absolutely love them. The thumb keypad is awesome – I can type more than 50 words per minute now. And I can take notes everywhere without pulling out a laptop and booting up. (At church I always feel a need to explain to churchgoers around me that I’m taking notes, not playing games.)

Until I saw a news release a few months back about Lexis data being made available to Blackberry users, I hadn’t considered that the Blackberry could be a development platform for all kinds of third-party software and data services. But since I consider it the most usable of all the portable computing devices I have used (because of the thumb keypad, the scroll wheel, and the integration of cell phone services with the address book and email), I’m definitely going to investigate their application development environment further.

I have said this before and I will say it again–the single best productivity investment an entrepreneur can make is to purchase a Blackberry and stop using desktop computer time for email. I seem to get an extra 1-2 hours of productivity each day from my Blackberry.

Making Education Learner-Centric

Atanu Dey writes about moving away from a teacher-centric model and re-inventing education:

So what is a learning-centric model? First, the active agent in this is the student. The student asks the questions and the student answers the questions. The questions come first, and then the answers, which in turn lead on to more questions, and so on. The motivation is therefore in-built. Second, while the destination could be set externally (you have to master this amount of material), the path that the student takes to get there and at which pace is entirely unique for every student.

Thus the learning-centric model recognizes these two basic truths: that the universe is connected, and that every student is unique. The model makes available to the student a very rich, deep, and connected set of content which the student navigates through a process which can only be called discovery. Although the basic material accessible to students is common, the path that a specific student takes is unique to the student. Conceptually, the content is a fully-connected network which can be traversed in a potentially infinite set of ways. One can start from any one of a very large set of nodes, and then move from one node to another till the entire structure has been visited. I will go into the details of operationalizing such a model later, but for now allow me to illustrate it.

On the way to school, the student sees a beautiful rainbow painted by a passing rain shower. Upon arriving at the school, he looks up “rainbow” in the online School-in-a-Box (SiaB). The system responds with an image and some text explaining what a rainbow is. That explanation refers to a small set of concepts, from internal reflection of light to the physics of optics to the refractive index of various media to rain to the hydrological cycle to weather to monsoons, and so on. The student can then choose to move on to the nature of light and watch a little video of how light passing through a prism separates the various frequencies. Or, related to rain, the student could hear a poem by Tagore read by a gifted actor, and read a critique of the poem, and thus move through the content at a pace that suits him and as his spirit moves him. Starting at the rainbow, the student could end up learning a number of physics modules, or meteorological modules, or a few literature modules. From time to time, the student could take “challenge” tests, which examine the student’s understanding of the material browsed.
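To make the idea of a connected content network a bit more concrete, here is a minimal sketch of content as a graph that can be entered at any node and wandered through. The node names and links are hypothetical, not an actual School-in-a-Box design:

```python
# Hypothetical content graph: each module links to related modules.
content = {
    "rainbow": ["internal reflection of light", "rain"],
    "internal reflection of light": ["optics"],
    "optics": ["light through a prism"],
    "light through a prism": [],
    "rain": ["hydrological cycle", "Tagore poem on rain"],
    "hydrological cycle": ["monsoons"],
    "monsoons": [],
    "Tagore poem on rain": ["critique of the poem"],
    "critique of the poem": [],
}

def explore(start, choose=lambda links: links[0] if links else None):
    """Follow one possible path through the content, starting from any node."""
    path, node, visited = [], start, set()
    while node is not None and node not in visited:
        visited.add(node)
        path.append(node)
        unvisited_links = [n for n in content.get(node, []) if n not in visited]
        node = choose(unvisited_links)
    return path

# One student's path, starting from the rainbow seen on the way to school
print(explore("rainbow"))
```

A different choice at each node gives a different path, which is the sense in which every student’s route through the same material is unique.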

Google’s Internals

ZDNet has some interesting insights based on a talk given by Google’s vice-president of engineering, Urs Hölzle:

Google runs its systems on cheap, no-name 1U and 2U servers — so cheap that Google refers to them as PCs. After all, each one has a standard x86 PC processor, standard IDE hard disk, and standard PC reliability – which means it is expected to fail once in three years.

On a PC at home, that is acceptable for many people (if only because they’re used to it), but on the scale that Google works at it becomes a real issue; in a cluster of 1,000 PCs you would expect, on average, one to fail every day. “On our scale you cannot deal with this failure by hand,” said Hölzle. “We wrote our software to assume that the components will fail and we can just work around it. This software is what makes it work.”
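The ‘one failure a day’ figure follows directly from the three-year life expectancy; here is the back-of-the-envelope arithmetic (my approximation, not Google’s own numbers):

```python
machines = 1000
lifetime_days = 3 * 365            # "expected to fail once in three years"
failures_per_day = machines / lifetime_days
print(f"{failures_per_day:.2f} expected failures per day")   # ~0.91, roughly one a day
```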

One key idea is replication. “This server that contains this shard of the Web, let’s have two, or 10,” said Hölzle. “This sounds expensive, but if you have a high-volume service you need that replication anyway. So you have replication and redundancy for free. If one fails you have a 10 percent reduction in service, so there are no failures so long as the load balancer works. So failure becomes a manageable event.”

In reality, he said, Google probably has “50 copies of every server”. Google replicates servers, sets of servers and entire data centres, added Hölzle, and has not had a complete system failure since February 2000. Back then it had a single data centre, and the main switch failed, shutting the search engine down for an hour. Today the company mirrors everything across multiple independent data centres, and the fault tolerance works across sites, “so if we lose a data centre we can continue elsewhere — and it happens more often than you would think. Stuff happens and you have to deal with it.”

A new data centre can be up and running in under three days. “Our data centre now is like an iMac,” said Schulz. “You have two cables, power and data. All you need is a truck to bring the servers in and the whole burning in, operating system install and configuration is automated.”

Working around failure of cheap hardware, said Hölzle, is fairly simple. If a connection breaks, it means that machine has crashed, so no more queries are sent to it. If there is no response to a query, that again signals a problem, and the machine can be cut out of the loop.
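A minimal sketch of that failure handling, with simulated servers and a made-up load balancer (purely illustrative, not Google’s actual software): queries for a shard rotate across its replicas, and any machine that breaks the connection or fails to answer is dropped from the rotation.

```python
import random

class Replica:
    """Stand-in for one cheap server holding a copy of an index shard."""
    def __init__(self, name):
        self.name = name

    def query(self, term):
        if random.random() < 0.2:                  # simulate a crash or hung connection
            raise ConnectionError(self.name)
        return f"{self.name}: results for {term!r}"

class LoadBalancer:
    """Spread queries across replicas; drop any replica that stops answering."""
    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.next = 0

    def query(self, term):
        while self.replicas:
            replica = self.replicas[self.next % len(self.replicas)]
            self.next += 1
            try:
                return replica.query(term)
            except ConnectionError:
                # A broken connection or missing response means the machine is down:
                # cut it out of the loop and retry on another copy.
                self.replicas.remove(replica)
        raise RuntimeError("all replicas for this shard are down")

lb = LoadBalancer(Replica(f"shard0-copy{i}") for i in range(10))
for _ in range(5):
    print(lb.query("rainbow"))
print(len(lb.replicas), "of 10 replicas still in rotation")
```

With ten copies of the shard, losing one machine just shrinks the pool by a tenth, which is the ‘manageable event’ Hölzle describes.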

That is redundancy taken care of, but what about scaling? The Web grows every year, as does the number of people using it, and that means more strain on Google’s servers.

Blogs and Beyond

Doc Searls writes:

I think the future of periodical publishing, and of journalism itself, will be built mostly by individual bloggers and individual blogs, and by a new breed of publishers who harvest and republish (and, yes, pay for) goods from the wide open ranges where bloggers roam, and post, free. The day will come when the top print publications will be comprised of prose and pictures provided by blogs and bloggers.

The same thing will happen with television. And music. Movies too. (Although the rights-clearing mess is a huge hold-up there.)

Think of it as de-industrialization. Or de/re-industrialization. New industries rebuilt within and around the shells of the old ones. And old ones adapting, finally, to conditions that offer whole new frontiers of prosperity that only open up when they quit protecting the Old Ways of Doing Things (for example, by locking up archival “content” so only paying customers can see it).

Whatever replaces advertising (as we’ve known it) is also essential to the prosperity of these new journals. Is it just going to be whatever Google and Yahoo and Blogads do? No. It will be all that and much more. (Like, for example, a way to voluntarily pay — even a small amount, micropayment style — for subscriptions to RSS feeds, just like we voluntarily pay for public radio and TV broadcasts.)

TECH TALK: The Best of 2004: Software Shifts

5. Tim O’Reilly on the Open-Source Paradigm Shift (May)

Tim O’Reilly’s essay captures the paradigm shift that open-source software is bringing. It has been fascinating to watch how Linux and the various open-source applications have gained momentum over the year. Open source is at the heart of the platforms built by companies like Google and Yahoo. Without commoditised hardware and open-source software, the world would have been a very different place! The importance of software in our lives will continue to increase. In emerging markets, open source is the enabler for a level playing field; piracy and non-consumption are not long-term options. Besides, one of the personal highlights for me was my meeting with Tim O’Reilly during my US visit in August.

In 1962, Thomas Kuhn published a groundbreaking book entitled The Structure of Scientific Revolutions. In it, he argued that the progress of science is not gradual but (much as we now think of biological evolution) a kind of punctuated equilibrium, with moments of epochal change. When Copernicus explained the movements of the planets by postulating that they moved around the sun rather than the earth, or when Darwin introduced his ideas about the origin of species, they were doing more than just building on past discoveries, or explaining new experimental data. A truly profound scientific breakthrough, Kuhn notes, “is seldom or never just an increment to what is already known. Its assimilation requires the reconstruction of prior theory and the re-evaluation of prior fact, an intrinsically revolutionary process that is seldom completed by a single man and never overnight.”

Kuhn referred to these revolutionary processes in science as “paradigm shifts”, a term that has now entered the language to describe any profound change in our frame of reference.

Paradigm shifts occur from time to time in business as well as in science. And as with scientific revolutions, they are often hard fought, and the ideas underlying them not widely accepted until long after they were first introduced. What’s more, they often have implications that go far beyond the insights of their creators.

My premise is that free and open source developers are in much the same position today that IBM was in 1981 when it changed the rules of the computer industry, but failed to understand the consequences of the change, allowing others to reap the benefits. Most existing proprietary software vendors are no better off, playing by the old rules while the new rules are reshaping the industry around them.

I find it useful to see open source as an expression of three deep, long-term trends: the commoditization of software, network-enabled collaboration, and software customizability (software as a service).

Monday: Art and Artists
