Indian Express Mention

Join the Dots is a story in the Indian Express about Indian bloggers. It has a small excerpt about me:

[A self-contained, indulgent space] is the last thing one can call emergic.org, entrepreneur Rajesh Jain's two-year-old web log on emerging tech, enterprises and markets.

Jain, founder of Indiaworld, the country's first portal that was sold to Sify in 1999 for $115 million, prefers terse e-mail replies supplemented by appropriate links to one-on-one meetings. He says everything one needs to know about him or what he has to say about technology is there on the blog, real time. In fact, the blog is Jain, in HTML.

Today, I can imagine being without email or a cellphone for a day, but not without blogging, says Jain, who blogs every morning for 30-40 minutes, posting one column and about four to five links with abstracts to other articles and blog posts.

The blog reflects his latest thinking, built on the minds of many others. The comments that I receive from many of the readers (and other bloggers) help in refining and getting the best from a community smarter than any single individual.

Well, about the “terse e-mail replies supplemented by links rather than one-on-one meetings” bit: Murali Menon, one of the two writers, caught me during an exceptionally busy four-day period, so I had to decline a meeting. Anyway, I have no real penchant for photos in newspapers and magazines! Email replies and links to things I have written about why I blog work just as well.

Overall, a nice story — hopefully, it will get more Indians to start blogging. And more importantly, sustain it over a period of time.

Bosworth on Databases

Adam Bosworth asks where all the good databases have gone:

Users of databases tend to ask for three very simple things:

1) Dynamic schema so that as the business model/description of goods or services changes and evolves, this evolution can be handled seamlessly in a system running 24 by 7, 365 days a year. This means that Amazon can track new things about new goods without changing the running system. It means that Federal Express can add Federal Express Ground seamlessly to their running tracking system and so on. In short, the database should handle unlimited change.

2) Dynamic partitioning of data across large dynamic numbers of machines. A lot of people track a lot of data these days. It is common to talk to customers tracking 100,000,000 items a day and having to maintain the information online for at least 180 days with 4K or more a pop, and that adds (or multiplies) up to 100 TB or so. Customers tell me that this is best served up to the 1MM users who may want it at any time by partitioning the data because, in general, most of this data is highly partitionable by customer or product or something. The only issue is that it needs to be dynamic so that as items are added or get “busy” the system dynamically load balances their data across the machines. In short, the database should handle unlimited scale with very low latency. It can do this because the vast majority of queries will be local to a product or a customer or something over which you can partition. It is, obviously, going to come at a cost for complex joins and predicates across entire data sets, but as it turns out, this isn’t that normative for these sorts of databases and can be slower as long as point 3 below is handled well. And a lot of them can be solved with some giant indices that cover the datasets that are routinely scanned across customers or products.

3) Modern indexing. Google has spoiled the world. Everyone has learned that just typing in a few words should show the relevant results in a couple of hundred milliseconds. Everyone (whether an Amazon user or a customer looking up a check they wrote a month ago or a customer service rep looking up the history for someone calling in to complain) expects this. This indexing, of course, often has to include indexing through the “blobs” stored in the items such as PDFs and spreadsheets and PowerPoints. This is actually hard to do across all data, but much of the need is within a partitioned data set (e.g. I want to and should only see my checks, not yours, or my airbill status, not yours) and then it should be trivial.

If the database vendors ARE solving these problems, then they aren’t doing a good job of telling the rest of us. The customers I talk to who are using the traditional databases are essentially using them as very dumb row stores and trying very hard to move all the logic and searching out into arrays of machines with in-memory caches.
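Bosworth's first point, dynamic schema, is easier to picture with a little code. Here is a minimal sketch of a schema-free item store in which each record is just an open attribute map, so new fields can appear while the system keeps running; the ItemStore class, the attribute names and the shipment example are hypothetical illustrations of mine, not any vendor's API.

```typescript
// Sketch of a schema-free item store: records carry an open-ended attribute map,
// so tracking a new attribute needs no schema migration and no downtime.
type AttributeValue = string | number | boolean;

interface Item {
  id: string;
  attributes: Record<string, AttributeValue>;
}

class ItemStore {
  private items = new Map<string, Item>();

  // Merge new attributes into any existing record; unknown keys are simply kept.
  put(id: string, attributes: Record<string, AttributeValue>): void {
    const existing = this.items.get(id);
    this.items.set(id, { id, attributes: { ...existing?.attributes, ...attributes } });
  }

  get(id: string): Item | undefined {
    return this.items.get(id);
  }
}

// A shipper can start recording a "groundService" flag without touching the running system:
const store = new ItemStore();
store.put("shipment-42", { origin: "Memphis", weightKg: 1.2 });
store.put("shipment-42", { groundService: true }); // new attribute, no migration
```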
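Points 2 and 3 fit together: if the data partitions cleanly by customer, a query scoped to one customer only ever touches one partition, and each partition can keep its own small index. The sketch below shows the idea with a naive hash router and a toy inverted index; the class names, the hashing scheme and the fixed partition count are my own simplifications (a real system would rebalance partitions dynamically, as Bosworth notes).

```typescript
interface Doc {
  customerId: string;
  id: string;
  text: string;
}

// One partition holds the documents for a slice of customers, plus a small
// inverted index so searches within the partition stay fast (point 3).
class Partition {
  private docs = new Map<string, Doc>();
  private index = new Map<string, Set<string>>(); // term -> doc ids

  add(doc: Doc): void {
    this.docs.set(doc.id, doc);
    for (const term of doc.text.toLowerCase().split(/\s+/)) {
      if (!this.index.has(term)) this.index.set(term, new Set());
      this.index.get(term)!.add(doc.id);
    }
  }

  search(term: string): Doc[] {
    const ids = this.index.get(term.toLowerCase()) ?? new Set<string>();
    return Array.from(ids).map((id) => this.docs.get(id)!);
  }
}

// Routes each customer to a partition by hashing the customer id (point 2).
class PartitionedStore {
  private partitions: Partition[];

  constructor(partitionCount: number) {
    this.partitions = Array.from({ length: partitionCount }, () => new Partition());
  }

  private partitionFor(customerId: string): Partition {
    let hash = 0;
    for (const ch of customerId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
    return this.partitions[hash % this.partitions.length];
  }

  add(doc: Doc): void {
    this.partitionFor(doc.customerId).add(doc);
  }

  // A query scoped to one customer touches exactly one partition.
  search(customerId: string, term: string): Doc[] {
    return this.partitionFor(customerId)
      .search(term)
      .filter((d) => d.customerId === customerId);
  }
}
```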

When Browsers Grow Up

Mitch Kapor writes:

The greater convenience of the browser has been evident for many years. Browsers work from every PC, while desktop applications do not as they have to be installed (purchased, licensed, etc.) where they are to run. I can check my mail from anywhere. I like that.

The exception to the far greater convenience of the browser is off-line usage. With no net connection, data stored in a web app is inaccessible. So, infrastructure to support local storage of data (via caching, or via something fancier) as a standard affordance of web-based applications is perhaps the biggest remaining barrier to be overcome. There is no fundamental reason I am aware of why it can’t be overcome, either on a case-by-case basis or, better, in a more general way which would work not just for a given application, but for many of them.

So far, I’ve been describing redoing the feature set of a conventional app for the web. When an application, like Chandler, tries to break new ground in functionality or interface, matters grow considerably more complex, a subject I may take up here in the future. But for any new application project I get involved in starting, my strong predisposition is to think in terms of a web interface as primary.
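On Kapor's point about off-line usage, here is one way the local-storage affordance could look from the application's side: a fetch wrapper that keeps the last good copy of a resource in the browser's localStorage and falls back to it when there is no connection. This is only a sketch under those assumptions; the function name and the example URL are mine, not part of any particular framework.

```typescript
// Assumes a browser environment with fetch, localStorage and navigator.onLine.
async function fetchWithOfflineCache(url: string): Promise<unknown> {
  const cacheKey = `cache:${url}`;
  if (navigator.onLine) {
    try {
      const response = await fetch(url);
      const data = await response.json();
      // Keep the last good copy locally so it stays readable without a connection.
      localStorage.setItem(cacheKey, JSON.stringify(data));
      return data;
    } catch {
      // The network failed even though we looked online; fall through to the cache.
    }
  }
  const cached = localStorage.getItem(cacheKey);
  if (cached !== null) return JSON.parse(cached);
  throw new Error(`No connection and no cached copy of ${url}`);
}

// Usage: the mail listing stays readable on a train, albeit stale.
// fetchWithOfflineCache("https://example.com/api/inbox").then(console.log);
```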