Distributed Computing Economics

Jim Gray of Microsoft writes in an essay:

Computing economics are changing. Today there is rough price parity between (1) one database access, (2) ten bytes of network traffic, (3) 100,000 instructions, (4) 10 bytes of disk storage, and (5) a megabyte of disk bandwidth. This has implications for how one structures Internet-scale distributed computing: one puts computing as close to the data as possible in order to avoid expensive network traffic.

Put the computation near the data. The recurrent theme of this analysis is that “On Demand” computing is only economical for very CPU-intensive (100,000 instructions per byte or a CPU-day per gigabyte of network traffic) applications.
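As a rough check on those numbers, here is a back-of-envelope sketch of my own (it is not from Gray's essay). It takes the two figures quoted above and asks whether a job does enough computing per byte to justify moving its data. The function names, the two example jobs, and the clock rate of a billion instructions per second are all assumptions made for illustration. Note that the price list puts parity at about 10,000 instructions per byte (100,000 instructions against ten bytes of traffic), so the quoted 100,000 threshold sits an order of magnitude above that point.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 10**9  # a gigabyte taken as 10^9 bytes

# From the quoted price list: 100,000 instructions cost about the same as
# ten bytes of network traffic.
PARITY_INSTR_PER_BYTE = 100_000 / 10

# Threshold quoted for "On Demand" computing.
QUOTED_INSTR_PER_BYTE = 100_000

# ASSUMPTION (not stated in the excerpt): one CPU retires roughly
# a billion instructions per second.
ASSUMED_INSTR_PER_SECOND = 1e9


def instructions_per_byte(total_instructions, bytes_moved):
    """Compute intensity of a job: instructions executed per byte shipped."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return total_instructions / bytes_moved


def worth_moving(total_instructions, bytes_moved, threshold=QUOTED_INSTR_PER_BYTE):
    """True if the job is CPU-intensive enough to justify the network traffic."""
    return instructions_per_byte(total_instructions, bytes_moved) >= threshold


def cpu_days_per_gigabyte(instr_per_byte, instr_per_second=ASSUMED_INSTR_PER_SECOND):
    """CPU-days spent per gigabyte moved at a given compute intensity."""
    return instr_per_byte * BYTES_PER_GB / instr_per_second / SECONDS_PER_DAY


# Two hypothetical jobs, each shipping one gigabyte of input.
heavy_job = worth_moving(total_instructions=1e15, bytes_moved=BYTES_PER_GB)
light_job = worth_moving(total_instructions=1e12, bytes_moved=BYTES_PER_GB)
days_at_threshold = cpu_days_per_gigabyte(QUOTED_INSTR_PER_BYTE)
```

Under the assumed clock rate, the quoted threshold works out to a little over one CPU-day per gigabyte, which agrees with the "CPU-day per gigabyte" phrasing. A job at a thousand instructions per byte, like the second example, is better run where the data already lives.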

How do you combine data from multiple sites? Many applications need to integrate data from multiple sites into a combined answer. The arguments above suggest that one should push as much of the processing to the data sources as possible in order to filter the data early (database query optimizers call this “pushing predicates down the query tree”). There are many techniques for doing this, but fundamentally it dovetails with the notion that each data source is a Web service with a high-level object-oriented interface.
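To illustrate the idea of filtering at the source, here is a small sketch of my own (again, not from the essay). Two in-memory lists stand in for remote sites, and the length of the JSON encoding stands in for bytes on the wire. The site names, row fields, and the "EU" filter are hypothetical. In a real system the predicate would travel as a query expression rather than a Python function.

```python
import json

# Hypothetical rows held at two separate sites.
SITE_A = [{"id": i, "region": "EU" if i % 4 == 0 else "US", "amount": i * 10}
          for i in range(1000)]
SITE_B = [{"id": i, "region": "EU" if i % 5 == 0 else "US", "amount": i * 7}
          for i in range(1000)]


def wire_size(rows):
    """Approximate network cost as the byte length of the JSON encoding."""
    return len(json.dumps(rows).encode("utf-8"))


def query_site(rows, predicate=None):
    """Stand-in for a remote call: returns (rows shipped, bytes shipped).

    With a predicate, filtering happens "at the site" before anything is sent.
    """
    shipped = rows if predicate is None else [r for r in rows if predicate(r)]
    return shipped, wire_size(shipped)


def is_eu(row):
    return row["region"] == "EU"


def ship_everything():
    """Pull every row from every site, then filter at the client."""
    total, traffic = 0, 0
    for site in (SITE_A, SITE_B):
        rows, sent = query_site(site)
        traffic += sent
        total += sum(r["amount"] for r in rows if is_eu(r))
    return total, traffic


def push_predicate_down():
    """Send the filter to each site and ship only the matching rows."""
    total, traffic = 0, 0
    for site in (SITE_A, SITE_B):
        rows, sent = query_site(site, predicate=is_eu)
        traffic += sent
        total += sum(r["amount"] for r in rows)
    return total, traffic


naive_total, naive_bytes = ship_everything()
pushed_total, pushed_bytes = push_predicate_down()
print(f"same answer: {naive_total == pushed_total}")
print(f"bytes shipped: {naive_bytes} without pushdown, {pushed_bytes} with")
```

Both approaches produce the same combined answer. The difference is only in how many bytes cross the network, which is the expensive resource in Gray's price list.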

Also note how readable the essay's format is: each key point opens with a bold statement.

On a related note, ACM Queue has an interview with Jim Gray that focuses on storage.

Published by Rajesh Jain, an entrepreneur based in Mumbai, India.