Information Refinery

Much of what we do as information/knowledge workers is process information. We get it from multiple sources, we assimilate it, we route it to others, we translate it into different formats. In fact, in the enterprise, we can think of business processes as rules for routing information, as tracing the flow of events through various filters and actions. In essence, we have information ores which are refined by the enterprise. What we have been thinking about in BlogStreet and Digital Dashboard can now be encapsulated into a concept we will call an Information Refinery.

Whether it is news items or blog posts or enterprise events, the information refinery should be able to handle all of them. The refinery consists of the following entities:

RSS Miners

Just as ores need to be extracted from mines, RSS miners collect the RSS feeds from different sources. Miners listen to sources which send raw, unstructured information to them, or extract information through bots. They then send that information to specialized adaptors for RSS extraction.

RSS Adaptors

These are the interfaces to the outside world. A 1-way Input Adaptor takes events and RSS-ifies them for use by the refinery. A 1-way Output Adaptor takes an RSS feed and converts it into events suitable for use by an external application. A 2-way adaptor does both. In other words, from the point of view of the refinery, an Input Adaptor is read-only and an Output Adaptor is write-only. The 2-way adaptor is read-write.

For example, there are Blog 1-way adaptors which will take RSS feeds from weblogs (or create RSS feeds if one does not exist). We can think of News Adaptors which take a URL for the site, and make an RSS feed comprising the headlines. A Mail Adaptor can read from an email message and convert to RSS. A File adaptor can take information about the file and create an RSS feed about it (extracting perhaps the first few sentences from the file as the description). We can think of a Tally (or Quicken) 1-way adaptor which extracts information from the application and creates RSS events (using financial reporting standards) which can be fed into the information refinery.
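
As an illustration of what a 1-way Input Adaptor might look like, here is a minimal sketch of a Mail Adaptor in Python. The function name mail_to_rss_item and the 200-character description limit are assumptions made for this example, and it only handles plain-text, single-part messages. The Date header is copied through unchanged because RSS pubDate also uses the RFC 822 date style.

```python
from email import message_from_string
import xml.etree.ElementTree as ET


def mail_to_rss_item(raw_message: str, max_description: int = 200) -> ET.Element:
    """Convert one plain-text email message into an RSS <item> element.

    Hypothetical helper for illustration; not part of any existing adaptor.
    """
    msg = message_from_string(raw_message)
    item = ET.Element("item")
    ET.SubElement(item, "title").text = msg.get("Subject", "(no subject)")
    if msg.get("From"):
        ET.SubElement(item, "author").text = msg["From"]
    if msg.get("Date"):
        ET.SubElement(item, "pubDate").text = msg["Date"]
    # Multipart messages are out of scope for this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    # Collapse whitespace and keep only the start of the body as the description.
    summary = " ".join(body.split())[:max_description]
    ET.SubElement(item, "description").text = summary
    return item
```

Calling ET.tostring(item, encoding="unicode") on the result gives the XML fragment that could be appended to a feed's channel. A File or Tally adaptor would follow the same shape: read the proprietary format, then emit items.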

As of now, some news and blog sites put out RSS feeds. But most enterprise applications are not publishing RSS. This is what we will need to develop. Thus, the Adaptors will need to know (a) the I/O formats of the proprietary applications, and (b) the vocabularies that the segment uses. For starters, we should think of 1-way (read-only) adaptors which enable us to aggregate the raw material (information ores) from multiple disparate sources.

RSS Spoolers

RSS Spoolers take the RSS feeds from the adaptors, and they take subscription requests for the feeds through Agents. An agent is instantiated for every subscription request. Agents have rules to decide what to do. For example, Ram's agent can decide that he wants the and WSJ RSS feeds in full, while he wants all other entries with the keyword XML in them. The agents look at each of the incoming feeds and take the appropriate action. They route the RSS feeds/entries to other RSS Spoolers or to RSS Aggregators.
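
The agent rules described above can be sketched roughly as follows. The Entry and Agent classes, the route function, and the simple case-insensitive substring match are all assumptions for illustration; a real spooler would likely need richer rules.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    source: str
    title: str
    description: str = ""


@dataclass
class Agent:
    """One agent per subscription request (hypothetical structure)."""
    subscriber: str
    full_feeds: set = field(default_factory=set)  # sources wanted in full
    keywords: set = field(default_factory=set)    # filter for everything else

    def accepts(self, entry: Entry) -> bool:
        if entry.source in self.full_feeds:
            return True
        text = f"{entry.title} {entry.description}".lower()
        return any(keyword.lower() in text for keyword in self.keywords)


def route(entries, agents):
    """Return a dict mapping each subscriber to the entries their agent accepted."""
    routed = {agent.subscriber: [] for agent in agents}
    for entry in entries:
        for agent in agents:
            if agent.accepts(entry):
                routed[agent.subscriber].append(entry)
    return routed
```

With Agent("ram", full_feeds={"WSJ"}, keywords={"XML"}), every WSJ entry is routed to Ram, while entries from other sources reach him only if they mention XML.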

RSS Aggregators

There is one RSS Aggregator per subscriber/entity which is the end consumer of RSS feeds. Think of these as comprising one set of end points (the other being the information sources). RSS Aggregators have an RSS Store (database) for archiving older entries. They can publish the aggregate feed to an RSS Viewer, or they can send the RSS entries to Processors.
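
A minimal sketch of an aggregator with its RSS Store, assuming SQLite as the database and ISO 8601 timestamp strings (so that sorting the text sorts by time). The class name, table layout and method names are hypothetical.

```python
import sqlite3


class Aggregator:
    """Per-subscriber aggregator backed by a small RSS Store (illustrative only)."""

    def __init__(self, subscriber, db_path=":memory:"):
        self.subscriber = subscriber
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "guid TEXT PRIMARY KEY, source TEXT, title TEXT, published TEXT)"
        )

    def add(self, guid, source, title, published):
        """Archive an entry; an entry whose guid is already stored is ignored."""
        self.db.execute(
            "INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?)",
            (guid, source, title, published),
        )
        self.db.commit()

    def feed(self, limit=20):
        """Return the newest entries first, ready to hand to a viewer."""
        rows = self.db.execute(
            "SELECT source, title, published FROM entries "
            "ORDER BY published DESC LIMIT ?",
            (limit,),
        )
        return rows.fetchall()
```

Keeping the guid as the primary key means the same entry arriving through two spoolers is stored only once.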

RSS Processors

They embed business logic/rules as to what action needs to be taken with RSS feeds. For example, they can send events to an Alerts engine to notify the user via email/IM/SMS, or create other RSS events. Processors can thus also be considered information sources which put out RSS feeds for re-distribution. Processors can also work on the RSS store to create specialised applications (e.g. Talent Search in an organization, or Business Process Analytics to see how information flows and where the potential bottlenecks could be).
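
To make the idea of a processor as both consumer and source concrete, here is a small sketch. The function name alert_processor, the dictionary shape of an entry, and the notify callback (standing in for an email/IM/SMS Alerts engine) are assumptions for this example.

```python
def alert_processor(entries, keyword, notify):
    """Notify on matching entries and return the derived entries it produced.

    `entries` is a list of dicts with "source" and "title" keys (assumed shape).
    `notify` is any callable that accepts a message string.
    """
    derived = []
    for entry in entries:
        if keyword.lower() in entry["title"].lower():
            notify(f"[{entry['source']}] {entry['title']}")
            # The processor itself becomes an information source.
            derived.append(
                {"source": "alerts", "title": f"Alert sent: {entry['title']}"}
            )
    return derived
```

The returned list can be fed back into a spooler, which is how a processor's output becomes another feed for re-distribution.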

RSS Viewer

This shows all the RSS entries organized by source or time to a user, who can then decide if he wants to either delete the entry or publish it to a blog. An RSS Viewer should be capable of having multiple pages to support categorization of the incoming entries.

Blog Publisher

This uses the Blogger API to post events to a blog.
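
A rough sketch of such a publisher, using Python's xmlrpc.client and the Blogger API's blogger.newPost call, whose parameters are appkey, blogid, username, password, content and publish. The function names, the endpoint passed to connect, and the choice to place the title on the first line of the content are assumptions for illustration.

```python
import xmlrpc.client


def connect(endpoint):
    """Create an XML-RPC proxy for a blog server (the endpoint URL is site-specific)."""
    return xmlrpc.client.ServerProxy(endpoint)


def publish_entry(server, blog_id, username, password, title, body,
                  app_key="", publish=True):
    """Post one entry through blogger.newPost and return the new post's ID.

    blogger.newPost takes a single content string, so this sketch puts
    the title on the first line (an assumption, not a standard).
    """
    content = f"{title}\n\n{body}"
    return server.blogger.newPost(
        app_key, blog_id, username, password, content, publish
    )
```

Passing the server proxy in as a parameter keeps the publisher easy to exercise against a stand-in object before pointing it at a real blog.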

Blog Platform

This is a weblog tool like MovableType or Radio which enables management of the blog. A blog should be capable of having multiple categories (with sub-levels). The user can decide on the access rights for the categories (public, private, group). It should also be able to publish RSS feeds for specific categories.

Digital Dashboard

This is a browser with 3 tabs: one for viewing the RSS Aggregator feed, one for writing, and one for viewing a user's own blog. It is the unified read-write interface.

Putting It All Together

What the information refinery creates is a peer-to-peer architecture of information sources and RSS routers, filters and processors. Initially, we should look at starting at the edges. This means letting the enterprise applications do their own thing. What we want to extract from them is an RSS feed which can be aggregated with other feeds and presented to the user. We should not focus initially on trying to create the information. As an analogy, did the aggregation of news content through using templates and a single page as the viewer. It did not try to create the content. Similarly, we need to create adaptors and miners for the various information sources that exist (news sites, blogs, mail, enterprise applications) to aggregate them together.

One of the first applications we can consider for application within the enterprise is the following: create a Client Information System, with a weblog which has one page per client to aggregate financial, technical, marketing interactions, support and external news together.

– Extract accounting events from Tally (cheque deposited, payment made, etc.)
– As marketing and support people interact with a client, they write the conversations or forward the mails to the client blog page. If they need to write, they should write in their own space, and then publish to the client blog.
– Invoices sent should be linked as files from the blog, with a brief summary of the contents. Invoices are sent by the marketing department.
– Scan various newspapers and magazine RSS feeds to search for news relevant to the client.

This way, there is a (reverse) chronological history of all interactions with the client in a single page, across the diverse groups in the company. Today, this resides in different mail folders of people, and different applications and spreadsheets. This is one example of how the Information Refinery can streamline the raw data ores to create an integrated emergent system, where the whole is much greater than the sum of the parts.
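
The per-client page described above amounts to grouping events by client and sorting each group newest first. A minimal sketch, assuming each event is a dict with "client", "when" (a datetime), "source" and "summary" keys; these names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_client_pages(events):
    """Group events by client and order each client's page newest first."""
    pages = defaultdict(list)
    for event in events:
        pages[event["client"]].append(event)
    for client_events in pages.values():
        client_events.sort(key=lambda e: e["when"], reverse=True)
    return dict(pages)


# Example input mixing accounting, mail and news events for one client.
sample_events = [
    {"client": "Acme", "when": datetime(2003, 1, 5, 9, 0),
     "source": "Tally", "summary": "Cheque deposited"},
    {"client": "Acme", "when": datetime(2003, 1, 7, 15, 30),
     "source": "Mail", "summary": "Support call about installation"},
    {"client": "Acme", "when": datetime(2003, 1, 6, 8, 0),
     "source": "News", "summary": "Acme announces new office"},
]
```

Calling build_client_pages(sample_events)["Acme"] yields the mail, news and accounting events in that order, regardless of which application each came from.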

There are, perhaps, many knowledge management systems and corporate portal applications which could do something similar. The difference in the approach we have taken here is fourfold. It can be built in a very general-purpose manner (and can even be customized easily at interface points like Processors). It encourages users to continue what they are doing, with the only addition being writing; they will do this because they will derive significant value from the system, since it creates positive feedback in the information flow. It can be put together quite quickly. And it leverages the existing enterprise applications.

The key is to first start at the edges and create the unified viewing interface (the digital dashboard, as it were): a read-only interface, but with information aggregated from multiple sources. In general, in organizations, there are 10x more readers than writers (i.e., one person may update the accounts information, but there are likely to be 10 people relying on that information for analytics and decision-support). The next step is to enable two-way communication into the applications, so the Digital Dashboard can also become a writable area which interfaces to applications. This is a precursor to then replacing the expensive non-integrated enterprise applications with integrated, web services-compatible components.

What's missing here is a more detailed discussion of how business processes can be reflected and managed through this information refinery, and how we can use business process standards such as those embedded in RosettaNet.

This approach is obviously not going to work for every organization. The main targets are the SMEs, who today use very few enterprise applications. They can now, over time, be given a full suite of applications through a common interface. Just as the browser created the front-end which became a window to the web, with HTML and HTTP taking care of the back-end, we want the RSS-driven Digital Dashboard to become the enterprise information worker's front-end, with the back-end being composed initially of adaptors to existing enterprise applications and later of newly minted, web services-compliant software components which embed the appropriate business logic and work together like Lego blocks.

Taken together with the Thin Client-Thick Server project, which creates the IT infrastructure providing connected computers for all, the Information Refinery and its constituents provide the framework to build the Real-Time Enterprise for SMEs. The price point is what I've talked about in the past: all the hardware, software, support and training that's needed for no more than USD 20 per person per month. This, in my view, is the next real opportunity, and the vision that we want to work towards in Emergic.

Published by

Rajesh Jain

An Entrepreneur based in Mumbai, India.