Eyes for Text

Ramana Rao writes about enterprise search:

Humans have Eyes for Text. Handed a document, a person can quickly scan the document and extract all kinds of useful information. The key word is useful. It’s human magic that makes that assessment work in such a broad range of conditions.

Machines have generally lacked Eyes for Text. They see the same document as a sequence of bytes, when in fact to us it’s really made of words organized in sentences organized in passages, not to mention all kinds of other structure related to conventions, forms, genres, and so on.

Humans are blind in the face of large numbers of documents, while machines are blind to what’s in one document. Our human blindness limits our ability to, well, see patterns and anomalies across the whole and connections across elements. While machine blindness means they can’t help us much. It would be a case of the blind leading the blind.

Eyes for Text also suggest steps up toward understanding that aren’t necessarily all the way up, say to Brains for Text. So before we solve all the scientific problems of artificial intelligence and cognitive science, we can master a useful set of primitives that certainly must lay on the path in any case.

Would we be able to understand what’s in a kitchen, not to mention navigate around it, or make dinner there, if our eyes didn’t pull out useful features like edges, surfaces, corners?
And the edges, corners, and surfaces within documents? They are the entities mentioned, the statements made about them, whether they state relationships, events, or facts. And the sequence of these statements tell us about the topics, authority, applicability, and so on of the text.

He adds in an article on Always-On:

In the next few years, many enterprises looking for new levels of organizational intelligence will deploy what I call ‘eyes for text’ engines. These text analysis engines will produce a new generation of search applications that more effectively leverage human skills. Not only that, but they will create an entirely new class of applications, in which collections or flows of information are analyzed in their entirety, instead of one user and one document at a time. These new applications can be called ‘discovery applications’ rather than search applications.

By now, everybody is quite familiar with search applications, which are also technically known as ‘information retrieval.’ Search can be characterized as users chasing documents, where a user, by finding and understanding documents, is trying to fulfill a need for information required for some broader task. The push version of retrieval, often called filtering or routing, is not essentially different, it is just a switch to documents chasing users. It’s still about individual users, documents, and information needs in the context of broader tasks.

But discovery applications based on text analysis are quite different. The crucial action leading to their use is not so much about documents, instead it is about statistics over statements. To clarify this, ‘statistics’ are about either patterns or anomalies: shapes that emerge from enough stuff, or things that blink in the night. The ‘statements’ part is about the specific content and value of individual human language expressions found in documents. And ‘over’ is the bridge between generalization and particulars, and between general applicability and specific applications.

I believe that in seven years, purchases of discovery technologies will catch up with spending on traditional retrieval applications. Why? Because content mining can be applied directly to a number of organizational missions, whereas information retrieval is applied indirectly, by augmenting individuals and their knowledge.

By “augmenting individuals” I mean that search applications are mind-expanding applications, in the sense that users must do the understanding part, so their brain is used and thus expanded.

Using a ‘mind-expansion’ proposition to spread a technology works best by convincing end users one at a time that a technology makes them work more efficiently, until suddenly organizations have no choice except to view the technology as a cost of doing business. Many personal computing technologies started out like this, including personal computers, office software, browsers, Web servers, and e-mail systems. And on intranets, search engines are almost, but not quite, seen in this way, despite almost 10 years of mainstream Internet search.

Published by

Rajesh Jain

An Entrepreneur based in Mumbai, India.