Emergic: Rajesh Jain's Blog

Search: Today and Tomorrow

June 8th, 2004

ResourceShelf [1 2] has an interview with Gary Flake, Head of Yahoo Research Labs. Excerpts:

RS: What’s wrong with web search today?

GF: It’s easier for me to point to what web search should be and then highlight the differences. If web search were perfect, then it would produce an answer to every query that would be as good — or better — than if the smartest people in the world had as much time, data, and contextual information (about the user) required to fulfill the query; and it would do all of this in a split second. In other words, the search engine would be an artificial intelligence (AI) so smart that if a correct answer could be found in theory with close to infinite resources, then it would find it. If a correct answer did not exist, then the search engine would give you the next best thing: an approximation, or perhaps even an explanation as to why your query has no perfect result. (And by the way, if we realized all of the above within my lifetime, I would consider myself lucky. That should give you an idea of what sort of time frame I am talking about.)

Alternative interfaces, like cell phones, voice, and snazzy graphical results are all nice, but in the end they represent relatively easy technology problems when compared to the challenges involved in realizing our hypothetical search engine. What really matters is what is under the hood.

Today, search engines have almost no understanding of words or language in any significant way. They exploit the statistical properties of words and links, but in no way is there anything going on akin to understanding. Search engines don’t recognize user intent, can’t distinguish goal-oriented search from browsing search, and are completely ignorant of the subtleties of how different concepts relate to one another. Moreover, they completely lack wisdom — i.e., they are very poor at distinguishing between trivia and something profound.

RS: What’s your feeling about trying to place structured data like a library catalog/bibliographic record or an indexed article into an unstructured database? Asked another way, what’s the role of structured data in an unstructured web world? How can we bring both types of resources together and still allow users to take advantage of all of the additional access points that a structured database and its retrieval mechanism make available?

GF: The beautiful thing about a relational database is that its structure tells you a lot about what is important. Database designers have been brilliant at optimizing databases (both the organization of the information as well as the algorithms) to best exploit this regularity. When you flatten out a database, those paths towards optimization often aren’t available.

A middle ground — which is not perfect, but adds a lot of utility — is to convert structured into a semi-structured form. Today, we treat documents as a big bag of words and index those words. In this semi-structured approach, we take structured information (say, the value of specific fields) and synthesize fake words that represent the fact that “document X has field Y with value Z.” Now, clearly I can’t run a SQL query on this representation; but at least I can search for documents with specific field:value pairs.

I’d like to tell you that we will be able to make an unstructured database as powerful as a structured database; but that simply is not the case. Nonetheless, the fusion of structured and unstructured data and approaches will add a lot of utility to the lives of most users.

In parallel to the above, we have started on a different approach through the launch of our Content Acquisition Program, working with such partners as NPR and the Library of Congress, as well as with universities such as Northwestern, UCLA and University of Michigan — all so we can bring their structured data to a larger audience.
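(An aside from outside the interview: the "fake words" idea Flake describes can be illustrated with a tiny inverted index. This is only a rough sketch under my own assumptions, not how Yahoo implements it. The names `field_token`, `build_index` and `search`, the `field:value` token format, and the sample documents are all made up for illustration.)

```python
from collections import defaultdict


def field_token(field, value):
    """Synthesize a 'fake word' meaning: this document has `field` = `value`.

    The "field:value" spelling is an assumption chosen for this sketch.
    """
    return f"{field.lower()}:{str(value).lower()}"


def tokens_for(doc):
    """Bag of words for the body text, plus one synthetic token per field."""
    words = doc["text"].lower().split()
    fake_words = [field_token(f, v) for f, v in doc.get("fields", {}).items()]
    return words + fake_words


def build_index(docs):
    """Plain inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in tokens_for(doc):
            index[token].add(doc_id)
    return index


def search(index, query):
    """AND query: ids of documents that contain every whitespace-separated term.

    Field values containing spaces are not handled in this sketch.
    """
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample records, loosely in the spirit of a library catalog.
docs = {
    1: {"text": "A history of jazz in Chicago",
        "fields": {"format": "audio", "year": 1998}},
    2: {"text": "Jazz improvisation handbook",
        "fields": {"format": "book", "year": 1998}},
    3: {"text": "Chicago architecture guide",
        "fields": {"format": "book", "year": 2001}},
}
index = build_index(docs)

print(search(index, "jazz format:book"))  # documents about jazz that are books
print(search(index, "year:1998"))         # every document with year = 1998
```

The field tokens sit in the same index as ordinary words, so one query can mix free text with field:value constraints. What is lost, as Flake says, is anything resembling SQL: there are no joins, no range queries over `year`, and no aggregation.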

RS: What’s going to be the “next big thing” in web search?

GF: I believe that the next big thing in web search will be a form of personalization that is simple, unobtrusive, intuitive, and almost without exception better than the non-personalized version of web search. Two ways of getting this wrong are to (1) keep the GUI as is, implicitly build a user model, and show personalized results all the time, or (2) expose many new GUI elements to the user to give a great deal of explicit control for personalization.

The sweet spot — the thing that works — will most likely be a slight modification to the GUI, say a single new GUI element, that gives the user the power to tell the search engine what they like or dislike. If done correctly, we will all wonder how we ever searched without it, and it will be as if we get the best of both worlds: more control with minimal complication and a search experience that seems tailored to our own needs.

Tags: Search Engines