The Washington Post has a story on Eli Abir and his company, Meaningful Machines, and the progress being made in enabling computers to decipher the meaning of sentences.
Abir’s challenge — and that of computer science — is how to help machines “understand” context in human language, to get around the ambiguity created when words mean different things depending on usage. “Bar” means something different when we say “the corner bar” than when we say “she raised the bar” or “he passed the bar.”
There have been several approaches to helping computers grasp those distinctions. One is a “grammatical” method that tries to tag every word and apply language rules. Another is a statistical system that makes word-to-word comparisons in previously translated text and then consults the matches later to calculate probable meanings when it encounters each word again in untranslated text.
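The statistical method can be sketched in a few lines. This is not Abir's system, just a toy illustration of the underlying idea: count how often each source-language word co-occurs with each target-language word across aligned sentence pairs, then use those counts to guess likely translations. The corpus and function names here are hypothetical.

```python
from collections import Counter

def word_cooccurrence(pairs):
    """Count how often each (source, target) word pair co-occurs in
    aligned sentence pairs; higher counts across a large corpus
    suggest likelier translations."""
    counts = Counter()
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                counts[(s, t)] += 1
    return counts

# Toy English/Spanish parallel corpus (made up for illustration).
corpus = [
    ("the house", "la casa"),
    ("the dog", "el perro"),
]
counts = word_cooccurrence(corpus)
# "the" co-occurs with both "la" and "el", which is exactly the
# ambiguity that only large volumes of text can resolve.
```

With only two sentences the counts are useless; the method depends on millions of aligned sentences to separate real translation pairs from coincidental co-occurrences.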
Abir’s approach involves a variation of the second method. His company spent last year encoding his ideas into software algorithms that perform novel forms of pattern analysis that rely on phrases — rather than words — as the core unit of meaning.
Abir’s system analyzes huge amounts of previously translated text — such as United Nations documents — and breaks matching sentences from different languages into paired fragments or phrases, storing them in a database. It also collects information about words that frequently turn up on either side of those phrases, in “overlapping” sentence fragments.
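The fragment-pairing step might look something like the following sketch. It is an assumption-laden illustration, not the company's actual algorithm: real systems must statistically work out which source fragment corresponds to which target fragment, whereas here the alignment is simply assumed to be positional.

```python
def fragments(sentence, n=2):
    """Break a sentence into its contiguous n-word fragments."""
    w = sentence.split()
    return [" ".join(w[i:i + n]) for i in range(len(w) - n + 1)]

# Hypothetical aligned English/Spanish sentence pair.
en = "the players won the game"
es = "los jugadores ganaron el partido"

# Store paired fragments in a simple dict standing in for the
# database; positional correspondence is assumed for illustration.
db = dict(zip(fragments(en), fragments(es)))
```

Running `fragments("he passed the bar")` yields `["he passed", "passed the", "the bar"]`; keeping "the bar" intact as a unit is what lets context disambiguate it from "raised the bar."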
For example, Abir’s algorithm identifies and stores recurring associations or fragments such as “baseball player,” “baseball game” and “autographed baseball.” It then looks at words on either side of those fragments — “threw” or “won” — and stores those language pairs in the database, too. That creates a kind of jigsaw puzzle, which it uses to disassemble sentences in one language and reassemble them in another.
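The jigsaw metaphor can be made concrete with a small sketch. This is a guess at the general idea, not the patented method: because stored fragments overlap ("the baseball" shares a word with "baseball player"), they can be chained back into a sentence by matching each fragment's last word to the next fragment's first word.

```python
def chain_overlapping(frags):
    """Chain fragments whose last word matches the next fragment's
    first word, reassembling a sentence jigsaw-style."""
    remaining = list(frags)
    chain = remaining.pop(0).split()
    while remaining:
        for frag in remaining:
            words = frag.split()
            if words[0] == chain[-1]:
                chain.extend(words[1:])  # append all but the shared word
                remaining.remove(frag)
                break
        else:
            break  # no fragment continues the chain
    return " ".join(chain)

pieces = ["the baseball", "baseball player", "player threw"]
print(chain_overlapping(pieces))  # -> "the baseball player threw"
```

In translation, the same chaining would run over target-language fragments pulled from the database, so the output sentence is assembled from phrases already seen in human-translated text.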