Speech Recognition

Writes NYT:

Speech recognition software…is still fairly primitive. At most it can identify individual words, but not periods, commas, sentences or paragraphs, much less when a speaker is joking.

Give such a program a snippet of the evening news, for example, and it will produce a raw stream of words: “an earthquake hit last night at 11 pm we bring you live coverage on wall street today the market slumped.”

Human beings are a lot better than machines at transcribing speech like this. They can figure out how to punctuate the text and they can resolve whether a phrase like “for sure” is a statement, a question or a jeer, guided by the speaker’s intonation.

Now researchers in the United States and abroad are working to build those same subtle cues, known collectively as prosody, into speech recognition software. The hope is to create automatic ways to detect the slight differences in pitch, timing and amplitude that are so easy for people to interpret and so hard for computers.
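To make the three cues named above concrete, here is a minimal sketch of how pitch, timing and amplitude might be measured frame by frame from raw audio samples. It is an illustration only, not the method used by the researchers in the article: the function names (`make_tone`, `rms`, `estimate_pitch`, `prosody_features`), the 50 ms frame size, the 75 to 400 Hz pitch range and the silence threshold are all assumptions chosen for the example, and the input is a synthetic signal rather than real speech.

```python
import math

def make_tone(freq_hz, amplitude, duration_s, sample_rate):
    """Hypothetical helper: synthesize a sine tone as a list of floats."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def rms(frame):
    """Amplitude cue: root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0, silence=1e-4):
    """Pitch cue: pick the autocorrelation peak within an assumed voice range.

    Returns a frequency in Hz, or None when the frame is treated as silent.
    """
    if rms(frame) < silence:
        return None
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def prosody_features(samples, sample_rate, frame_ms=50):
    """Split the signal into frames and report time, amplitude and pitch.

    Timing cue: frames whose pitch is None mark pauses, so their start
    times and run lengths show where and how long the speaker stops.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        features.append({
            "time_s": start / sample_rate,
            "rms": rms(frame),
            "pitch_hz": estimate_pitch(frame, sample_rate),
        })
    return features

# Synthetic stand-in for speech: a quiet 200 Hz tone, a pause, a louder 250 Hz tone.
SAMPLE_RATE = 8000
signal = (make_tone(200, 0.5, 0.1, SAMPLE_RATE)
          + [0.0] * int(0.1 * SAMPLE_RATE)
          + make_tone(250, 0.8, 0.1, SAMPLE_RATE))
features = prosody_features(signal, SAMPLE_RATE)
for f in features:
    print(f)
```

A punctuation or question detector could then look at patterns across frames, such as a long pause or a pitch rise at the end of a phrase. Real speech is far noisier than these pure tones, which is part of why the problem is hard for computers.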

Published by Rajesh Jain, an entrepreneur based in Mumbai, India.