
Microsoft’s Speech Recognition Tech Is Officially as Accurate as Humans


“HUMAN PARITY” ACHIEVED

A study published last Monday, heralded by Microsoft as a historic achievement, details a new speech recognition technology that can transcribe conversational speech as well as humans, or at least as well as professional human transcriptionists (who are better than most humans).


The technology scored a word error rate (WER) of 5.9%, down from the 6.3% WER reported just last month. “[I]t’s the lowest ever recorded against the industry standard Switchboard speech recognition task,” Microsoft reports. The rate is the same as (or even lower than) that of the professional human transcriptionists who transcribed the same conversation.
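For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; it is not Microsoft's scoring pipeline, and the function name `word_error_rate` and the sample sentences are made up for illustration. Benchmark scoring usually also normalizes the text (case, punctuation, filler words) first, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not an official scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substitution ("sat" -> "sit") and one
    # deletion (the second "the") against a six-word reference.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {word_error_rate(reference, hypothesis):.1%}")
```

By this definition, a 5.9% WER corresponds to roughly six word errors for every hundred words in the reference transcript.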


“We’ve reached human parity,” says Xuedong Huang, Microsoft’s chief speech scientist. The new technology uses neural language models that allow for more efficient generalization by grouping similar words together.


The achievement comes decades after speech pattern recognition was first studied in the 1970s. With Google’s DeepMind making waves in speech and image recognition (and speaking like humans do), the technology is Microsoft’s timely contribution to fast-paced artificial intelligence (AI) research and development.


The achievement was unlocked using the Computational Network Toolkit, Microsoft’s homegrown system for deep learning.


NEXT STEP: UNDERSTANDING

The new technology is bound to improve the user experience of Microsoft’s personal voice assistant for Windows and Xbox One. “This will make Cortana more powerful, making a truly intelligent assistant possible,” says an excited Harry Shum, the executive vice president heading the Microsoft Artificial Intelligence and Research group. Of course, it will also enable better speech-to-text transcription software.

