speech recognition

By Ben Lutkevich, Site Editor, and Karolina Kiwak

What is speech recognition?

Speech recognition, or speech-to-text, is the ability of a machine or program to identify words spoken aloud and convert them into readable text. Rudimentary speech recognition software has a limited vocabulary and may only identify words and phrases when spoken clearly. More sophisticated software can handle natural speech, different accents and various languages.

Speech recognition draws on a broad array of research in computer science, linguistics and computer engineering. Many modern devices and text-focused programs include speech recognition functions to allow easier or hands-free use.

Speech recognition and voice recognition are two different technologies and should not be confused:

Speech recognition is used to identify words in spoken language.
Voice recognition is a biometric technology for identifying an individual's voice.

How does speech recognition work?

Speech recognition systems use computer algorithms to process and interpret spoken words and convert them into text. A software program turns the sound a microphone records into written language that computers and humans can understand, following these four steps:

analyze the audio;
break it into parts;
digitize it into a computer-readable format; and
use an algorithm to match it to the most suitable text representation.
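The four steps above can be illustrated with a deliberately small sketch. It is not how production recognizers work: real systems typically use spectral features and statistical or neural models, while this toy version uses synthetic sine tones in place of microphone audio and only two features per frame. The sample rate, the function names and the "low" and "high" labels are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def make_tone(freq_hz, seconds=0.25):
    """Stand-in for microphone input: a pure sine wave as floats in [-1, 1]."""
    count = int(seconds * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]

def split_into_frames(samples, frame_len=200, hop=100):
    """Break the audio into short, overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]

def frame_features(frame):
    """Turn a frame into numbers: average energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / len(frame))

def analyze(samples):
    """Average the per-frame features over the whole clip."""
    feats = [frame_features(f) for f in split_into_frames(samples)]
    return tuple(sum(f[k] for f in feats) / len(feats) for k in range(2))

def recognize(samples, templates):
    """Match the clip to the label whose stored features are closest."""
    observed = analyze(samples)

    def distance(label):
        return sum((x - y) ** 2 for x, y in zip(observed, templates[label]))

    return min(templates, key=distance)

# Hypothetical "vocabulary" of two sounds, learned from reference tones.
templates = {"low": analyze(make_tone(200)), "high": analyze(make_tone(800))}

print(recognize(make_tone(750), templates))  # high
print(recognize(make_tone(250), templates))  # low
```

The nearest-template match in the last step is the simplest possible choice. It stands in for the much richer models described in the rest of this article.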

Speech recognition software must adapt to the highly variable and context-specific nature of human speech. The software algorithms that process and organize audio into text are trained on different speech patterns, speaking styles, languages, dialects, accents and phrasings. The software also separates spoken audio from background noise that often accompanies the signal.

To meet these requirements, speech recognition systems use two types of models:

Acoustic models. These represent the relationship between linguistic units of speech and audio signals.
Language models. Here, sounds are matched with word sequences to distinguish between words that sound similar.
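As a rough illustration of how the two models cooperate, the sketch below scores three similar-sounding words. The probabilities are invented for the example, not taken from any real system: the acoustic scores are nearly tied because the words sound alike, so the language model, here a small table keyed on the previous word, decides the outcome.

```python
import math

# Hypothetical acoustic scores: how well each word matches the audio.
# The values are nearly equal because the words sound almost identical.
ACOUSTIC = {"their": 0.34, "there": 0.33, "they're": 0.33}

# Hypothetical language model: probability of a word given the previous word.
LANGUAGE = {
    ("over", "there"): 0.60,
    ("over", "their"): 0.05,
    ("over", "they're"): 0.01,
    ("lost", "their"): 0.50,
    ("lost", "there"): 0.02,
    ("lost", "they're"): 0.01,
}

def best_word(previous_word, floor=1e-6):
    """Pick the candidate with the highest combined log score.

    Word pairs missing from the language model get a small floor
    probability so that the logarithm is always defined.
    """
    def score(word):
        language_prob = LANGUAGE.get((previous_word, word), floor)
        return math.log(ACOUSTIC[word]) + math.log(language_prob)

    return max(ACOUSTIC, key=score)

print(best_word("over"))  # there
print(best_word("lost"))  # their
```

Adding log probabilities is equivalent to multiplying the probabilities, and it avoids very small numbers when many words are scored in sequence.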

What applications is speech recognition used for?

Speech recognition systems have quite a few applications. Here is a sampling of them.

Mobile devices. Smartphones use voice commands for call routing, speech-to-text processing, voice dialing and voice search. Users can respond to a text without looking at their devices. On Apple iPhones, speech recognition powers the keyboard and Siri, the virtual assistant. Functionality is available in secondary languages, too. Speech recognition can also be found in word processing applications like Microsoft Word, where users can dictate words to be turned into text.

Figure: Virtual assistant task list. Virtual assistants use speech recognition to communicate with users and perform a variety of tasks triggered by voice commands.

Education. Speech recognition software is used in language instruction. The software hears the user's speech and offers help with pronunciation.

Customer service. Automated voice assistants listen to customer queries and provide helpful resources.

Healthcare applications. Doctors can use speech recognition software to transcribe notes in real time into healthcare records.

Disability assistance. Speech recognition software can convert spoken words into closed captions so that a person with hearing loss can understand what others are saying. Speech recognition can also enable those with limited use of their hands to work with computers, using voice commands instead of typing.

Court reporting. Software can be used to transcribe courtroom proceedings, precluding the need for human transcribers.

Emotion recognition. This technology can analyze certain vocal characteristics to determine what emotion the speaker is feeling. Paired with sentiment analysis, this can reveal how someone feels about a product or service.

Hands-free communication. Drivers use voice control for hands-free communication, controlling phones, radios and global positioning systems, for instance.

Figure: List of AI-driven speech recognition applications. How applications of speech recognition technology compare with other AI-based technologies.

What are the features of speech recognition systems?

Good speech recognition programs let users customize them to their needs. The features that enable this include:

Language weighting. This feature tells the algorithm to give special attention to certain words, such as those spoken frequently or that are unique to the conversation or subject. For example, the software can be trained to listen for specific product references.
Acoustic training. The software tunes out ambient noise that pollutes spoken audio. Software programs with acoustic training can distinguish speaking style, pace and volume amid the din of many people speaking in an office.
Speaker labeling. This capability enables a program to label individual participants and identify their specific contributions to a conversation.
Profanity filtering. Here, the software filters out undesirable words and language.
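Two of these features, language weighting and profanity filtering, can be sketched as post-processing steps applied to a recognizer's output. This is a simplified illustration under assumed inputs: the candidate transcripts, their scores, the boost factor and the product name "acme" are all hypothetical, and commercial products may implement these features quite differently.

```python
import re

def apply_language_weighting(hypotheses, boosted_terms, boost=2.0):
    """Return the best transcript after boosting ones that contain key terms.

    hypotheses is a list of (transcript, score) pairs; boosted_terms is a
    set of lowercase words the user wants the system to favor.
    """
    reweighted = []
    for text, score in hypotheses:
        if set(text.lower().split()) & boosted_terms:
            score *= boost
        reweighted.append((text, score))
    return max(reweighted, key=lambda pair: pair[1])[0]

def filter_profanity(text, blocked_words):
    """Mask each blocked word with asterisks, matching whole words only."""
    if not blocked_words:
        return text
    alternatives = "|".join(re.escape(word) for word in blocked_words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: "*" * len(match.group(0)), text)

# Hypothetical recognizer output: the wrong transcript scores slightly higher.
candidates = [("order the acme router", 0.40), ("order the acne router", 0.45)]

print(apply_language_weighting(candidates, {"acme"}))  # order the acme router
print(filter_profanity("that darn printer", ["darn"]))  # that **** printer
```

Without the boost, the second transcript would win on its raw score. Weighting the product name tips the decision toward the transcript the user is more likely to have meant.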

What are the different speech recognition algorithms?

The power behind speech recognition features comes from a set of algorithms and technologies. They include the following:

Hidden Markov model. HMMs are used in autonomous systems where a state is only partially observable or when not all of the information necessary to make a decision is immediately available to the sensor (in speech recognition's case, a microphone). An example of this is in acoustic modeling, where a program must match linguistic units to audio signals using statistical probability.
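Natural language processing. NLP is the branch of AI concerned with how computers interpret human language. In speech recognition, it helps the system use context to resolve ambiguous words, which eases and accelerates the recognition process.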
N-grams. This simple approach to language models creates a probability distribution for a sequence. An example would be an algorithm that looks at the last few words spoken, approximates the history of the sample of speech and uses that to determine the probability of the next word or phrase that will be spoken.
Artificial intelligence. AI and machine learning methods like deep learning and neural networks are common in advanced speech recognition software. These systems use grammar, structure, syntax and composition of audio and voice signals to process speech. Machine learning systems gain knowledge with each use, making them well suited for nuances like accents.
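Two of the techniques in this list are small enough to sketch directly. The first function below decodes a toy hidden Markov model with the Viterbi algorithm, a standard way to find the most likely sequence of hidden states. The second pair builds a bigram model, the simplest N-gram beyond single words. The states, observations, probabilities and training sentences are all invented for illustration, and real acoustic models are far larger.

```python
from collections import Counter, defaultdict

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[s] holds (probability, path) for the best path ending in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                ((best[p][0] * trans_p[p][s] * emit_p[s][obs], best[p][1])
                 for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts

def next_word_probabilities(counts, previous):
    """Turn the counts for one word into a probability distribution."""
    followers = counts.get(previous, Counter())
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}

# Toy HMM: hidden states are sound classes, observations are frame loudness.
states = ["vowel", "consonant"]
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {
    "vowel": {"vowel": 0.7, "consonant": 0.3},
    "consonant": {"vowel": 0.4, "consonant": 0.6},
}
emit_p = {
    "vowel": {"loud": 0.9, "quiet": 0.1},
    "consonant": {"loud": 0.2, "quiet": 0.8},
}
print(viterbi(["quiet", "loud", "loud"], states, start_p, trans_p, emit_p))
# ['consonant', 'vowel', 'vowel']

# Toy bigram model trained on three hypothetical voice commands.
counts = train_bigrams(
    ["turn on the light", "turn on the radio", "turn off the light"]
)
print(max(counts["the"], key=counts["the"].get))  # light
```

In this example, calling next_word_probabilities(counts, "the") gives "light" a probability of two-thirds and "radio" one-third, because "light" follows "the" in two of the three training sentences.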

What are the advantages of speech recognition?

There are several advantages to using speech recognition software, including the following:

Machine-to-human communication. The technology enables electronic devices to communicate with humans in natural language or conversational speech.
Readily accessible. This software comes preinstalled on many computers and mobile devices, so most users already have access to it.
Easy to use. Well-designed software is straightforward to operate and often runs in the background.
Continuous, automatic improvement. Speech recognition systems that incorporate AI become more effective and easier to use over time. As systems complete speech recognition tasks, they generate more data about human speech and get better at what they do.

What are the disadvantages of speech recognition?

While convenient, speech recognition technology still has a few issues to work through. Limitations include:

Inconsistent performance. The systems may be unable to capture words accurately because of variations in pronunciation, lack of support for some languages and inability to sort through background noise. Ambient noise can be especially challenging. Acoustic training can help filter it out, but these programs aren't perfect. Sometimes it's impossible to isolate the human voice.
Speed. Some speech recognition programs take time to deploy and master. The speech processing may feel relatively slow.
Source file issues. Speech recognition success depends on the recording equipment used, not just the software.

The takeaway

Speech recognition is an evolving technology. It is one of the many ways people can communicate with computers with little or no typing. A variety of communications-based business applications capitalize on the convenience and speed of spoken communication that this technology enables.

Speech recognition programs have advanced greatly over 60 years of development. They are still improving, fueled in particular by AI.

This was last updated in September 2021
