Speech recognition is one of the most exciting and useful applications of machine learning, and with the growing popularity of virtual assistants and chatbots, demand for this technology is only going to increase. In this article, we’ll learn how to convert speech to text in Python using the SpeechRecognition library. The gtts library, which the code also imports, works in the opposite direction (text to speech).
Step 1: Installing required libraries
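To get started, we need to install the required libraries. We’ll be using the SpeechRecognition library to convert speech to text. The gtts library (Google Text-to-Speech) goes the other way, turning text into spoken audio, and is installed here only because the code below imports it. To install these libraries, we’ll use pip. In a new Colab notebook, we can run the following code: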
!pip install gtts
!pip install SpeechRecognition
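After installing, it can be useful to confirm that both packages are importable before going further. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are illustrative and not part of either package.

```python
import importlib.util

# Import names (not pip names) of the packages used in this article.
REQUIRED_MODULES = ["gtts", "speech_recognition"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

Note that the pip name SpeechRecognition corresponds to the import name speech_recognition.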
Step 2: Importing libraries
Now that we’ve installed the libraries, we need to import them into our Python code. We can do this by adding the following lines of code to our notebook:
from gtts import gTTS
import speech_recognition as sr
Step 3: Recording speech
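To convert speech to text, we first need to record speech. We can do this using the sr.Recognizer and sr.Microphone classes from the SpeechRecognition library. Note that sr.Microphone relies on the PyAudio package and on a microphone attached to the machine that runs the code, so this step is easiest to run locally; a hosted Colab runtime generally has no access to your microphone. We can define a function that records speech using the microphone of our device. Here’s the code for the function:
def record_audio():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Speak now...')
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    return audio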
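By default, listen waits indefinitely for speech to start. As a hedged sketch, the variant below passes the timeout and phrase_time_limit arguments of Recognizer.listen so that recording cannot hang forever. The function name listen_with_limits and the default values are assumptions chosen for illustration. If no speech starts within the timeout, listen raises sr.WaitTimeoutError, which the caller would need to handle.

```python
def listen_with_limits(recognizer, source, timeout=5, phrase_time_limit=10):
    """Record one phrase, giving up if nobody speaks within `timeout` seconds."""
    # Sample background noise for one second to calibrate the energy threshold.
    recognizer.adjust_for_ambient_noise(source, duration=1)
    # timeout: maximum seconds to wait for speech to start.
    # phrase_time_limit: maximum seconds of speech to record.
    return recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )


# Example usage (needs a working microphone):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio = listen_with_limits(r, source)
```

Passing the recognizer and source in as arguments keeps the helper independent of how the microphone is opened.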
Step 4: Converting speech to text
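Once we’ve recorded the speech, we can use the recognize_google method of sr.Recognizer to convert it to text (the gtts library is not involved in this step). This method sends the audio to Google’s speech recognition web service, so it needs an internet connection. We can define a function that takes the recorded audio as an argument and returns the text. Here’s the code for the function: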
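def speech_to_text(audio):
    r = sr.Recognizer()
    try:
        text = r.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        # The speech was unintelligible
        return 'Sorry, I could not understand what you said.'
    except sr.RequestError:
        # The recognition service could not be reached
        return 'Sorry, the speech recognition service is unavailable.'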
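One drawback of returning an apology string is that the caller cannot easily tell a failure apart from a real transcript. As a sketch of an alternative design (not the article's method), the hypothetical helper transcribe_or_none below returns None on failure and lets the caller decide what to print. The recognize callable and the errors tuple are passed in, so the helper itself makes no assumptions about the recognition backend.

```python
def transcribe_or_none(recognize, audio, errors=(Exception,)):
    """Call `recognize(audio)`; return its text, or None if one of `errors` is raised."""
    try:
        return recognize(audio)
    except errors:
        return None


# Example usage with SpeechRecognition:
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   text = transcribe_or_none(
#       r.recognize_google, audio,
#       errors=(sr.UnknownValueError, sr.RequestError),
#   )
#   if text is None:
#       print('Sorry, I could not understand what you said.')
#   else:
#       print('You said:', text)
```

Exceptions that are not listed in errors still propagate, which makes unexpected problems visible instead of hiding them.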
Step 5: Putting it all together
Now that we’ve defined our functions, we can put them together to convert speech to text. Here’s the complete code:
from gtts import gTTS
import speech_recognition as sr

def record_audio():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Speak now...')
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    return audio

def speech_to_text(audio):
    r = sr.Recognizer()
    try:
        text = r.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        # The speech was unintelligible
        return 'Sorry, I could not understand what you said.'
    except sr.RequestError:
        # The recognition service could not be reached
        return 'Sorry, the speech recognition service is unavailable.'

audio = record_audio()
text = speech_to_text(audio)
print('You said:', text)
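A single attempt often fails in a noisy room, so a small retry loop can make the script more forgiving. The sketch below is one possible approach under stated assumptions: capture_text is a hypothetical helper, record is any function that returns audio (such as record_audio), and transcribe is assumed to return None or an empty string on failure, unlike speech_to_text above, which returns an apology string.

```python
def capture_text(record, transcribe, attempts=3):
    """Record and transcribe up to `attempts` times.

    Returns the first non-empty transcript, or None if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        audio = record()
        text = transcribe(audio)
        if text:
            return text
        print(f"Attempt {attempt} of {attempts} failed, please try again.")
    return None


# Example usage, assuming my_transcribe returns None on failure:
#   text = capture_text(record_audio, my_transcribe, attempts=3)
#   print('You said:', text if text else '(nothing recognized)')
```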
Step 6: Running the code
To run the code, we can simply click the “Run” button in Colab or press “Ctrl+Enter” on our keyboard. We’ll be prompted to speak, and our speech will be converted to text.
Conclusion
In this article, we learned how to convert speech to text using Python and the SpeechRecognition library. We installed the required libraries, imported them into our code, recorded speech using the microphone of our device, and used the recognizer’s recognize_google method to convert it to text. We put it all together to create a simple application that can recognize speech and convert it to text. With this knowledge, you can now explore more advanced speech recognition applications and build your own voice-activated projects.
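As a starting point for a voice-activated project, the recognized text can be matched against a table of known phrases. This is only a minimal sketch: the find_command helper, the COMMANDS table, and the action names are all hypothetical, and real projects would likely need fuzzier matching.

```python
def find_command(text, commands):
    """Return the action for the first command phrase found in `text`, else None."""
    normalized = text.lower()
    for phrase, action in commands.items():
        if phrase in normalized:
            return action
    return None


# Hypothetical phrase-to-action table for illustration.
COMMANDS = {
    "lights on": "turn_lights_on",
    "lights off": "turn_lights_off",
    "what time": "tell_time",
}

print(find_command("Please turn the lights on", COMMANDS))  # prints turn_lights_on
```

The text passed to find_command would be whatever the speech-to-text step returned.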

Complete code implementation
Here is the complete code implementation for converting speech to text using Python and the SpeechRecognition library:
# Import required libraries
from gtts import gTTS
import speech_recognition as sr

# Define function to record audio
def record_audio():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Speak now...')
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    return audio

# Define function to convert speech to text
def speech_to_text(audio):
    r = sr.Recognizer()
    try:
        text = r.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        # The speech was unintelligible
        return 'Sorry, I could not understand what you said.'
    except sr.RequestError:
        # The recognition service could not be reached
        return 'Sorry, the speech recognition service is unavailable.'

# Record audio
audio = record_audio()
# Convert speech to text
text = speech_to_text(audio)
# Print the converted text
print('You said:', text)
To run this code, simply copy and paste it into a new Colab notebook and click the “Run” button or press “Ctrl+Enter” on your keyboard. You’ll be prompted to speak, and your speech will be converted to text.
Here are some useful links on speech-to-text conversion in Python and on the gtts and pyaudio libraries:
- gtts documentation: https://gtts.readthedocs.io/en/latest/
- pyaudio documentation: https://people.csail.mit.edu/hubert/pyaudio/docs/
- Tutorial on speech recognition using Python: https://realpython.com/python-speech-recognition/
- Tutorial on text-to-speech using Python and gtts: https://www.geeksforgeeks.org/convert-text-speech-python-gtts/
- Stack Overflow thread on “No module named ‘gtts’”: https://stackoverflow.com/questions/52243811/module-not-found-error-gtts
- GitHub repository for pyaudio: https://github.com/spatialaudio/pyaudio
These resources should provide you with the information you need to get started with speech-to-text and text-to-speech conversion in Python.