How to Convert Speech to Text for FREE Using Whisper AI & Google Colab (Step-by-Step Tutorial)

Published on Dec 23, 2024

Introduction

This tutorial walks you through converting speech to text for free using OpenAI's Whisper and Google Colab. You will learn how to transcribe audio and video files accurately without installing any software on your own computer. The guide is aimed at anyone who wants to transcribe podcasts, interviews, or YouTube videos, even without prior coding experience.

Step 1: Setting Up Google Colab

  • Open your web browser and navigate to Google Colab.
  • Sign in with your Google account if prompted.
  • Start a new notebook by clicking on "File" and then "New Notebook."

Step 2: Choosing Runtime Options

  • In the Colab menu, go to "Runtime."
  • Select "Change runtime type."
  • Under "Hardware accelerator," choose either CPU or GPU:
    • CPU is sufficient for short files and the smaller models.
    • GPU is recommended for longer files or larger models, because transcription runs much faster on it.
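
To confirm which kind of runtime the notebook actually received, a quick check along these lines can help. This is a minimal sketch: `pick_device` is a hypothetical helper name, and it assumes PyTorch (which Whisper depends on) is the right way to detect a GPU. If PyTorch is not importable yet, it simply reports "cpu".

```python
def pick_device() -> str:
    """Return "cuda" when a GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch  # Whisper depends on PyTorch
    except ImportError:
        # PyTorch is not installed yet, so fall back to the CPU answer.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Whisper would run on: {device}")
```

The returned string can later be passed to `whisper.load_model` through its `device` argument when the model is loaded in Step 5.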

Step 3: Installing Required Packages

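  • In a new code cell, enter the following commands to install Whisper and FFmpeg (Whisper relies on FFmpeg to decode audio and video files). The -y flag answers the installer's confirmation prompt automatically:
!pip install git+https://github.com/openai/whisper.git
!apt install -y ffmpeg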
  • Run the cell by clicking the play button.
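
After the install cell finishes, it can be worth checking that FFmpeg is really on the PATH before moving on. The following is a small sketch using only the Python standard library; `ffmpeg_available` is a hypothetical helper name introduced here for illustration.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True when an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("FFmpeg found at:", shutil.which("ffmpeg"))
else:
    print("FFmpeg not found; re-run the install cell above.")
```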

Step 4: Uploading Audio or Video Files

  • Use the following code to upload your files:
from google.colab import files
uploaded = files.upload()  # dict mapping each uploaded file name to its contents
  • Click on the "Choose Files" button that appears and select the audio or video files you want to transcribe.
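
Because `files.upload()` returns a dictionary keyed by file name, the names can be read back in code instead of being retyped by hand. The sketch below assumes that dictionary shape; `media_file_names`, `MEDIA_EXTENSIONS`, and the sample dictionary are hypothetical names and data used only for illustration, and the extension list is not exhaustive.

```python
from pathlib import Path

# Extensions treated as transcribable here; an illustrative list, not exhaustive.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mp4", ".mkv", ".webm"}


def media_file_names(uploaded: dict) -> list:
    """Return the uploaded file names that look like audio or video, sorted."""
    return sorted(
        name for name in uploaded
        if Path(name).suffix.lower() in MEDIA_EXTENSIONS
    )


# Stand-in for the dictionary returned by files.upload().
example_upload = {"interview.MP3": b"...", "notes.txt": b"...", "talk.mp4": b"..."}
print(media_file_names(example_upload))
```

In a real notebook, `media_file_names(uploaded)` would be called on the `uploaded` variable from the cell above.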

Step 5: Selecting the Whisper Model

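  • Choose the appropriate Whisper model based on your needs:
    • Base: Fast and lightweight; good for quick transcriptions.
    • Large: Best for high accuracy, but slower and more memory-hungry, so a GPU runtime is advisable.
    • Whisper also offers tiny, small, and medium models that trade speed against accuracy.
  • You can load a model with the following code: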

import whisper

model = whisper.load_model("base")  # change "base" to "large" if needed
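
One way to reason about the choice is to compare each model's approximate memory need with what the runtime offers. The sketch below is an illustration, not part of Whisper: `largest_model_that_fits` is a hypothetical helper, and the figures are the rough VRAM requirements listed in the Whisper README, so treat them as guides rather than guarantees.

```python
# Approximate VRAM needs in GB, as listed in the Whisper README (rough guides only).
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(available_gb: float) -> str:
    """Return the most demanding model whose approximate VRAM need fits."""
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= available_gb]
    # The dict is ordered from smallest to largest, so the last match is the biggest.
    # If nothing fits, fall back to the smallest model.
    return fitting[-1] if fitting else "tiny"


for budget in (0.5, 4, 15):
    print(f"{budget} GB available -> {largest_model_that_fits(budget)}")
```

The chosen name can then be passed straight to `whisper.load_model`, for example `whisper.load_model(largest_model_that_fits(15))`.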

Step 6: Running the Transcription

  • Use the following code to transcribe an uploaded file. The file name must match the uploaded file's name exactly, including its extension:
result = model.transcribe("your_audio_file.mp3")  # replace with your file name
print(result["text"])
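
Besides the full text, the dictionary returned by `model.transcribe` also contains a `"segments"` list, where each segment carries `"start"` and `"end"` times in seconds and its own `"text"`. The sketch below shows one way to print a timestamped transcript from that structure. The helper names are hypothetical, and `example_result` is a hand-written stand-in shaped like Whisper's output, not real transcription data.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(result: dict) -> list:
    """Build one "[start -> end] text" line per segment."""
    return [
        f"[{format_timestamp(seg['start'])} -> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]


# Hand-written stand-in shaped like Whisper's output, for illustration only.
example_result = {
    "text": " Hello there. Welcome back.",
    "segments": [
        {"start": 0.0, "end": 2.4, "text": " Hello there."},
        {"start": 2.4, "end": 65.0, "text": " Welcome back."},
    ],
}

for line in timestamped_lines(example_result):
    print(line)
```

In the notebook, `timestamped_lines(result)` would be called on the real `result` from the transcription cell.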

Step 7: Understanding File Outputs

  • The transcribed text is printed below the code cell.
  • You can save the transcript to a text file with this code (the UTF-8 encoding keeps accented and non-Latin characters intact):
with open("transcription.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
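
When several files are transcribed in one session, naming each transcript after its source file avoids overwriting transcription.txt. The following is a minimal sketch under that assumption; `save_transcript` is a hypothetical helper, and the demonstration writes into a temporary folder so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def save_transcript(audio_name: str, text: str, out_dir: str = ".") -> Path:
    """Write `text` to <out_dir>/<audio file stem>.txt as UTF-8 and return the path."""
    out_path = Path(out_dir) / Path(audio_name).with_suffix(".txt").name
    out_path.write_text(text.strip() + "\n", encoding="utf-8")
    return out_path


# Demonstration in a throwaway folder with placeholder text.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_transcript("your_audio_file.mp3", " Example transcript text.", out_dir=tmp)
    print(saved.name, "->", saved.read_text(encoding="utf-8").strip())
```

In Colab, calling `save_transcript("your_audio_file.mp3", result["text"])` without `out_dir` would write the file into the current working directory.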

Step 8: Avoiding File Loss

  • Google Colab runtimes are temporary. When a session disconnects or resets, the uploaded files and any transcripts stored in it are deleted. To avoid losing them:
    • Download your transcription file to your computer with the following code:
files.download("transcription.txt")
    • Download important files as soon as each transcription finishes, rather than waiting until the end of the session.
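
If a session produces many transcripts, bundling them into a single archive means only one download is needed. The sketch below uses only the standard library; `zip_transcripts` and the sample file names are hypothetical, and the demonstration runs in a temporary folder rather than the Colab working directory.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_transcripts(folder: str = ".", archive_name: str = "transcripts.zip") -> Path:
    """Bundle every .txt file in `folder` into one zip archive and return its path."""
    folder_path = Path(folder)
    archive_path = folder_path / archive_name
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt_file in sorted(folder_path.glob("*.txt")):
            archive.write(txt_file, arcname=txt_file.name)
    return archive_path


# Demonstration on a throwaway folder with two sample transcripts.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "episode1.txt").write_text("first transcript\n", encoding="utf-8")
    Path(tmp, "episode2.txt").write_text("second transcript\n", encoding="utf-8")
    archive = zip_transcripts(tmp)
    with zipfile.ZipFile(archive) as zf:
        print(zf.namelist())
```

In Colab, the archive could then be fetched with `files.download(str(zip_transcripts()))`, using the `files` module imported in Step 4.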

Conclusion

You've now learned how to convert speech to text using Whisper on Google Colab. Remember to choose the right model for your needs and to download your files frequently so that a session reset does not cost you your work. With this workflow you can transcribe audio and video files for applications such as content creation and research. Happy transcribing!