You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT

Published on Aug 04, 2024. This response is partially generated with the help of AI. It may contain inaccuracies.

Introduction

This tutorial walks you through running live speech transcription with OpenAI's Whisper on a Raspberry Pi 5. It covers setting up the environment, using different microphones, and integrating Whisper with a portable voice changer device. It is aimed at anyone who wants real-time transcription with low latency on the Pi.

Step 1: Set Up Your Raspberry Pi

  1. Install Raspberry Pi OS

    • Start with a fresh Raspberry Pi OS Lite image (64-bit).
    • Flash the image onto a microSD card (or other boot media), insert it into your Raspberry Pi, and boot it up.
  2. Update System Packages

    • Open a terminal and run the following commands:
      sudo apt-get update
      sudo apt-get upgrade
      
  3. Install Required Packages

    • Install Git to clone repositories:
      sudo apt-get install git
      

Step 2: Create a Python Virtual Environment

  1. Create the Virtual Environment

    • Run:
      python3 -m venv venv
      
  2. Activate the Virtual Environment

    • Use the command:
      source venv/bin/activate
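
Before installing anything, it can help to confirm that the interpreter you are about to use really is the one inside the virtual environment. The following is a minimal sketch; the function name in_virtualenv is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the system Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no virtual environment is active, run the activate command again in the current shell before continuing.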
      

Step 3: Install Required Libraries

  1. Install Whisper Live and Other Dependencies
    • Inside the activated virtual environment, install Whisper Live:
      git clone https://github.com/AIWintermuteAI/WhisperLive.git
      cd WhisperLive
      pip install -r requirements.txt
      
    • If you also want text-to-speech output, clone the Piper TTS repository:
      git clone https://github.com/rhasspy/piper.git
      

Step 4: Configure Audio Input

  1. Select Your Microphone

    • You can use either the ReSpeaker 2-Mics Pi HAT or the ReSpeaker USB Mic Array.
    • Ensure your microphone is connected and recognized by the Raspberry Pi.
  2. Test Audio Capture

    • Check that the microphone shows up as an ALSA capture device:
      arecord -l
      
    • This command will list the available capture devices.
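
If you want to pick the capture device from a script rather than by eye, the output of arecord -l can be parsed. The sketch below assumes the usual English-language "card N: ID [Name], device M: ..." line format; the function names and the sample line in the comment are illustrative, not taken from the tutorial.

```python
import re
import subprocess

# Matches lines such as (illustrative sample):
#   card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(
    r"^card (?P<card>\d+): (?P<card_id>\S+) \[(?P<card_name>[^\]]*)\], "
    r"device (?P<device>\d+): "
)


def parse_capture_devices(arecord_output: str) -> list[dict]:
    """Extract card/device numbers from the text printed by `arecord -l`."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            devices.append({
                "card": int(match["card"]),
                "device": int(match["device"]),
                "name": match["card_name"],
                # ALSA device string usable with `arecord -D`
                "alsa_id": f"plughw:{match['card']},{match['device']}",
            })
    return devices


def list_capture_devices() -> list[dict]:
    """Run `arecord -l` and return the parsed capture devices."""
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=True
    )
    return parse_capture_devices(result.stdout)
```

Calling list_capture_devices() on the Pi returns one dictionary per capture device; an empty list means ALSA does not see the microphone yet.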

Step 5: Implement Faster Whisper

  1. Adjust Audio Context Settings

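    • To improve transcription speed, reduce the audio context parameter so the encoder only processes as much context as your audio window needs:
      • Formula: (window size in seconds / 30) * 1500 + 128
      • Whisper's full 30-second window corresponds to an audio context of 1500, so a 10-second window gives (10 / 30) * 1500 + 128 = 628.
    • Apply this setting in your Whisper configuration.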
  2. Run the Whisper Live Server

    • In the Whisper Live directory, start the server:
      python server.py
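
The audio context formula from item 1 of this step can be worked through in a few lines. This is a sketch under two assumptions: the window size is given in seconds, and the result is capped at 1500, which is Whisper's full audio context for a 30-second window. The cap and the names used here are additions for illustration, not part of the tutorial's formula.

```python
FULL_WINDOW_SECONDS = 30  # length of Whisper's full input window
FULL_AUDIO_CTX = 1500     # encoder positions covering the full window
MARGIN = 128              # extra positions added by the formula


def audio_context(window_seconds: float) -> int:
    """Apply (window size / 30) * 1500 + 128, capped at the full context."""
    if not 0 < window_seconds <= FULL_WINDOW_SECONDS:
        raise ValueError("window_seconds must be in the range (0, 30]")
    ctx = round(window_seconds * FULL_AUDIO_CTX / FULL_WINDOW_SECONDS) + MARGIN
    # Assumption: never request more positions than the model provides.
    return min(ctx, FULL_AUDIO_CTX)


if __name__ == "__main__":
    for seconds in (5, 10, 20):
        print(f"{seconds:>2} s window -> audio context {audio_context(seconds)}")
```

Shorter windows give smaller contexts and therefore less encoder work per chunk, which is where the speed-up comes from.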
      

Step 6: Create a Client for Transcription

  1. Write a Simple Client Script

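    • Create a Python script to handle the transcription. The websocket module is provided by the websocket-client package (pip install websocket-client if it is not already installed):
    import websocket  # provided by the websocket-client package
    
    def on_message(ws, message):
        # Called for every message the server sends
        print("Transcription:", message)
    
    def on_error(ws, error):
        # Surface connection problems instead of failing silently
        print("WebSocket error:", error)
    
    # Adjust the host and port to match where your server is listening
    ws = websocket.WebSocketApp("ws://localhost:5000",
                                on_message=on_message,
                                on_error=on_error)
    ws.run_forever()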
    
    • Save this script as client.py.
  2. Run the Client

    • Execute your client script in another terminal:
      python client.py
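
The client in this step prints each message exactly as it arrives. If the server sends JSON rather than plain text, a small helper can pull out just the transcribed words. The sketch below assumes a payload with a "segments" list whose entries carry a "text" field, which is how the upstream WhisperLive project formats its messages to the best of my knowledge; the fork used here may differ, so treat the field names and the helper name extract_text as assumptions.

```python
import json


def extract_text(message) -> str:
    """Return the transcript text from a server message.

    Falls back to the raw message when it is not JSON or does not
    have the assumed {"segments": [{"text": ...}, ...]} shape.
    """
    try:
        payload = json.loads(message)
    except (TypeError, ValueError):
        return str(message)

    if isinstance(payload, dict) and isinstance(payload.get("segments"), list):
        parts = [
            str(segment.get("text") or "").strip()
            for segment in payload["segments"]
            if isinstance(segment, dict)
        ]
        return " ".join(part for part in parts if part)

    # Valid JSON, but not the shape assumed above: show it unchanged.
    return str(message)
```

To use it, call print("Transcription:", extract_text(message)) inside on_message in place of printing the raw message.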
      

Step 7: Test Your Setup

  1. Speak into the Microphone

    • Try saying a few sentences and observe the real-time transcription.
    • Test with different sentences to ensure accuracy and performance.
  2. Evaluate Different Microphones

    • Test both the ReSpeaker 2-Mics Pi HAT and the ReSpeaker USB Mic Array for performance differences.
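
To compare microphones with a number rather than by impression, one common option is word error rate (WER): read the same known sentence into each microphone and score the transcript against it. This metric is not part of the original tutorial; the sketch below is one straightforward way to compute it, using a word-level edit distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    spoken = "turn on the kitchen light"
    transcribed = "turn off the kitchen light"
    print(f"WER: {word_error_rate(spoken, transcribed):.2f}")
```

A lower value means fewer word-level mistakes; punctuation is not stripped here, so remove it from both strings first if the transcripts include it.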

Conclusion

You have successfully set up a live speech transcription system using OpenAI's Whisper on a Raspberry Pi 5. You can further enhance this setup by integrating it with TTS systems or modifying the client for specific applications. As next steps, consider trying different Whisper model sizes and fine-tuning your audio input settings for the best balance of speed and accuracy.