You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT
Introduction
This tutorial will guide you through the process of running live speech transcription using OpenAI's Whisper on a Raspberry Pi 5. We will cover setting up the environment, using different microphones, and integrating Whisper with a portable voice changer device. This is ideal for anyone looking to implement real-time transcription with minimal latency and enhanced performance.
Step 1: Set Up Your Raspberry Pi
Install Raspberry Pi OS
- Start with a fresh Raspberry Pi OS Lite image (64-bit).
- Flash the image onto an SD card, insert it into your Raspberry Pi, and boot it up.

Update System Packages
- Open a terminal and run the following commands:
sudo apt-get update
sudo apt-get upgrade

Install Required Packages
- Install Git so you can clone the repositories used later:
sudo apt-get install git
Step 2: Create a Python Virtual Environment
Create the Virtual Environment
- Run the following (see the note after this step if creation fails):
python3 -m venv venv

Activate the Virtual Environment
- Use the command:
source venv/bin/activate
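On some minimal Raspberry Pi OS images, python3 -m venv can fail because the venv module's pip bootstrap is missing. This is a general Debian quirk rather than anything specific to this project; if you hit it, install the python3-venv package and re-run the command:

sudo apt-get install python3-venv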
Step 3: Install Required Libraries
Install Whisper Live and Other Dependencies
- Inside the activated virtual environment, clone Whisper Live and install its dependencies (see the troubleshooting note below if the install fails):
git clone https://github.com/AIWintermuteAI/WhisperLive.git
cd WhisperLive
pip install -r requirements.txt
- If needed, also install Piper TTS:
git clone https://github.com/rhasspy/piper.git
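If any audio-related Python package (for example PyAudio, should it appear in requirements.txt) fails to build during the pip install, the PortAudio development headers are usually the missing piece on Debian-based systems. Treat this as a troubleshooting hint rather than a required step; install the headers system-wide and re-run the pip command:

sudo apt-get install portaudio19-dev
pip install -r requirements.txt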
Step 4: Configure Audio Input
Select Your Microphone
- You can use either the ReSpeaker 2-Mics Pi HAT or the ReSpeaker USB Mic Array.
- Ensure your microphone is connected and recognized by the Raspberry Pi.

Test Audio Capture
- List the available capture devices and confirm your microphone shows up:
arecord -l
- Note the card and device numbers reported for your microphone; a short recording test (shown below) confirms that capture actually works.
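To run that recording test, capture a few seconds of audio and play it back. The card and device numbers in plughw:1,0 are an assumption; substitute whatever arecord -l reported for your microphone. 16 kHz mono is used here because that is the sample rate Whisper works with.

arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 test.wav
aplay test.wav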
Step 5: Implement Faster Whisper
Adjust Audio Context Settings
- To improve transcription speed, set the audio context parameter to match the length of the audio window you actually feed the model, instead of the full 30-second default.
- Formula:
audio context = (window size in seconds / 30) * 1500 + 128
- Apply this setting in your Whisper configuration (a worked example follows below).
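As a sketch of the arithmetic, the helper below simply applies the formula above. The function name and the interpretation of "window size" as seconds are assumptions on my part; the parameter itself is typically called audio_ctx in whisper.cpp-based setups (Whisper's encoder maps a full 30-second window to 1500 frames), so check what your WhisperLive fork expects before using the value.

# Hypothetical helper: derive the audio context value from the streaming window length.
def audio_ctx(window_seconds: float) -> int:
    # A full 30 s window corresponds to 1500 encoder frames; 128 is the extra
    # margin from the formula in this tutorial.
    return int((window_seconds / 30) * 1500 + 128)

print(audio_ctx(10))  # 10-second window -> 628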
Run the Whisper Live Server
- In the Whisper Live directory, start the server (a quick check that it is listening follows below):
python server.py
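Before moving on to the client, you can confirm that the server is listening. The port 5000 below matches the client script in the next step; it is an assumption about this particular setup, so adjust it if your WhisperLive fork reports a different port on startup.

ss -tln | grep 5000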
Step 6: Create a Client for Transcription
Write a Simple Client Script
- Create a Python script that connects to the server and prints whatever it sends back:

import websocket  # provided by the websocket-client package

def on_message(ws, message):
    # Print each transcription message pushed by the Whisper Live server.
    print("Transcription:", message)

ws = websocket.WebSocketApp("ws://localhost:5000", on_message=on_message)
ws.run_forever()

- Save this script as client.py.
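The websocket module imported above comes from the websocket-client package on PyPI. If WhisperLive's requirements.txt did not already pull it in, install it inside the virtual environment:

pip install websocket-client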
Run the Client
- Execute your client script in another terminal (activate the virtual environment there as well):
python client.py
Step 7: Test Your Setup
Speak into the Microphone
- Say a few sentences and watch the real-time transcription appear in the client terminal.
- Test with different sentences and speaking speeds to check accuracy and latency.

Evaluate Different Microphones
- Test both the ReSpeaker 2-Mics Pi HAT and the ReSpeaker USB Mic Array and compare transcription quality and latency (a simple scoring sketch follows below).
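For a rough, repeatable comparison rather than a purely subjective one, you can score each microphone's transcript of the same spoken sentence with word error rate. The jiwer package is not part of the original setup (install it with pip install jiwer), and the strings below are placeholders to replace with your own reference sentence and the transcripts you actually got.

import jiwer  # pip install jiwer

# Placeholder reference sentence and per-microphone transcripts.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis_hat = "the quick brown fox jumps over the lazy dog"
hypothesis_usb = "the quick brown fox jumps over a lazy dog"

# Lower word error rate means the transcript is closer to what was said.
print("2-Mics Pi HAT WER:", jiwer.wer(reference, hypothesis_hat))
print("USB Mic Array WER:", jiwer.wer(reference, hypothesis_usb))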
Conclusion
You have successfully set up a live speech transcription system using OpenAI's Whisper on a Raspberry Pi 5. You can further enhance this setup by integrating it with TTS systems or modifying the client for specific applications. For next steps, consider exploring different models provided by Whisper and fine-tuning your audio input settings for optimal performance.