You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT
Introduction
This tutorial will guide you through the process of running live speech transcription using OpenAI's Whisper on a Raspberry Pi 5. We will cover setting up the environment, using different microphones, and integrating Whisper with a portable voice changer device. This is ideal for anyone looking to implement real-time transcription with minimal latency and enhanced performance.
Step 1: Set Up Your Raspberry Pi
Install Raspberry Pi OS
- Start with a fresh Raspberry Pi OS Lite image (64-bit).
- Flash the image onto an SD card, insert it into your Raspberry Pi, and boot it up.

Update System Packages
- Open a terminal and run the following commands:
sudo apt-get update
sudo apt-get upgrade

Install Required Packages
- Install Git so you can clone the repositories used later:
sudo apt-get install git
Step 2: Create a Python Virtual Environment
Create the Virtual Environment
- Run the following (see the note after this step if creation fails):
python3 -m venv venv

Activate the Virtual Environment
- Use the command:
source venv/bin/activate
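On some minimal Raspberry Pi OS images, python3 -m venv can fail because the venv module's pip bootstrap is missing. This is a general Debian quirk rather than anything specific to this project; if you hit it, install the python3-venv package and re-run the command:

sudo apt-get install python3-venv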
Step 3: Install Required Libraries
Install Whisper Live and Other Dependencies
- Inside the activated virtual environment, clone Whisper Live and install its dependencies (see the troubleshooting note below if the install fails):
git clone https://github.com/AIWintermuteAI/WhisperLive.git
cd WhisperLive
pip install -r requirements.txt
- If needed, also install Piper TTS:
git clone https://github.com/rhasspy/piper.git
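If any audio-related Python package (for example PyAudio, should it appear in requirements.txt) fails to build during the pip install, the PortAudio development headers are usually the missing piece on Debian-based systems. Treat this as a troubleshooting hint rather than a required step; install the headers system-wide and re-run the pip command:

sudo apt-get install portaudio19-dev
pip install -r requirements.txt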
Step 4: Configure Audio Input
Select Your Microphone
- You can use either the ReSpeaker 2-Mics Pi HAT or the ReSpeaker USB Mic Array.
- Ensure your microphone is connected and recognized by the Raspberry Pi.

Test Audio Capture
- List the available capture devices and confirm your microphone shows up:
arecord -l
- Note the card and device numbers reported for your microphone; a short recording test (shown below) confirms that capture actually works.
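To run that recording test, capture a few seconds of audio and play it back. The card and device numbers in plughw:1,0 are an assumption; substitute whatever arecord -l reported for your microphone. 16 kHz mono is used here because that is the sample rate Whisper works with.

arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 test.wav
aplay test.wav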
Step 5: Implement Faster Whisper
Adjust Audio Context Settings
- To improve transcription speed, set the audio context parameter to match the length of the audio window you actually feed the model, instead of the full 30-second default.
- Formula:
audio context = (window size in seconds / 30) * 1500 + 128
- Apply this setting in your Whisper configuration (a worked example follows below).
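As a sketch of the arithmetic, the helper below simply applies the formula above. The function name and the interpretation of "window size" as seconds are assumptions on my part; the parameter itself is typically called audio_ctx in whisper.cpp-based setups (Whisper's encoder maps a full 30-second window to 1500 frames), so check what your WhisperLive fork expects before using the value.

# Hypothetical helper: derive the audio context value from the streaming window length.
def audio_ctx(window_seconds: float) -> int:
    # A full 30 s window corresponds to 1500 encoder frames; 128 is the extra
    # margin from the formula in this tutorial.
    return int((window_seconds / 30) * 1500 + 128)

print(audio_ctx(10))  # 10-second window -> 628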
Run the Whisper Live Server
- In the Whisper Live directory, start the server (a quick check that it is listening follows below):
python server.py
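Before moving on to the client, you can confirm that the server is listening. The port 5000 below matches the client script in the next step; it is an assumption about this particular setup, so adjust it if your WhisperLive fork reports a different port on startup.

ss -tln | grep 5000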
Step 6: Create a Client for Transcription
Write a Simple Client Script
- Create a Python script that connects to the server and prints whatever it sends back:

import websocket  # provided by the websocket-client package

def on_message(ws, message):
    # Print each transcription message pushed by the Whisper Live server.
    print("Transcription:", message)

ws = websocket.WebSocketApp("ws://localhost:5000", on_message=on_message)
ws.run_forever()

- Save this script as client.py.
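The websocket module imported above comes from the websocket-client package on PyPI. If WhisperLive's requirements.txt did not already pull it in, install it inside the virtual environment:

pip install websocket-client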
Run the Client
- Execute your client script in another terminal (activate the virtual environment there as well):
python client.py
Step 7: Test Your Setup
Speak into the Microphone
- Say a few sentences and watch the real-time transcription appear in the client terminal.
- Test with different sentences and speaking speeds to check accuracy and latency.

Evaluate Different Microphones
- Test both the ReSpeaker 2-Mics Pi HAT and the ReSpeaker USB Mic Array and compare transcription quality and latency (a simple scoring sketch follows below).
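For a rough, repeatable comparison rather than a purely subjective one, you can score each microphone's transcript of the same spoken sentence with word error rate. The jiwer package is not part of the original setup (install it with pip install jiwer), and the strings below are placeholders to replace with your own reference sentence and the transcripts you actually got.

import jiwer  # pip install jiwer

# Placeholder reference sentence and per-microphone transcripts.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis_hat = "the quick brown fox jumps over the lazy dog"
hypothesis_usb = "the quick brown fox jumps over a lazy dog"

# Lower word error rate means the transcript is closer to what was said.
print("2-Mics Pi HAT WER:", jiwer.wer(reference, hypothesis_hat))
print("USB Mic Array WER:", jiwer.wer(reference, hypothesis_usb))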
Conclusion
You have successfully set up a live speech transcription system using OpenAI's Whisper on a Raspberry Pi 5. You can further enhance this setup by integrating it with TTS systems or modifying the client for specific applications. For next steps, consider exploring different models provided by Whisper and fine-tuning your audio input settings for optimal performance.