Local Whisper on AMD
Local Speech to Text
Whisper is an automatic speech recognition (speech-to-text) system developed by OpenAI. It uses a transformer architecture to generate text from audio files.
Install dependencies
sudo dnf install -y python3-whisper.noarch python3-ipywidgets.noarch \
python3-torch-rocm-gfx9 python3-torch python3-torchvision
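Before going further, it can help to confirm that the installed PyTorch build actually sees the AMD GPU. The following is a minimal sketch, not part of the original setup: the helper name describe_backend is hypothetical, and it relies on ROCm builds of PyTorch exposing the GPU through the torch.cuda namespace and setting torch.version.hip.

```python
import importlib.util


def describe_backend():
    """Summarize what the installed PyTorch build can see (hypothetical helper)."""
    info = {
        "torch_installed": False,
        "hip_version": None,
        "gpu_available": False,
        "device_name": None,
    }
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch_installed"] = True
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    info["hip_version"] = getattr(torch.version, "hip", None)
    # ROCm builds expose AMD GPUs through the torch.cuda namespace.
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_backend().items():
        print(f"{key}: {value}")
```

If gpu_available comes back False, the transcription script further down still runs, but on the CPU.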
Unfortunately, Fedora does not package all the dependencies we need, so we have to create a requirements.txt file for the rest.
cat <<EOF > requirements.txt
datasets
transformers
numba
pysoundfile
EOF
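Assuming the file is then installed with pip (for example pip install -r requirements.txt, a step the text does not spell out), a quick check like the one below can confirm that each package is importable. This is only a sketch: the helper missing_packages is hypothetical, and the mapping from pip names to module names is an assumption, notably that pysoundfile installs a module called soundfile.

```python
import importlib.util

# pip package name -> importable module name (assumed mapping).
REQUIRED = {
    "datasets": "datasets",
    "transformers": "transformers",
    "numba": "numba",
    "pysoundfile": "soundfile",
}


def missing_packages(required):
    """Return the pip names whose module cannot be found (hypothetical helper)."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```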
Transcribe test data
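#!/usr/bin/python
import torch
from transformers import pipeline

TEST_DIR = "test_data"
TEST_FILE = f"{TEST_DIR}/preamble.wav"

# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Build a speech recognition pipeline with the English-only medium model.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium.en",
    chunk_length_s=30,  # process long audio in 30-second chunks
    device=device,
)

# The pipeline returns a dict; the transcript is under the "text" key.
transcription = pipe(TEST_FILE)["text"]
print(transcription)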
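The script above transcribes a single file. If test_data holds several recordings, the same pipeline object can be reused for each of them. The sketch below shows one way to do that; transcribe_dir is a hypothetical helper, and it only assumes that pipe accepts a file path and returns a dict with a "text" key, as in the script above.

```python
from pathlib import Path


def transcribe_dir(pipe, directory, pattern="*.wav"):
    """Run `pipe` on every matching file and map file name -> text."""
    results = {}
    # Sort so the output order is stable across runs.
    for path in sorted(Path(directory).glob(pattern)):
        # The pipeline is called with a file path and returns a dict with "text".
        results[path.name] = pipe(str(path))["text"].strip()
    return results
```

With the pipe and TEST_DIR defined earlier, transcribe_dir(pipe, TEST_DIR) would return a dict keyed by file name, so the model is loaded once rather than once per file.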