Local Whisper AMD
Local Speech to Text
Whisper is an automatic speech recognition (speech-to-text) system developed by OpenAI. It uses a transformer architecture to generate text from audio files.
Install dependencies
sudo dnf install -y python3-whisper.noarch python3-ipywidgets.noarch \
  python3-torch-rocm-gfx9 python3-torch python3-torchvision
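Before going further, it can help to confirm that the installed PyTorch build actually sees the AMD GPU. The following is a minimal sketch, not part of the original setup: the helper name describe_torch_backend is made up for illustration, and it relies on ROCm builds of PyTorch reporting the GPU through the torch.cuda API and setting torch.version.hip.

```python
import importlib.util


def describe_torch_backend() -> str:
    """Return a one-line summary of the GPU backend PyTorch can use.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if not torch.cuda.is_available():
        return "torch is installed, but no GPU is visible; inference would run on the CPU"

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    backend = f"ROCm/HIP {hip_version}" if hip_version else "CUDA"
    return f"GPU available via {backend}: {torch.cuda.get_device_name(0)}"


print(describe_torch_backend())
```

If this reports that no GPU is visible, the transcription script further down still works, but it falls back to the CPU and will be noticeably slower.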
Unfortunately, Fedora does not package all the dependencies we need, so we list the missing ones in a requirements.txt file for pip to install.
cat <<EOF > requirements.txt
datasets
transformers
numba
pysoundfile
EOF
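Once the packages from requirements.txt are installed, a quick import check shows whether anything is still missing. This is a rough sketch under a few assumptions: missing_modules and REQUIRED_MODULES are illustrative names, and the list uses import names rather than package names (the pysoundfile package is imported as soundfile).

```python
import importlib.util
from typing import Iterable, List

# Import names, which can differ from package names:
# the pysoundfile package is imported as "soundfile".
REQUIRED_MODULES = ["datasets", "transformers", "numba", "soundfile"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found, in input order."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it stays fast even for heavy packages such as transformers.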
Transcribe test data
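#!/usr/bin/python
import torch
from transformers import pipeline

TEST_DIR = "test_data"
TEST_FILE = f"{TEST_DIR}/preamble.wav"

# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API,
# so "cuda:0" also selects the first AMD GPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# chunk_length_s=30 lets the pipeline handle audio longer than 30 seconds in chunks.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium.en",
    chunk_length_s=30,
    device=device,
)

# The pipeline returns a dict; the transcript is under the "text" key.
transcription = pipe(TEST_FILE)["text"]
print(transcription)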
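The script above transcribes a single file. If the test directory holds several recordings, the same call can be looped over them. The sketch below is one possible way to do that: transcribe_directory is a hypothetical helper, and it takes the transcription step as a plain callable so it does not depend on how the pipeline was built.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_directory(directory: str, transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Run `transcribe` on every .wav file in `directory`.

    Returns a dict mapping file name to transcript, ordered by file name.
    Hypothetical helper for illustration only.
    """
    results: Dict[str, str] = {}
    for wav_path in sorted(Path(directory).glob("*.wav")):
        results[wav_path.name] = transcribe(str(wav_path))
    return results
```

With the pipe object from the script, it could be called as transcribe_directory(TEST_DIR, lambda path: pipe(path)["text"]). Only files ending in .wav directly inside the directory are picked up; subdirectories are not searched.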