How to Install OpenAI Whisper (Windows, macOS, Linux, Ubuntu)
by: Chris
Post content copied from Be on the Right Side of Change.
Run `pip3 install openai-whisper` in your command line. Once installed, use Whisper to transcribe audio files.

```shell
pip install openai-whisper
```
Alternatively, you may use any of the following commands to install openai-whisper, depending on your concrete environment (Linux, Ubuntu, Windows, macOS). One is likely to work!
- If you have only one version of Python installed: `pip install openai-whisper`
- If you have Python 3 (and, possibly, other versions) installed: `pip3 install openai-whisper`
- If you don't have pip or it doesn't work: `python -m pip install openai-whisper` or `python3 -m pip install openai-whisper`
- If you have Linux and you need to fix permissions (either one): `sudo pip3 install openai-whisper` or `pip3 install openai-whisper --user`
- If you have Linux with apt: `sudo apt install openai-whisper`
- If you have Windows and you have set up the `py` alias: `py -m pip install openai-whisper`
- If you have Anaconda: `conda install -c anaconda openai-whisper`
- If you have Jupyter Notebook: `!pip install openai-whisper` or `!pip3 install openai-whisper`
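Whichever command you use, you can confirm that the package landed with a short standard-library check. This is a sketch; it assumes the PyPI distribution name `openai-whisper`, and `whisper_installed` is a hypothetical helper, not part of the package:

```python
from importlib import metadata

def whisper_installed() -> bool:
    """Return True if the openai-whisper distribution is installed."""
    try:
        metadata.version("openai-whisper")
        return True
    except metadata.PackageNotFoundError:
        return False

print("openai-whisper installed:", whisper_installed())
```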
Upgrade Installation Routine

Upgrade pip and install the openai-whisper library using the following two commands, one after the other:

```shell
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade openai-whisper
```
Detailed Instructions
The codebase is compatible with Python versions 3.8 to 3.11 and recent PyTorch releases. Key dependencies include OpenAI's `tiktoken` for fast tokenization. To install or update to the latest release of Whisper, use:
```shell
pip install -U openai-whisper
```

For the latest repository version and dependencies, use:

```shell
pip install git+https://github.com/openai/whisper.git
```

To update to the repository's latest version without dependencies:

```shell
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```

FFmpeg, a command-line tool, is also required and can be installed via various package managers:

- For Ubuntu or Debian: `sudo apt update && sudo apt install ffmpeg`
- For Arch Linux: `sudo pacman -S ffmpeg`
- For macOS with Homebrew: `brew install ffmpeg`
- For Windows with Chocolatey: `choco install ffmpeg`
- For Windows with Scoop: `scoop install ffmpeg`
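Because Whisper shells out to the ffmpeg binary to decode audio, it is worth confirming that ffmpeg is actually on your PATH after installing it. A minimal standard-library check:

```python
import shutil
import subprocess

# Whisper invokes the ffmpeg binary to decode audio files.
path = shutil.which("ffmpeg")
if path is None:
    print("ffmpeg not found -- install it with one of the commands above")
else:
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    print(out.stdout.splitlines()[0])  # first line names the ffmpeg version
```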
If `tiktoken` lacks a pre-built wheel for your platform, installing Rust may be necessary. In case of installation errors, follow the Rust development environment setup and adjust the PATH environment variable as needed. If you encounter 'No module named setuptools_rust', install it via `pip install setuptools-rust`.
Whisper Models
Whisper offers five model sizes, from ‘tiny’ to ‘large’, with English-only versions available for four sizes. These models vary in memory requirements, speed, and accuracy. English-only models (‘.en’) generally perform better, especially the ‘tiny.en’ and ‘base.en’ versions.
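As a rough planning aid, the approximate parameter counts and VRAM figures from the Whisper README can be put in a small table. The helper below is a hypothetical convenience (not part of the whisper package) that picks the largest multilingual model fitting a VRAM budget; treat the numbers as rough guidance, not exact limits:

```python
# Approximate figures from the Whisper README: parameters in millions,
# required VRAM in GB, plus the English-only variant name where one exists.
MODELS = {
    "tiny":   {"params_m": 39,   "vram_gb": 1,  "english_only": "tiny.en"},
    "base":   {"params_m": 74,   "vram_gb": 1,  "english_only": "base.en"},
    "small":  {"params_m": 244,  "vram_gb": 2,  "english_only": "small.en"},
    "medium": {"params_m": 769,  "vram_gb": 5,  "english_only": "medium.en"},
    "large":  {"params_m": 1550, "vram_gb": 10, "english_only": None},
}

def largest_model_for(vram_gb: float) -> str:
    """Pick the biggest model that fits the given VRAM budget (illustrative)."""
    fitting = [name for name, m in MODELS.items() if m["vram_gb"] <= vram_gb]
    return fitting[-1] if fitting else "tiny"

print(largest_model_for(4))  # -> "small"
```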
Performance varies by language, as measured by WER (word error rate) and CER (character error rate) metrics; see the per-language benchmark figure in the Whisper repository.
How to Transcribe Audio with Whisper?
For command-line usage, Whisper can transcribe audio files using different models:
```shell
whisper audio.flac audio.mp3 audio.wav --model medium
```

The default setting is suitable for English. Non-English speech transcription and translation into English are also supported:

```shell
whisper japanese.wav --language Japanese --task translate
```

Use `whisper --help` to view all options. Available languages are listed in tokenizer.py.
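The CLI can also write subtitle files directly. For example, assuming the `--output_format` and `--output_dir` flags behave as in current releases:

```shell
# Transcribe with the English-only small model and write SRT subtitles
# into a transcripts/ directory
whisper audio.mp3 --model small.en --output_format srt --output_dir transcripts
```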
Python Usage (Transcription) with Whisper
In Python, transcription can be performed with:
```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```
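Besides `text`, the result dictionary includes a `segments` list with start/end timestamps. A small pure-Python helper (illustrative, not part of the whisper API) can render those segments as SRT subtitles:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(result: dict) -> str:
    """Render a Whisper transcription result's segments as an SRT string."""
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Works on the dict returned by model.transcribe(...); shown with fake data:
fake = {"segments": [{"start": 0.0, "end": 3.5, "text": " Hello world"}]}
print(to_srt(fake))
```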
This process involves a 30-second sliding window for sequence-to-sequence predictions. The whisper.detect_language() and whisper.decode() functions offer lower-level access:

```python
import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to fit a 30-second window
audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))

# Compute the log-Mel spectrogram and move it to the model's device
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```
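The 30-second window corresponds to 480,000 samples at Whisper's 16 kHz sample rate. Conceptually, `whisper.pad_or_trim` does something like this list-based sketch (the real function operates on NumPy arrays or tensors):

```python
SAMPLE_RATE = 16000                      # Whisper works on 16 kHz mono audio
CHUNK_LENGTH = 30                        # seconds per sliding window
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH   # 480,000 samples per window

def pad_or_trim(samples: list, length: int = N_SAMPLES) -> list:
    """Trim long input, or zero-pad short input, to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))

print(len(pad_or_trim([0.1, 0.2])))  # -> 480000
```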
If you want to master Whisper, check out our full prompt engineering mastery course teaching you the ins and outs of speech recognition in Python on the Finxter Academy:
Full Course: OpenAI Whisper – Building Cutting-Edge Python Apps with OpenAI Whisper
Check out our full OpenAI Whisper course with video lessons, easy explanations, GitHub, and a downloadable PDF certificate to prove your speech processing skills to your employer and freelancing clients:
[Academy] Voice-First Development: Building Cutting-Edge Python Apps Powered By OpenAI Whisper
January 28, 2024 at 01:37AM