tts.piper
Description
Text-to-speech plugin that uses Piper, a fast and local neural text-to-speech engine.
Install with:
$ pip install piper-tts
You will also need to download at least one voice model. You can do so via
the download_voice() action.
Voice models are typically stored in ~/.local/share/piper_tts.
The full list of supported voice models is here.
- Parameters:
kwargs – Extra arguments to be passed to the TtsPlugin constructor.
Configuration
tts.piper:
  # [Optional]
  # Path to the Piper ``.onnx`` default voice model file, or
  # model name (e.g. ``en_US-hfc_female-medium``) relative to
  # ``models_dir``.
  # If not specified, it must be specified when `TtsPiperPlugin.say <https://docs.platypush.tech/platypush/plugins/tts.piper.html#platypush.plugins.tts.piper.TtsPiperPlugin.say>`_ is called,
  # or a model should be downloaded via `TtsPiperPlugin.download_voice <https://docs.platypush.tech/platypush/plugins/tts.piper.html#platypush.plugins.tts.piper.TtsPiperPlugin.download_voice>`_.
  # model: # type=str | None
  # [Optional]
  # Directory where Piper voice models are stored.
  # Default: ``<WORKDIR>/piper_tts``.
  # models_dir: # type=str | None
  # [Optional]
  # Default speaker ID for multi-speaker models
  # (default: None).
  # speaker_id: # type=int | None
  # [Optional]
  # Default speaking speed scale. Higher values make
  # speech slower (default: voice default, typically 1.0).
  # length_scale: # type=float | None
  # [Optional]
  # Default audio variation / expressiveness scale
  # (default: voice default).
  # noise_scale: # type=float | None
  # [Optional]
  # Default phoneme width variation scale
  # (default: voice default).
  # noise_w_scale: # type=float | None
  # [Optional]
  # Whether to use CUDA for GPU acceleration. Requires
  # ``onnxruntime-gpu`` to be installed (default: False).
  # use_cuda: False # type=bool
  # [Optional]
  # Silence, in seconds, to prepend before playing
  # the audio. This gives the audio backend (e.g. PulseAudio/PipeWire)
  # time to initialize the output path, avoiding the first fraction of
  # generated speech being silently dropped (default: 1).
  # start_padding: 1 # type=float
  # [Optional]
  # Silence, in seconds, to append before closing the
  # playback stream. This avoids clipping the tail of short generated
  # speech on some audio backends (default: 1).
  # end_padding: 1 # type=float
  # [Optional]
  # Language code (default: ``en-US``).
  # language: en-US
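The model setting accepts either a path to an .onnx file or a voice name relative to models_dir. The following is a minimal sketch of how such a value could be resolved to a file path. The resolve_model helper is hypothetical, and its rule (an explicit .onnx path is used as-is, anything else is treated as a voice name under models_dir) is an assumption for illustration, not the plugin's confirmed behavior.

```python
from pathlib import Path


def resolve_model(model: str, models_dir: str) -> Path:
    """Hypothetical helper: map a ``model`` setting to an .onnx file path.

    Assumption: values ending in ``.onnx`` are explicit file paths, anything
    else is a voice name stored as ``<models_dir>/<name>.onnx``.
    """
    candidate = Path(model).expanduser()
    if candidate.suffix == ".onnx":
        # Explicit path to a model file: use it unchanged.
        return candidate
    # Voice name: look it up inside the models directory.
    return Path(models_dir).expanduser() / f"{model}.onnx"


by_name = resolve_model("en_US-hfc_female-medium", "/tmp/piper_tts")
by_path = resolve_model("/opt/voices/custom.onnx", "/tmp/piper_tts")
print(by_name)
print(by_path)
```

The directory /tmp/piper_tts and the file /opt/voices/custom.onnx are placeholder values chosen for the example.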
Dependencies
pip
pip install sounddevice numpy piper-tts
Alpine
apk add ffmpeg py3-numpy portaudio-dev
Debian
apt install ffmpeg portaudio19-dev python3-numpy
Fedora
yum install ffmpeg portaudio-devel python-numpy
Arch Linux
pacman -S ffmpeg python-sounddevice portaudio python-numpy
Actions
Module reference
- class platypush.plugins.tts.piper.TtsPiperPlugin(*, model: str | None = None, models_dir: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, use_cuda: bool = False, start_padding: float = 1, end_padding: float = 1, **kwargs)
Bases: TtsPlugin
Text-to-speech plugin that uses Piper, a fast and local neural text-to-speech engine.
Install with:
$ pip install piper-tts
You will also need to download at least one voice model. You can do so via the download_voice() action.
Voice models are typically stored in ~/.local/share/piper_tts.
The full list of supported voice models is here.
- __init__(*, model: str | None = None, models_dir: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, use_cuda: bool = False, start_padding: float = 1, end_padding: float = 1, **kwargs)
- Parameters:
model – Path to the Piper .onnx default voice model file, or model name (e.g. en_US-hfc_female-medium) relative to models_dir. If not specified, it must be specified when say() is called, or a model should be downloaded via download_voice().
models_dir – Directory where Piper voice models are stored. Default: <WORKDIR>/piper_tts.
speaker_id – Default speaker ID for multi-speaker models (default: None).
length_scale – Default speaking speed scale. Higher values make speech slower (default: voice default, typically 1.0).
noise_scale – Default audio variation / expressiveness scale (default: voice default).
noise_w_scale – Default phoneme width variation scale (default: voice default).
use_cuda – Whether to use CUDA for GPU acceleration. Requires onnxruntime-gpu to be installed (default: False).
start_padding – Silence, in seconds, to prepend before playing the audio. This gives the audio backend (e.g. PulseAudio/PipeWire) time to initialize the output path, avoiding the first fraction of generated speech being silently dropped (default: 1).
end_padding – Silence, in seconds, to append before closing the playback stream. This avoids clipping the tail of short generated speech on some audio backends (default: 1).
kwargs – Extra arguments to be passed to the platypush.plugins.tts.TtsPlugin constructor.
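The start_padding and end_padding parameters describe silence added around the generated speech. The sketch below illustrates the idea on raw audio bytes. It assumes 16-bit PCM samples and a 22050 Hz sample rate, both chosen for the example only. The pad_pcm16 function is a hypothetical helper, not part of the plugin, and the plugin's own implementation may differ.

```python
def pad_pcm16(
    audio: bytes,
    sample_rate: int,
    start_padding: float = 1.0,
    end_padding: float = 1.0,
    channels: int = 1,
) -> bytes:
    """Hypothetical helper: surround 16-bit PCM audio with silence.

    Silence is encoded as zero-valued samples. Padding is given in seconds
    and converted to a whole number of frames.
    """
    bytes_per_frame = 2 * channels  # 16-bit samples: 2 bytes per channel
    lead = b"\x00" * (int(start_padding * sample_rate) * bytes_per_frame)
    tail = b"\x00" * (int(end_padding * sample_rate) * bytes_per_frame)
    return lead + audio + tail


# 100 dummy mono samples standing in for synthesized speech.
speech = b"\x01\x02" * 100
padded = pad_pcm16(speech, sample_rate=22050, start_padding=1, end_padding=0.5)
print(len(padded))
```

Setting either padding to 0 leaves that side of the audio untouched, which can be preferable when writing to a file rather than playing through an audio backend.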
- download_voice(voice: str, models_dir: str | None = None)
Download a Piper voice model.
- Parameters:
voice – Name of the voice to download (e.g. en_US-lessac-medium).
models_dir – Directory to store the downloaded voice model (default: the configured models_dir).
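As an illustration of invoking this action, the sketch below builds a JSON request for tts.piper.download_voice. The envelope with type, action and args keys is assumed to follow the generic Platypush request format, and build_request is a hypothetical helper. How the request is delivered (for example over the HTTP API) is deliberately left out.

```python
import json


def build_request(action: str, **args) -> str:
    """Hypothetical helper: serialize an action request as JSON.

    Assumption: requests use the ``type`` / ``action`` / ``args`` envelope.
    """
    return json.dumps({"type": "request", "action": action, "args": args})


# Request the voice named in the parameter description above.
payload = build_request("tts.piper.download_voice", voice="en_US-lessac-medium")
print(payload)
```

Omitting models_dir from the arguments leaves the choice of target directory to the configured models_dir.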
- say(text: str, *_, model: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, output_file: str | None = None, **player_args)
Say some text.
- Parameters:
text – Text to say.
model – Override the default voice model.
speaker_id – Speaker ID override for multi-speaker models.
length_scale – Speaking speed override.
noise_scale – Audio variation override.
noise_w_scale – Phoneme width variation override.
output_file – If set, save the audio to the specified file instead of playing it.
player_args – Additional arguments to be passed to platypush.plugins.sound.SoundPlugin.play() (like volume, duration, channels etc.).
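The per-call parameters of say() are described as overrides of the configured defaults. A minimal sketch of that precedence rule follows. It assumes that an argument left at None falls back to the configured value, which is consistent with the None defaults in the signature but is not confirmed by the text above. DEFAULTS and effective_settings are hypothetical names.

```python
from typing import Any, Dict

# Hypothetical configured defaults, mirroring the constructor parameters.
DEFAULTS: Dict[str, Any] = {
    "model": "en_US-hfc_female-medium",
    "speaker_id": None,
    "length_scale": None,
    "noise_scale": None,
    "noise_w_scale": None,
}


def effective_settings(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Merge per-call overrides into the defaults.

    Assumption: an override of None means "use the configured default".
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    return merged


# Slow the speech down for one call while keeping the default voice.
settings = effective_settings(DEFAULTS, model=None, length_scale=1.2)
print(settings)
```

Values that remain None after merging would be left to the voice model's own defaults, as described for length_scale, noise_scale and noise_w_scale.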
- stop()
Stop the playback.