`tts.piper`

Description

Text-to-speech plugin that uses Piper, a fast and local neural text-to-speech engine.

Install with:

$ pip install piper-tts

You will also need to download at least one voice model. You can do so via the download_voice() action.

Voice models are typically stored in ~/.local/share/piper_tts.

The full list of supported voice models is here.

Parameters:

kwargs – Extra arguments to be passed to the TtsPlugin constructor, including output_device and output_volume. output_device accepts a PortAudio/sounddevice device index, a PortAudio/sounddevice device name, or a PulseAudio/PipeWire sink name (requires pactl). output_volume is a playback volume percentage.

Configuration

tts.piper:
  # [Optional]
  # Path to the Piper ``.onnx`` default voice model file, or
  # model name (e.g. ``en_US-hfc_female-medium``) relative to
  # ``models_dir``.
  # If not specified, it must be specified when `TtsPiperPlugin.say <https://docs.platypush.tech/platypush/plugins/tts.piper.html#platypush.plugins.tts.piper.TtsPiperPlugin.say>`_ is called,
  # or a model should be downloaded via `TtsPiperPlugin.download_voice <https://docs.platypush.tech/platypush/plugins/tts.piper.html#platypush.plugins.tts.piper.TtsPiperPlugin.download_voice>`_.
  # model:   # type=str | None

  # [Optional]
  # Directory where Piper voice models are stored.
  # Default: ``<WORKDIR>/piper_tts``.
  # models_dir:   # type=str | None

  # [Optional]
  # Default speaker ID for multi-speaker models
  # (default: None).
  # speaker_id:   # type=int | None

  # [Optional]
  # Default speaking speed scale. Higher values make
  # speech slower (default: voice default, typically 1.0).
  # length_scale:   # type=float | None

  # [Optional]
  # Default audio variation / expressiveness scale
  # (default: voice default).
  # noise_scale:   # type=float | None

  # [Optional]
  # Default phoneme width variation scale
  # (default: voice default).
  # noise_w_scale:   # type=float | None

  # [Optional]
  # Whether to use CUDA for GPU acceleration. Requires
  # ``onnxruntime-gpu`` to be installed (default: False).
  # use_cuda: False  # type=bool

  # [Optional]
  # Silence, in seconds, to prepend before playing
  # the audio. This gives the audio backend (e.g. PulseAudio/PipeWire)
  # time to initialize the output path, avoiding the first fraction of
  # generated speech being silently dropped (default: 1).
  # start_padding: 1  # type=float

  # [Optional]
  # Silence, in seconds, to append before closing the
  # playback stream. This avoids clipping the tail of short generated
  # speech on some audio backends (default: 1).
  # end_padding: 1  # type=float

  # [Optional]
  # Language code (default: ``en-US``).
  # language: en-US

  # [Optional]
  # Audio output device to use for playback.
  # Supported formats: PortAudio/sounddevice device index,
  # PortAudio/sounddevice device name, or PulseAudio/PipeWire sink name
  # (e.g. ``alsa_output.pci-...``; requires ``pactl``). If specified,
  # it is passed as the ``device`` argument to
  # `SoundPlugin.play <https://docs.platypush.tech/platypush/plugins/sound.html#platypush.plugins.sound.SoundPlugin.play>`_.
  # output_device:   # type=int | str | None

  # [Optional]
  # Playback volume, as a percentage. ``100`` means
  # unchanged, values below ``100`` attenuate, and values above ``100``
  # amplify with clipping in the playback path. It is passed as the
  # ``volume`` argument to `SoundPlugin.play <https://docs.platypush.tech/platypush/plugins/sound.html#platypush.plugins.sound.SoundPlugin.play>`_.
  # output_volume:   # type=float | None
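The output_volume semantics described above (100 leaves the audio unchanged, lower values attenuate, higher values amplify with clipping) can be illustrated with a minimal sketch. It assumes signed 16-bit PCM samples held in a plain Python list; apply_volume is a hypothetical helper written for illustration, not the code used by SoundPlugin.play.

```python
def apply_volume(samples, volume_percent):
    """Scale signed 16-bit PCM samples by a percentage, clipping on overflow.

    Illustrative only: the sample format and this helper are assumptions,
    not the plugin's actual playback path.
    """
    gain = volume_percent / 100.0
    lowest, highest = -32768, 32767  # signed 16-bit range
    return [max(lowest, min(highest, round(s * gain))) for s in samples]


unchanged = apply_volume([1000, -1000], 100)
quieter = apply_volume([1000, -1000], 50)
clipped = apply_volume([30000, -30000], 200)
```

Values above 100 are where clipping becomes audible, since any sample pushed outside the 16-bit range is flattened to the limit.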

Dependencies

pip

pip install numpy sounddevice piper-tts

Alpine

apk add ffmpeg py3-numpy portaudio-dev

Debian

apt install portaudio19-dev ffmpeg python3-numpy

Fedora

yum install ffmpeg portaudio-devel python-numpy

Arch Linux

pacman -S ffmpeg python-numpy portaudio python-sounddevice
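After installing the packages above, a quick check can confirm that the Python modules are importable. The sketch below is a hedged example: it assumes that piper-tts installs a top-level module named piper, and missing_modules is a hypothetical helper, not part of Platypush.

```python
from importlib.util import find_spec

# Assumed import names; "piper" is taken to be the module installed by piper-tts.
REQUIRED_MODULES = ("numpy", "sounddevice", "piper")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```

This only checks the Python side; system libraries such as PortAudio and ffmpeg still need to be installed through the distribution's package manager.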

Actions

tts.piper.download_voice

tts.piper.say

tts.piper.stop
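Each of these actions can be invoked by sending Platypush a request message that names the action and its arguments. The sketch below only builds such a message; it assumes the generic Platypush request shape (type, action, args), build_request is a hypothetical helper, and the transport used to deliver the message is deliberately left out.

```python
import json


def build_request(action, **args):
    """Build a request message for a Platypush action.

    Assumption: the generic request shape with "type", "action" and "args".
    """
    return {"type": "request", "action": action, "args": args}


# Voice name taken from the examples in this page.
download = build_request("tts.piper.download_voice", voice="en_US-lessac-medium")
payload = build_request("tts.piper.say", text="Hello world")
body = json.dumps(payload)
```

The resulting body can then be delivered through whichever backend is enabled in the local installation.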

Module reference

class platypush.plugins.tts.piper.TtsPiperPlugin(*, model: str | None = None, models_dir: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, use_cuda: bool = False, start_padding: float = 1, end_padding: float = 1, **kwargs)

Bases: TtsPlugin

__init__(*, model: str | None = None, models_dir: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, use_cuda: bool = False, start_padding: float = 1, end_padding: float = 1, **kwargs)

Parameters:

model – Path to the Piper .onnx default voice model file, or model name (e.g. en_US-hfc_female-medium) relative to models_dir. If not specified, it must be specified when say() is called, or a model should be downloaded via download_voice().
models_dir – Directory where Piper voice models are stored. Default: <WORKDIR>/piper_tts.
speaker_id – Default speaker ID for multi-speaker models (default: None).
length_scale – Default speaking speed scale. Higher values make speech slower (default: voice default, typically 1.0).
noise_scale – Default audio variation / expressiveness scale (default: voice default).
noise_w_scale – Default phoneme width variation scale (default: voice default).
use_cuda – Whether to use CUDA for GPU acceleration. Requires onnxruntime-gpu to be installed (default: False).
start_padding – Silence, in seconds, to prepend before playing the audio. This gives the audio backend (e.g. PulseAudio/PipeWire) time to initialize the output path, avoiding the first fraction of generated speech being silently dropped (default: 1).
end_padding – Silence, in seconds, to append before closing the playback stream. This avoids clipping the tail of short generated speech on some audio backends (default: 1).
kwargs – Extra arguments to be passed to the platypush.plugins.tts.TtsPlugin constructor, including output_device and output_volume. output_device accepts a PortAudio/sounddevice device index, PortAudio/sounddevice device name, or PulseAudio/PipeWire sink name (requires pactl). output_volume is a playback volume percentage.
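The effect of start_padding and end_padding can be pictured as surrounding the generated audio with zero-valued samples. The following is a minimal sketch under stated assumptions: audio is a flat list of samples at a known sample rate, and pad_with_silence is a hypothetical helper, not the plugin's implementation.

```python
def pad_with_silence(samples, sample_rate, start_padding=1.0, end_padding=1.0):
    """Surround `samples` with silence, given padding durations in seconds.

    Illustrative only: the real plugin may pad the stream differently.
    """
    lead = [0] * int(start_padding * sample_rate)  # silence before speech
    tail = [0] * int(end_padding * sample_rate)    # silence after speech
    return lead + list(samples) + tail


# Tiny example: 4 samples per second, 1 s of lead-in and 0.5 s of tail.
padded = pad_with_silence([5, 6], sample_rate=4, start_padding=1, end_padding=0.5)
```

With the default of one second on each side, every utterance plays for two seconds longer than the speech itself, which is the trade-off for not losing the first or last syllable.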

download_voice(voice: str, models_dir: str | None = None)

Download a Piper voice model.

Parameters:

voice – Name of the voice to download (e.g. en_US-lessac-medium).
models_dir – Directory to store the downloaded voice model (default: the configured models_dir).
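Once a voice has been downloaded, it can be referenced by name relative to models_dir. The sketch below shows where the files for a voice would be expected, assuming the usual Piper layout of a `<voice>.onnx` model next to a `<voice>.onnx.json` config file stored directly in models_dir. The voice_files helper is hypothetical, and the flat layout is an assumption about how the plugin stores downloads.

```python
from pathlib import Path


def voice_files(voice, models_dir):
    """Return the expected (model, config) paths for a Piper voice.

    Assumption: files are stored flat in `models_dir` as
    <voice>.onnx and <voice>.onnx.json.
    """
    base = Path(models_dir).expanduser()
    model = base / f"{voice}.onnx"
    config = model.with_name(model.name + ".json")
    return model, config


model_path, config_path = voice_files("en_US-lessac-medium", "~/.local/share/piper_tts")
```

Checking `model_path.exists()` before calling say() is a cheap way to decide whether download_voice() needs to run first.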

say()

Say some text.

Parameters:

text – Text to say.
model – Override the default voice model.
speaker_id – Speaker ID override for multi-speaker models.
length_scale – Speaking speed override.
noise_scale – Audio variation override.
noise_w_scale – Phoneme width variation override.
output_file – If set, save the audio to the specified file instead of playing it.
player_args – Extends the additional arguments to be passed to platypush.plugins.sound.SoundPlugin.play() (like volume, duration, channels etc.).
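The per-call parameters above override the defaults set in the plugin configuration. A minimal sketch of that merge follows, assuming that a value of None means "not overridden"; effective_options is a hypothetical helper and the plugin's actual resolution logic may differ.

```python
def effective_options(defaults, **overrides):
    """Merge per-call overrides into configured defaults.

    Assumption: an override of None means "keep the configured default".
    """
    merged = dict(defaults)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


configured = {"model": "en_US-hfc_female-medium", "speaker_id": None, "length_scale": None}
options = effective_options(configured, model=None, length_scale=1.2)
```

Here the call keeps the configured model and only slows the speech down, since higher length_scale values make speech slower.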

stop()

Stop the playback.
