tts.piper

Description

Text-to-speech plugin that uses Piper, a fast and local neural text-to-speech engine.

Install with:

$ pip install piper-tts

You will also need to download at least one voice model. You can do so via the download_voice() action.

Voice models are typically stored in ~/.local/share/piper_tts.

The full list of supported voice models is here.

Parameters:
  • kwargs – Extra arguments to be passed to the TtsPlugin constructor.

Configuration

tts.piper:
  # [Optional]
  # Path to the Piper ``.onnx`` default voice model file, or
  # model name (e.g. ``en_US-hfc_female-medium``) relative to
  # ``models_dir``.
  # If not specified, it must be specified when `TtsPiperPlugin.say <https://docs.platypush.tech/platypush/plugins/tts.piper.html#platypush.plugins.tts.piper.TtsPiperPlugin.say>`_ is called,
  # or a model should be downloaded via `TtsPiperPlugin.download_voice <https://docs.platypush.tech/platypush/plugins/tts.piper.html#platypush.plugins.tts.piper.TtsPiperPlugin.download_voice>`_.
  # model:   # type=str | None

  # [Optional]
  # Directory where Piper voice models are stored.
  # Default: ``<WORKDIR>/piper_tts``.
  # models_dir:   # type=str | None

  # [Optional]
  # Default speaker ID for multi-speaker models
  # (default: None).
  # speaker_id:   # type=int | None

  # [Optional]
  # Default speaking speed scale. Higher values make
  # speech slower (default: voice default, typically 1.0).
  # length_scale:   # type=float | None

  # [Optional]
  # Default audio variation / expressiveness scale
  # (default: voice default).
  # noise_scale:   # type=float | None

  # [Optional]
  # Default phoneme width variation scale
  # (default: voice default).
  # noise_w_scale:   # type=float | None

  # [Optional]
  # Whether to use CUDA for GPU acceleration. Requires
  # ``onnxruntime-gpu`` to be installed (default: False).
  # use_cuda: False  # type=bool

  # [Optional]
  # Silence, in seconds, to prepend before playing
  # the audio. This gives the audio backend (e.g. PulseAudio/PipeWire)
  # time to initialize the output path, avoiding the first fraction of
  # generated speech being silently dropped (default: 1).
  # start_padding: 1  # type=float

  # [Optional]
  # Silence, in seconds, to append before closing the
  # playback stream. This avoids clipping the tail of short generated
  # speech on some audio backends (default: 1).
  # end_padding: 1  # type=float

  # [Optional]
  # Language code (default: ``en-US``).
  # language: en-US
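
The model option above accepts either a path to an .onnx file or a bare voice name relative to models_dir. The sketch below illustrates one way such a value could be resolved. The helper resolve_model_path is hypothetical, and the assumption that a bare name maps to <models_dir>/<name>.onnx is ours rather than something the plugin documentation confirms.

```python
from pathlib import Path
from typing import Optional


def resolve_model_path(model: str, models_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: map a ``model`` setting to an ``.onnx`` file path."""
    candidate = Path(model).expanduser()

    # A value that already ends in .onnx is treated as a file path.
    if candidate.suffix == ".onnx":
        if candidate.is_absolute() or models_dir is None:
            return candidate
        # Relative .onnx paths are looked up under models_dir.
        return Path(models_dir).expanduser() / candidate

    if models_dir is None:
        raise ValueError("a bare model name requires models_dir")

    # ASSUMPTION: a bare voice name maps to <models_dir>/<name>.onnx.
    return Path(models_dir).expanduser() / (model + ".onnx")


if __name__ == "__main__":
    print(resolve_model_path("en_US-hfc_female-medium", "/tmp/piper_tts"))
    print(resolve_model_path("/opt/voices/custom.onnx"))
```

Keeping the resolution in one function makes it straightforward to apply the same rule to the configured default and to a per-call override.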

Dependencies

pip

pip install sounddevice numpy piper-tts

Alpine

apk add ffmpeg py3-numpy portaudio-dev

Debian

apt install ffmpeg portaudio19-dev python3-numpy

Fedora

yum install ffmpeg portaudio-devel python-numpy

Arch Linux

pacman -S ffmpeg python-sounddevice portaudio python-numpy

Actions

Module reference

class platypush.plugins.tts.piper.TtsPiperPlugin(*, model: str | None = None, models_dir: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, use_cuda: bool = False, start_padding: float = 1, end_padding: float = 1, **kwargs)

Bases: TtsPlugin

__init__(*, model: str | None = None, models_dir: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, use_cuda: bool = False, start_padding: float = 1, end_padding: float = 1, **kwargs)
Parameters:
  • model – Path to the Piper .onnx default voice model file, or model name (e.g. en_US-hfc_female-medium) relative to models_dir. If not specified, it must be specified when say() is called, or a model should be downloaded via download_voice().

  • models_dir – Directory where Piper voice models are stored. Default: <WORKDIR>/piper_tts.

  • speaker_id – Default speaker ID for multi-speaker models (default: None).

  • length_scale – Default speaking speed scale. Higher values make speech slower (default: voice default, typically 1.0).

  • noise_scale – Default audio variation / expressiveness scale (default: voice default).

  • noise_w_scale – Default phoneme width variation scale (default: voice default).

  • use_cuda – Whether to use CUDA for GPU acceleration. Requires onnxruntime-gpu to be installed (default: False).

  • start_padding – Silence, in seconds, to prepend before playing the audio. This gives the audio backend (e.g. PulseAudio/PipeWire) time to initialize the output path, avoiding the first fraction of generated speech being silently dropped (default: 1).

  • end_padding – Silence, in seconds, to append before closing the playback stream. This avoids clipping the tail of short generated speech on some audio backends (default: 1).

  • kwargs – Extra arguments to be passed to the platypush.plugins.tts.TtsPlugin constructor.
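
To make the effect of start_padding and end_padding concrete, here is a minimal sketch that wraps raw PCM audio in silence. It is not the plugin's actual implementation. The function name pad_with_silence, the 16-bit mono format, and the 22050 Hz sample rate are assumptions chosen for illustration.

```python
def pad_with_silence(
    pcm: bytes,
    sample_rate: int,
    start_padding: float = 1.0,
    end_padding: float = 1.0,
    sample_width: int = 2,
    channels: int = 1,
) -> bytes:
    """Hypothetical helper: add leading and trailing silence to signed PCM audio."""
    frame_size = sample_width * channels  # bytes per audio frame

    # For signed PCM, silence is a run of zero bytes.
    lead = b"\x00" * (int(start_padding * sample_rate) * frame_size)
    tail = b"\x00" * (int(end_padding * sample_rate) * frame_size)
    return lead + pcm + tail


if __name__ == "__main__":
    rate = 22050  # ASSUMPTION: sample rate used only for this example
    speech = b"\x01\x02" * rate  # one second of dummy 16-bit mono audio
    padded = pad_with_silence(speech, rate)
    print(len(padded) // (2 * rate), "seconds including padding")
```

The leading silence gives the audio backend time to open the output path, and the trailing silence keeps the stream open until the last samples have been played.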

download_voice(voice: str, models_dir: str | None = None)

Download a Piper voice model.

Parameters:
  • voice – Name of the voice to download (e.g. en_US-lessac-medium).

  • models_dir – Directory to store the downloaded voice model (default: the configured models_dir).

say(text: str, *_, model: str | None = None, speaker_id: int | None = None, length_scale: float | None = None, noise_scale: float | None = None, noise_w_scale: float | None = None, output_file: str | None = None, **player_args)

Say some text.

Parameters:
  • text – Text to say.

  • model – Override the default voice model.

  • speaker_id – Speaker ID override for multi-speaker models.

  • length_scale – Speaking speed override.

  • noise_scale – Audio variation override.

  • noise_w_scale – Phoneme width variation override.

  • output_file – If set, save the audio to the specified file instead of playing it.

  • player_args – Additional arguments to be passed to platypush.plugins.sound.SoundPlugin.play() (such as volume, duration, channels, etc.).
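
As an illustration of how the download_voice() and say() actions might be invoked, the sketch below builds Platypush-style JSON action requests. The helper build_request is hypothetical. The sketch only constructs the payloads; how they are delivered (for example through an HTTP backend) depends on the deployment and is assumed rather than shown.

```python
import json
from typing import Any, Dict


def build_request(action: str, **args: Any) -> Dict[str, Any]:
    """Hypothetical helper: build an action request, omitting unset (None) arguments."""
    return {
        "type": "request",
        "action": action,
        "args": {key: value for key, value in args.items() if value is not None},
    }


# Download a voice first, then speak with a per-call speed override.
download = build_request("tts.piper.download_voice", voice="en_US-lessac-medium")
say = build_request(
    "tts.piper.say",
    text="Hello world",
    model="en_US-lessac-medium",
    length_scale=1.2,  # higher values make speech slower
    speaker_id=None,   # left unset, so the configured default applies
)

if __name__ == "__main__":
    print(json.dumps(download, indent=2))
    print(json.dumps(say, indent=2))
```

Dropping None values keeps the request minimal, so any argument that is not overridden falls back to the default set in the plugin configuration.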

stop()

Stop the playback.