stt.picovoice.speech

class platypush.plugins.stt.picovoice.speech.SttPicovoiceSpeechPlugin(library_path: Optional[str] = None, acoustic_model_path: Optional[str] = None, language_model_path: Optional[str] = None, license_path: Optional[str] = None, end_of_speech_timeout: int = 1, *args, **kwargs)[source]

This plugin performs speech detection using PicoVoice. NOTE: Cheetah, the PicoVoice engine used for real-time speech-to-text, can be used freely for personal applications on x86_64 Linux. Other architectures and operating systems require a commercial license, which can be requested from PicoVoice.

Requires:

  • cheetah (pip install git+https://github.com/BlackLight/cheetah)

__init__(library_path: Optional[str] = None, acoustic_model_path: Optional[str] = None, language_model_path: Optional[str] = None, license_path: Optional[str] = None, end_of_speech_timeout: int = 1, *args, **kwargs)[source]
Parameters
  • library_path – Path to the Cheetah binary library for your OS (default: CHEETAH_INSTALL_DIR/lib/OS/ARCH/libpv_cheetah.EXT).

  • acoustic_model_path – Path to the acoustic speech model (default: CHEETAH_INSTALL_DIR/lib/common/acoustic_model.pv).

  • language_model_path – Path to the language model (default: CHEETAH_INSTALL_DIR/lib/common/language_model.pv).

  • license_path – Path to your PicoVoice license (default: CHEETAH_INSTALL_DIR/resources/license/cheetah_eval_linux_public.lic).

  • end_of_speech_timeout – Number of seconds of silence during speech recognition before considering a phrase over (default: 1).
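As a sketch, the plugin could be enabled in the platypush `config.yaml`; all parameters are optional, and the paths below are hypothetical placeholders, not verified defaults:

```yaml
stt.picovoice.speech:
    library_path: ~/.local/share/cheetah/lib/linux/x86_64/libpv_cheetah.so
    license_path: ~/.local/share/cheetah/resources/license/cheetah_eval_linux_public.lic
    end_of_speech_timeout: 2
```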

convert_frames(frames: bytes) → tuple[source]

Conversion method for raw audio frames. By default it returns the input frames unchanged, as bytes. Override it if your logic requires a different format.

Parameters

frames – Input audio frames, as bytes.

Returns

The input frames, unchanged.
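As an illustration of the override contract (using hypothetical stand-in classes, not the actual platypush ones), a subclass might unpack the raw bytes into 16-bit PCM samples before detection:

```python
import struct


class BaseSttPlugin:
    def convert_frames(self, frames: bytes) -> bytes:
        # Default behaviour: pass the raw frames through unchanged.
        return frames


class PcmSttPlugin(BaseSttPlugin):
    def convert_frames(self, frames: bytes) -> tuple:
        # Unpack raw little-endian 16-bit PCM bytes into integer samples.
        n_samples = len(frames) // 2
        return struct.unpack(f'<{n_samples}h', frames)


plugin = PcmSttPlugin()
samples = plugin.convert_frames(b'\x01\x00\xff\xff')  # two samples: 1 and -1
```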

detect(audio_file: str) → platypush.message.response.stt.SpeechDetectedResponse[source]

Perform speech-to-text analysis on an audio file.

Parameters

audio_file – Path to the audio file.

detect_speech(frames: tuple) → str[source]

Method called within the detection_thread when new audio frames have been captured. Must be implemented by derived classes.

Parameters

frames – Audio frames, as returned by convert_frames.

Returns

Detected text, as a string. Returns an empty string if no text has been detected.
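The capture pipeline can be sketched with a hypothetical mock class standing in for the Cheetah engine, showing how convert_frames feeds detect_speech and how an empty string signals that nothing was detected:

```python
class MockSttPlugin:
    """Illustrative stand-in for the plugin's detection pipeline."""

    def convert_frames(self, frames: bytes) -> bytes:
        # Pass-through, as in the base implementation.
        return frames

    def detect_speech(self, frames) -> str:
        # A real engine would run the acoustic and language models here;
        # this mock just "detects" ASCII text in the buffer.
        try:
            return frames.decode('ascii').strip()
        except UnicodeDecodeError:
            return ''  # Empty string: no text detected

    def process(self, raw: bytes) -> str:
        # Captured frames are converted, then passed to the detector.
        return self.detect_speech(self.convert_frames(raw))


result = MockSttPlugin().process(b'hello world')
```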

on_detection_ended() → None[source]

Method called when the detection_thread stops. Clean up your context variables and models here.

recording_thread(input_device: Optional[str] = None, *args, **kwargs) → None[source]

Recording thread. It reads raw frames from the audio device and dispatches them to detection_thread.

Parameters
  • block_duration – Duration of each audio block. Specify either block_duration or block_size.

  • block_size – Size of each audio block. Specify either block_duration or block_size.

  • input_device – Name or index of the input audio device.
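The relation between the two parameters is simple arithmetic; as an illustration, assuming a hypothetical 16 kHz mono stream (Cheetah's actual sample rate may differ):

```python
def block_size_from_duration(block_duration: float, sample_rate: int = 16000) -> int:
    """Number of audio frames in a block of the given duration (in seconds)."""
    return int(sample_rate * block_duration)


size = block_size_from_duration(0.5)  # 8000 frames at 16 kHz
```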

start_detection(*args, **kwargs) → None[source]

Start the speech detection engine.

Parameters
  • input_device – Audio input device name/index override

  • seconds – If set, the detection engine will stop after this many seconds; otherwise it will keep running until stop_detection is called or the application stops.

  • block_duration – block_duration override.
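Like any platypush action, start_detection can be triggered by sending a JSON request message to the daemon (for instance over its HTTP execute endpoint). A minimal sketch of building such a payload, where the device name is a placeholder:

```python
import json


def build_action_request(action: str, **args) -> str:
    # Platypush request message: the action name plus keyword arguments.
    return json.dumps({'type': 'request', 'action': action, 'args': args})


payload = build_action_request(
    'stt.picovoice.speech.start_detection',
    input_device='default',  # hypothetical device name
    seconds=30,
)
```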