stt.deepspeech#

class platypush.plugins.stt.deepspeech.SttDeepspeechPlugin(model_file: str, lm_file: str, trie_file: str, lm_alpha: float = 0.75, lm_beta: float = 1.85, beam_width: int = 500, *args, **kwargs)[source]#

Bases: SttPlugin

This plugin performs speech-to-text and speech detection using the Mozilla DeepSpeech engine.

Requires:

  • deepspeech (pip install 'deepspeech>=0.6.0')

  • numpy (pip install numpy)

  • sounddevice (pip install sounddevice)

__init__(model_file: str, lm_file: str, trie_file: str, lm_alpha: float = 0.75, lm_beta: float = 1.85, beam_width: int = 500, *args, **kwargs)[source]#

In order to run the speech-to-text engine you’ll need to download the model files matching the version of the DeepSpeech engine that you have installed:

# Create the working folder for the models
export MODELS_DIR=~/models
mkdir -p $MODELS_DIR
cd $MODELS_DIR

# Download and extract the model files for your version of DeepSpeech. This may take a while.
export DEEPSPEECH_VERSION=0.6.1
wget https://github.com/mozilla/DeepSpeech/releases/download/v$DEEPSPEECH_VERSION/deepspeech-$DEEPSPEECH_VERSION-models.tar.gz
tar -xvzf deepspeech-$DEEPSPEECH_VERSION-models.tar.gz
x deepspeech-0.6.1-models/
x deepspeech-0.6.1-models/lm.binary
x deepspeech-0.6.1-models/output_graph.pbmm
x deepspeech-0.6.1-models/output_graph.pb
x deepspeech-0.6.1-models/trie
x deepspeech-0.6.1-models/output_graph.tflite
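
Once extracted, these files map directly onto the plugin’s constructor arguments documented below. In a regular deployment the same values go under the stt.deepspeech section of your Platypush configuration; the direct instantiation below is only a sketch to illustrate the mapping (the paths assume the models were extracted under ~/models as above):

import os

from platypush.plugins.stt.deepspeech import SttDeepspeechPlugin

models_dir = os.path.expanduser('~/models/deepspeech-0.6.1-models')

plugin = SttDeepspeechPlugin(
    model_file=os.path.join(models_dir, 'output_graph.pbmm'),  # output_graph.pb also works
    lm_file=os.path.join(models_dir, 'lm.binary'),
    trie_file=os.path.join(models_dir, 'trie'),
)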
Parameters:
  • model_file – Path to the model file (usually named output_graph.pb or output_graph.pbmm). Note that .pbmm files usually perform better and are smaller.

  • lm_file – Path to the language model binary file (usually named lm.binary).

  • trie_file – Path to the trie file built from the same vocabulary as the language model binary (usually named trie).

  • lm_alpha – The alpha hyperparameter of the CTC decoder (language model weight). See the Mozilla DeepSpeech documentation (https://github.com/mozilla/DeepSpeech) and the decoder sketch after this parameter list.

  • lm_beta – The beta hyperparameter of the CTC decoder (word insertion weight). See the Mozilla DeepSpeech documentation (https://github.com/mozilla/DeepSpeech).

  • beam_width – Decoder beam width (see beam scoring in KenLM language model).

  • input_device – PortAudio device index or name that will be used for recording speech (default: default system audio input device).

  • hotword – When this word is detected, the plugin will trigger a platypush.message.event.stt.HotwordDetectedEvent instead of a platypush.message.event.stt.SpeechDetectedEvent. You can use these events to hook into other assistants.

  • hotwords – Use a list of hotwords instead of a single one.

  • conversation_timeout – If hotword or hotwords are set together with conversation_timeout, the next speech detected within the timeout after a hotword will trigger a platypush.message.event.stt.ConversationDetectedEvent instead of a platypush.message.event.stt.SpeechDetectedEvent. You can attach custom hooks to this event to run any logic depending on the detected speech - it can emulate a kind of “OK, Google. Turn on the lights” interaction without using an external assistant.

  • block_duration – Duration of the acquired audio blocks (default: 1 second).
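
For context, lm_alpha, lm_beta and beam_width are handed over to the DeepSpeech CTC decoder. The following is only a rough sketch of how the 0.6.x Python bindings consume them, not the plugin’s actual code (the file names are the ones extracted above):

import numpy as np
import deepspeech

model = deepspeech.Model('output_graph.pbmm', 500)          # beam_width
model.enableDecoderWithLM('lm.binary', 'trie', 0.75, 1.85)  # lm_alpha, lm_beta

# DeepSpeech expects 16 kHz, 16-bit mono PCM samples.
audio = np.zeros(16000, dtype=np.int16)  # one second of silence as a placeholder
print(model.stt(audio))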

static convert_frames(frames: numpy.ndarray | bytes) numpy.ndarray[source]#

Conversion method for raw audio frames. It converts the captured raw frames into the format expected by detect_speech. Override it if required by your logic.

Parameters:

frames – Input audio frames, as bytes.

Returns:

The audio frames converted into the format expected by detect_speech. Override if a different conversion is required.
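
As an illustration only (not necessarily the plugin’s actual implementation), an override for an engine that expects 16-bit PCM samples as a NumPy array could look like this:

import numpy as np

from platypush.plugins.stt.deepspeech import SttDeepspeechPlugin

class MySttDeepspeechPlugin(SttDeepspeechPlugin):
    @staticmethod
    def convert_frames(frames):
        if isinstance(frames, np.ndarray):
            # Already converted upstream: pass the buffer through.
            return frames
        # Raw captured bytes -> 16-bit PCM samples as a NumPy array.
        return np.frombuffer(frames, dtype=np.int16)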

detect(audio_file: str) SpeechDetectedResponse[source]#

Perform speech-to-text analysis on an audio file.

Parameters:

audio_file – Path to the audio file.
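
A minimal usage sketch from a Python script or hook, assuming the plugin is configured and using the platypush.context.get_plugin helper (the audio path is a placeholder):

from platypush.context import get_plugin

stt = get_plugin('stt.deepspeech')
response = stt.detect(audio_file='/path/to/audio.wav')
print(response)  # SpeechDetectedResponse carrying the recognized text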

detect_speech(frames) str[source]#

Method called within the detection_thread when new audio frames have been captured. Must be implemented by the derived classes.

Parameters:

frames – Audio frames, as returned by convert_frames.

Returns:

Detected text, as a string. Returns an empty string if no text has been detected.
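
As a hedged sketch of the contract (frames in, recognized text out), a derived class could feed the converted frames to a DeepSpeech model initialized when detection starts; self._model and the use of self.model_file below are assumptions for illustration, not the plugin’s documented internals:

import deepspeech

from platypush.plugins.stt.deepspeech import SttDeepspeechPlugin

class MyDeepspeechPlugin(SttDeepspeechPlugin):
    def on_detection_started(self):
        # Assumed: build the model once, when the detection thread starts.
        self._model = deepspeech.Model(self.model_file, 500)

    def detect_speech(self, frames) -> str:
        # frames is the buffer returned by convert_frames; stt() returns the
        # recognized text, or an empty string if nothing was detected.
        return self._model.stt(frames)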

on_detection_ended()[source]#

Method called when the detection_thread stops. Clean up your context variables and models here.

on_detection_started()[source]#

Method called when the detection_thread starts. Initialize your context variables and models here if required.

on_speech_detected(speech: str) None[source]#

Hook called when speech is detected. Triggers the right event depending on the current context.

Parameters:

speech – Detected speech.
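
These are the events you would typically attach user hooks to (see the hotword and conversation_timeout parameters above). A hedged sketch of a Python event hook, assuming the @hook decorator from platypush.event.hook and that the recognized text is available among the event arguments:

from platypush.event.hook import hook
from platypush.message.event.stt import SpeechDetectedEvent

@hook(SpeechDetectedEvent)
def on_speech_detected_hook(event, **context):
    # Assumed: the recognized text is exposed as the `speech` event argument.
    print('Recognized speech:', event.args.get('speech'))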