stt.deepspeech
- class platypush.plugins.stt.deepspeech.SttDeepspeechPlugin(model_file: str, lm_file: str, trie_file: str, lm_alpha: float = 0.75, lm_beta: float = 1.85, beam_width: int = 500, *args, **kwargs)[source]
Bases: SttPlugin
This plugin performs speech-to-text and speech detection using the Mozilla DeepSpeech engine.
Requires:
- deepspeech (pip install 'deepspeech>=0.6.0')
- numpy (pip install numpy)
- sounddevice (pip install sounddevice)
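As a point of reference, the following is a minimal sketch of the raw DeepSpeech 0.6.x calls that this plugin builds on - the exact signatures vary between DeepSpeech releases, the file names refer to the archive extracted in the snippet below, and speech.wav is a placeholder for any 16 kHz, 16-bit mono PCM recording:

import wave
import numpy as np
import deepspeech

# Load the acoustic model, then attach the language model and trie
# (default hyperparameters of this plugin: beam width 500, alpha 0.75, beta 1.85)
model = deepspeech.Model('output_graph.pbmm', 500)
model.enableDecoderWithLM('lm.binary', 'trie', 0.75, 1.85)

# DeepSpeech 0.6 expects 16-bit mono PCM samples at the model's sample rate
with wave.open('speech.wav') as f:
    audio = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

print(model.stt(audio))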
- __init__(model_file: str, lm_file: str, trie_file: str, lm_alpha: float = 0.75, lm_beta: float = 1.85, beam_width: int = 500, *args, **kwargs)[source]
In order to run the speech-to-text engine you’ll need to download the model files matching the version of the DeepSpeech engine that you have installed:
# Create the working folder for the models
export MODELS_DIR=~/models
mkdir -p $MODELS_DIR
cd $MODELS_DIR

# Download and extract the model files for your version of DeepSpeech.
# This may take a while.
export DEEPSPEECH_VERSION=0.6.1
wget https://github.com/mozilla/DeepSpeech/releases/download/v$DEEPSPEECH_VERSION/deepspeech-$DEEPSPEECH_VERSION-models.tar.gz
tar -xvzf deepspeech-$DEEPSPEECH_VERSION-models.tar.gz
x deepspeech-0.6.1-models/
x deepspeech-0.6.1-models/lm.binary
x deepspeech-0.6.1-models/output_graph.pbmm
x deepspeech-0.6.1-models/output_graph.pb
x deepspeech-0.6.1-models/trie
x deepspeech-0.6.1-models/output_graph.tflite
- Parameters:
model_file – Path to the model file (usually named output_graph.pb or output_graph.pbmm). Note that .pbmm models usually perform better and are smaller.
lm_file – Path to the language model binary file (usually named lm.binary).
trie_file – Path to the trie file built from the same vocabulary as the language model binary (usually named trie).
lm_alpha – The alpha hyperparameter of the CTC decoder - Language Model weight. See <https://github.com/mozilla/DeepSpeech/releases/tag/v0.6.0>.
lm_beta – The beta hyperparameter of the CTC decoder - Word Insertion weight. See <https://github.com/mozilla/DeepSpeech/releases/tag/v0.6.0>.
beam_width – Decoder beam width (see beam scoring in the KenLM language model).
input_device – PortAudio device index or name that will be used for recording speech (default: default system audio input device).
hotword – When this word is detected, the plugin will trigger a platypush.message.event.stt.HotwordDetectedEvent instead of a platypush.message.event.stt.SpeechDetectedEvent event. You can use these events for hooking other assistants.
hotwords – Use a list of hotwords instead of a single one.
conversation_timeout – If hotword or hotwords are set and conversation_timeout is set, the next speech detected event will trigger a platypush.message.event.stt.ConversationDetectedEvent instead of a platypush.message.event.stt.SpeechDetectedEvent event. You can attach custom hooks here to run any logic depending on the detected speech - it can emulate a kind of “OK, Google. Turn on the lights” interaction without using an external assistant (see the hook sketch after this list).
block_duration – Duration of the acquired audio blocks (default: 1 second).
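To illustrate the hotword/conversation flow described above, here is a hedged sketch of a Platypush Python user script reacting to the events generated by this plugin. The hook decorator import path, the event attributes (event.hotword, event.speech) and the light.hue call are assumptions for illustration, not guaranteed API:

from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.stt import (
    ConversationDetectedEvent, HotwordDetectedEvent)

@hook(HotwordDetectedEvent)
def on_hotword(event, **context):
    # The next speech detected within conversation_timeout seconds will be
    # delivered as a ConversationDetectedEvent instead of a plain speech event
    print('Hotword detected:', event.hotword)   # event.hotword is assumed

@hook(ConversationDetectedEvent)
def on_conversation(event, **context):
    # event.speech is assumed to hold the text detected after the hotword
    if 'turn on the lights' in event.speech:
        get_plugin('light.hue').on()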
- static convert_frames(frames: Union[numpy.ndarray, bytes]) → numpy.ndarray [source]
Conversion method for raw audio frames. It just returns the input frames as bytes. Override it if required by your logic.
- Parameters:
frames – Input audio frames, as bytes.
- Returns:
The audio frames as passed on the input. Override if required.
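If your logic needs the frames in another format, this is the method to override. A minimal sketch, assuming a hypothetical subclass that decodes raw bytes into the 16-bit PCM numpy array consumed by the model:

import numpy as np
from platypush.plugins.stt.deepspeech import SttDeepspeechPlugin

class MySttDeepspeechPlugin(SttDeepspeechPlugin):
    @staticmethod
    def convert_frames(frames):
        # Decode raw bytes into 16-bit PCM samples; pass arrays through as-is
        if isinstance(frames, bytes):
            return np.frombuffer(frames, dtype=np.int16)
        return frames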
- detect(audio_file: str) → SpeechDetectedResponse [source]
Perform speech-to-text analysis on an audio file.
- Parameters:
audio_file – Path to the audio file.
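For example, from another Platypush script you could run the action as follows - the plugin lookup through get_plugin is standard Platypush, while the audio path is a placeholder:

from platypush.context import get_plugin

stt = get_plugin('stt.deepspeech')
response = stt.detect(audio_file='/path/to/speech.wav')
print(response)   # SpeechDetectedResponse wrapping the transcribed text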
- detect_speech(frames) → str [source]
Method called within the detection_thread when new audio frames have been captured. Must be implemented by the derived classes.
- Parameters:
frames – Audio frames, as returned by convert_frames.
- Returns:
Detected text, as a string. Returns an empty string if no text has been detected.
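For reference, a derived class may implement it along these lines - a minimal sketch, assuming the loaded DeepSpeech model is kept in a hypothetical self._model attribute:

def detect_speech(self, frames) -> str:
    # Feed the converted frames to the model and return the decoded text
    return self._model.stt(frames)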
- on_detection_ended()[source]
Method called when the detection_thread stops. Clean up your context variables and models here.