assistant.vosk
Description
A voice assistant based on the Vosk offline speech recognition engine.
Vosk is a lightweight, offline speech recognition toolkit that supports multiple languages and runs on various platforms including Raspberry Pi.
Setup
1. Install the plugin dependencies (pip install vosk sounddevice).
2. Either set the lang parameter (e.g. en, en-us, it, de) and the plugin will automatically download the best matching small model, or manually download a Vosk model from the Vosk models page and provide its path via model_path.
Models are stored by default under
<PLATYPUSH_WORKDIR>/assistant.vosk/models.
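The plugin's model-matching logic is only documented by its rule that a generic code such as en resolves to its most common regional variant (e.g. en-us). The sketch below illustrates that rule under stated assumptions: pick_small_model and DEFAULT_VARIANTS are hypothetical names, the model names follow the vosk-model-small-<lang>-<version> pattern used on the Vosk models page, and the list of available models is passed in rather than fetched from the repository.

```python
from __future__ import annotations

# Hypothetical: generic codes mapped to their most common regional variant.
DEFAULT_VARIANTS = {"en": "en-us"}


def pick_small_model(lang: str, available: list[str]) -> str | None:
    """Return the first small model in ``available`` that matches ``lang``.

    Hypothetical helper for illustration, not the plugin's implementation.
    """
    lang = lang.lower()
    # Try the regional variant first (e.g. "en" -> "en-us"), then the code as given
    candidates = [DEFAULT_VARIANTS[lang], lang] if lang in DEFAULT_VARIANTS else [lang]

    for code in candidates:
        prefix = f"vosk-model-small-{code}-"
        for name in available:
            # Require a version number right after the prefix, so that "de"
            # does not also match e.g. "vosk-model-small-de-zamia-0.3"
            if name.startswith(prefix) and name[len(prefix):][:1].isdigit():
                return name
    return None
```

For example, given ["vosk-model-small-en-in-0.4", "vosk-model-small-en-us-0.15"], pick_small_model("en", ...) skips the en-in model and returns vosk-model-small-en-us-0.15.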
Hotword detection
This plugin does not include built-in hotword detection. You can pair it
with a hotword detection plugin such as
platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin
(with stt_enabled: false) or
platypush.plugins.assistant.openwakeword.AssistantOpenwakewordPlugin.
Example configuration with OpenWakeWord for hotword detection:
assistant.openwakeword:
  models:
    - hey_jarvis

assistant.vosk:
  lang: en  # auto-downloads a small en-us model
  # or: model_path: /path/to/vosk-model-en-us-0.22
Then trigger the conversation on hotword detection:
from platypush import run, when
from platypush.message.event.assistant import HotwordDetectedEvent

@when(HotwordDetectedEvent)
def on_hotword_detected():
    run("assistant.vosk.start_conversation")
Speech recognition
When a conversation is started (either programmatically via
start_conversation() or after a hotword is detected), the plugin
records audio from the microphone and processes it through Vosk in
real-time. When speech is recognized, a
platypush.message.event.assistant.SpeechRecognizedEvent is fired.
You can hook into recognized speech:
from platypush import when, run
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='turn on (the)? lights?')
def on_turn_on_lights(event: SpeechRecognizedEvent, **context):
    run("light.hue.on")
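To make the recognition flow described above more concrete, here is a minimal sketch of the standard Vosk streaming loop. It is not the plugin's code: recognize is a hypothetical helper, and the recognizer is injected so the sketch runs without a model. In a real setup rec would typically be vosk.KaldiRecognizer(vosk.Model(model_path), 16000), and frames would be the raw int16 mono buffers delivered by a sounddevice.RawInputStream callback (for example through a queue.Queue).

```python
import json
from typing import Callable, Iterable, Protocol


class Recognizer(Protocol):
    # Subset of the vosk.KaldiRecognizer interface used below
    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


def recognize(
    frames: Iterable[bytes],
    rec: Recognizer,
    on_speech: Callable[[str], None],
) -> None:
    """Feed raw 16-bit mono PCM frames to ``rec`` and report each utterance."""
    for frame in frames:
        # AcceptWaveform returns a truthy value once Vosk has finalized an utterance
        if rec.AcceptWaveform(frame):
            text = json.loads(rec.Result()).get("text", "")
            if text:
                on_speech(text)  # this is where a SpeechRecognizedEvent would be fired
    # Flush whatever is still buffered when the stream ends
    text = json.loads(rec.FinalResult()).get("text", "")
    if text:
        on_speech(text)
```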
Configuration
assistant.vosk:
  # [Optional]
  # Path to the Vosk model directory. You can download
  # models from `<https://alphacephei.com/vosk/models>`_. Either
  # ``model_path`` or ``lang`` must be specified.
  # model_path: # type=str | None
  # [Optional]
  # Language code (e.g. ``en``, ``en-us``, ``it``, ``de``,
  # ``fr``). If specified and ``model_path`` is not set, the plugin
  # will automatically download the best matching small model from
  # the Vosk model repository. Generic codes like ``en`` will match
  # the most common regional variant (e.g. ``en-us``).
  # lang: # type=str | None
  # [Optional]
  # Directory where downloaded models are stored.
  # Default: ``<PLATYPUSH_WORKDIR>/assistant.vosk/models``.
  # models_directory: # type=str | None
  # [Optional]
  # Audio sample rate in Hz (default: 16000). Most
  # Vosk models expect 16 kHz audio.
  # sample_rate: 16000 # type=int
  # [Optional]
  # Number of samples per audio frame (default: 4000).
  # With the default sample rate of 16000, this corresponds to 250 ms
  # per frame.
  # frame_size: 4000 # type=int
  # [Optional]
  # Number of audio channels (default: 1). Vosk requires
  # mono audio.
  # channels: 1 # type=int
  # [Optional]
  # Seconds to wait for speech after
  # starting a conversation before timing out (default: 5.0).
  # conversation_start_timeout: 5.0 # type=float
  # [Optional]
  # Seconds of silence after the last
  # detected speech before ending the conversation (default: 1.5).
  # conversation_end_timeout: 1.5 # type=float
  # [Optional]
  # Maximum conversation duration in
  # seconds (default: 15.0).
  # conversation_max_duration: 15.0 # type=float
  # [Optional]
  # If True, include per-word timing and confidence
  # information in the recognition results (default: False).
  # words: False # type=bool
  # [Optional]
  # If set, the assistant will use this plugin (e.g.
  # ``tts``, ``tts.google`` or ``tts.mimic3``) to render the responses,
  # instead of using the built-in assistant voice.
  # tts_plugin: # type=str | None
  # [Optional]
  # Optional arguments to be passed to the TTS
  # ``say`` action, if ``tts_plugin`` is set.
  # tts_plugin_args: # type=Dict[str, Any] | None
  # [Optional]
  # If set, the assistant will play this
  # audio file when it detects speech. The sound file will be played
  # on the default audio output device. If not set, the assistant won't
  # play any sound when it detects speech.
  # conversation_start_sound: # type=str | None
  # [Optional]
  # If set, the plugin will
  # prevent the default assistant response when a
  # `SpeechRecognizedEvent <https://docs.platypush.tech/platypush/events/assistant.html#platypush.message.event.assistant.SpeechRecognizedEvent>`_
  # matches a user hook with a condition on a ``phrase`` field. This is
  # useful to prevent the assistant from responding with a default "*I'm
  # sorry, I can't help you with that*" when e.g. you say "*play the
  # music*", and you have a hook that matches the phrase "*play the
  # music*" and handles it with a custom action. If set, and you wish
  # the assistant to also provide an answer if an event matches one of
  # your hooks, then you should call the ``render_response`` method
  # in your hook handler. If not set, then the assistant will always try
  # to respond with a default message, even if a speech event matches
  # the phrase of one of your hooks. In this case, if you want to prevent
  # the default response, you should call ``stop_conversation``
  # explicitly from your hook handler. Default: True.
  # stop_conversation_on_speech_match: True # type=bool
  # [Optional]
  # How often the `RunnablePlugin.loop <https://docs.platypush.tech/platypush/plugins/.html#platypush.plugins.RunnablePlugin.loop>`_ function should be
  # executed (default: 15 seconds). *NOTE*: For backward compatibility
  # reasons, the `poll_seconds` argument is also supported, but it's
  # deprecated.
  # poll_interval: 15 # type=float | None
  # [Optional]
  # How long we should wait for any running
  # threads/processes to stop before exiting (default: 5 seconds).
  # stop_timeout: 5 # type=float | None
  # [Optional]
  # If set to True, the plugin will not monitor
  # for new events. This is useful if you want to run a plugin in
  # stateless mode and only leverage its actions, without triggering any
  # events. Defaults to False.
  # disable_monitor: False # type=bool
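The audio settings above interact: with the defaults, each frame covers frame_size / sample_rate = 4000 / 16000 = 0.25 s. The arithmetic below additionally assumes 16-bit PCM samples (the format Vosk consumes) to get the size of each raw frame in bytes, and counts how many frames each conversation timeout spans. Whether the plugin actually evaluates its timeouts frame by frame is an assumption, not something documented here.

```python
import math

sample_rate = 16000   # Hz (default)
frame_size = 4000     # samples per frame (default)
channels = 1          # Vosk requires mono audio
bytes_per_sample = 2  # assumption: 16-bit signed PCM

frame_seconds = frame_size / sample_rate                # duration of one frame
frame_bytes = frame_size * channels * bytes_per_sample  # size of one raw frame


def frames_for(seconds: float) -> int:
    """Whole frames needed to cover ``seconds`` of audio."""
    return math.ceil(seconds / frame_seconds)


for name, seconds in [
    ("conversation_start_timeout", 5.0),
    ("conversation_end_timeout", 1.5),
    ("conversation_max_duration", 15.0),
]:
    print(f"{name}: {seconds} s = {frames_for(seconds)} frames")
```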
Dependencies
pip
pip install sounddevice numpy platypush-vosk
Debian
apt install python3-numpy
Fedora
yum install python-numpy
Arch Linux
pacman -S python-sounddevice python-numpy
Triggered events
Actions
Module reference
- class platypush.plugins.assistant.vosk.AssistantVoskPlugin(*_, **__)
Bases: AssistantPlugin, RunnablePlugin
A voice assistant based on the Vosk offline speech recognition engine.
- __init__(model_path: str | None = None, *, lang: str | None = None, models_directory: str | None = None, sample_rate: int = 16000, frame_size: int = 4000, channels: int = 1, conversation_start_timeout: float = 5.0, conversation_end_timeout: float = 1.5, conversation_max_duration: float = 15.0, words: bool = False, **kwargs)
- Parameters:
model_path – Path to the Vosk model directory. You can download models from https://alphacephei.com/vosk/models. Either model_path or lang must be specified.
lang – Language code (e.g. en, en-us, it, de, fr). If specified and model_path is not set, the plugin will automatically download the best matching small model from the Vosk model repository. Generic codes like en will match the most common regional variant (e.g. en-us).
models_directory – Directory where downloaded models are stored. Default: <PLATYPUSH_WORKDIR>/assistant.vosk/models.
sample_rate – Audio sample rate in Hz (default: 16000). Most Vosk models expect 16 kHz audio.
frame_size – Number of samples per audio frame (default: 4000). With the default sample rate of 16000, this corresponds to 250 ms per frame.
channels – Number of audio channels (default: 1). Vosk requires mono audio.
conversation_start_timeout – Seconds to wait for speech after starting a conversation before timing out (default: 5.0).
conversation_end_timeout – Seconds of silence after the last detected speech before ending the conversation (default: 1.5).
conversation_max_duration – Maximum conversation duration in seconds (default: 15.0).
words – If True, include per-word timing and confidence information in the recognition results (default: False).
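When per-word output is enabled, Vosk's JSON results carry a result array with one entry per word (fields word, start, end and conf, with times in seconds) alongside the usual text field. The sketch below parses such a result; parse_result and Word are hypothetical names, and how the plugin maps these fields onto its own events is not covered here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Word:
    word: str
    start: float  # seconds from the start of the audio stream
    end: float
    conf: float   # recognition confidence, 0.0 to 1.0


def parse_result(raw: str) -> tuple[str, list[Word]]:
    """Split a Vosk JSON result into its text and per-word details."""
    data = json.loads(raw)
    words = [
        Word(w["word"], w["start"], w["end"], w["conf"])
        # "result" is only present when per-word output is enabled
        for w in data.get("result", [])
    ]
    return data.get("text", ""), words
```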
- mute(*_, **__)
Note
This plugin has no continuous hotword detection. Speech processing is on-demand via start_conversation() and stop_conversation(). Mute/unmute are no-ops.
- pause_detection(*_, **__)
Put the assistant on pause. No new conversation events will be triggered.
- publish_entities(entities: Collection[Any] | None, callback: Callable[[Entity], Any] | None = None, **kwargs) → Collection[Entity]
Publishes a list of entities. The downstream consumers include:
  - The entity persistence manager
  - The web server
  - Any consumer subscribed to platypush.message.event.entities.EntityUpdateEvent events (e.g. web clients)
It also accepts an optional callback that will be called when each of the entities in the set is flushed to the database.
You usually don’t need to override this method (but you may want to extend transform_entities() instead if your extension doesn’t natively handle Entity objects).
- render_response(text: str, *_, with_follow_on_turn: bool | None = None, **__) → bool
Render a response text as audio over the configured TTS plugin.
- Parameters:
text – Text to render.
with_follow_on_turn – If set, the assistant will wait for a follow-up. By default, with_follow_on_turn will be automatically set to true if the text ends with a question mark.
- Returns:
True if the assistant is waiting for a follow-up, False otherwise.
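The default for with_follow_on_turn boils down to a one-line rule. wants_follow_on is a hypothetical helper that mirrors the documented behavior; ignoring trailing whitespace before checking for the question mark is an assumption.

```python
from __future__ import annotations


def wants_follow_on(text: str, with_follow_on_turn: bool | None = None) -> bool:
    """Resolve the documented default for ``with_follow_on_turn``."""
    if with_follow_on_turn is not None:
        return with_follow_on_turn  # an explicit value always wins
    # Assumption: trailing whitespace is ignored before checking for "?"
    return text.rstrip().endswith("?")
```

So a response such as "Which room?" keeps the conversation open for an answer, while "It's 10:30 AM" ends the turn, unless the caller overrides either case explicitly.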
- resume_detection(*_, **__)
Resume the assistant hotword detection from a paused state.
- send_text_query(*_, query: str, **__)
Send a text query to the assistant (emulates speech recognition).
- Parameters:
query – The text query to process.
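Like other plugin actions, send_text_query can also be invoked remotely, which is handy for testing hooks without a microphone. The sketch below assumes the Platypush HTTP server is reachable at http://localhost:8008 and that you have an API token; it posts a JSON request to the /execute endpoint using only the standard library. build_request and execute are hypothetical helper names.

```python
import json
import urllib.request


def build_request(action: str, **args) -> dict:
    """Platypush-style JSON request for ``action`` with keyword ``args``."""
    return {"type": "request", "action": action, "args": args}


def execute(action: str, token: str, base_url: str = "http://localhost:8008", **args) -> dict:
    """POST the request to the (assumed) /execute endpoint and decode the reply."""
    req = urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(build_request(action, **args)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running Platypush instance and a valid token):
# execute("assistant.vosk.send_text_query", token="...", query="turn on the lights")
```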
- start_conversation(*_, **__)
Start a conversation with the assistant.
The conversation is stopped automatically when any of the following happens:
  - conversation_max_duration seconds have elapsed;
  - no speech is detected within conversation_start_timeout seconds;
  - conversation_end_timeout seconds of silence pass after the last detected speech;
  - stop_conversation() is called.
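The automatic stop conditions can be modeled as a pure function of the instants at which speech was detected, which makes it easy to reason about how the three timeouts interact. conversation_end_time is a hypothetical helper, not the plugin's code; it assumes the conversation starts at t = 0, takes sorted speech timestamps in seconds, and does not model a manual stop_conversation() call.

```python
from __future__ import annotations

from typing import Iterable


def conversation_end_time(
    speech_times: Iterable[float],
    start_timeout: float = 5.0,
    end_timeout: float = 1.5,
    max_duration: float = 15.0,
) -> tuple[float, str]:
    """Return (stop_time, reason) for a conversation started at t = 0."""
    last_speech = None
    for t in speech_times:
        # Before any speech, wait up to start_timeout; afterwards, end_timeout
        deadline = start_timeout if last_speech is None else last_speech + end_timeout
        if t > deadline or t > max_duration:
            break  # the conversation had already ended before this speech
        last_speech = t

    if last_speech is None:
        stop, reason = start_timeout, "start_timeout"
    else:
        stop, reason = last_speech + end_timeout, "end_timeout"

    # The hard cap applies regardless of ongoing speech
    if stop > max_duration:
        return max_duration, "max_duration"
    return stop, reason
```

For example, speech at 1.0 s and 4.0 s ends the conversation at 2.5 s: the 1.5 s of silence after the first utterance expires before the second one starts.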
- status(*_, **__)
- Returns:
The current assistant status:
{
  "last_query": "What time is it?",
  "last_response": "It's 10:30 AM",
  "conversation_running": true,
  "is_muted": false,
  "is_detecting": true
}
- stop_conversation(*_, **__)
Programmatically stops a conversation.
- toggle_mute(*_, **__)
Toggle the mute state of the microphone.
- transform_entities(entities: Collection[AssistantPlugin], **_)
This method takes a list of entities in any (plugin-specific) format and converts them into a standardized collection of Entity objects. Since this method is called by publish_entities() before entity updates are published, you will usually want to extend it to pre-process the entities managed by your extension into the standard format before they are stored and published to all the consumers.
- unmute(*_, **__)
Note
This plugin has no continuous hotword detection. Speech processing is on-demand via start_conversation() and stop_conversation(). Mute/unmute are no-ops.