assistant.picovoice#

Description#

A voice assistant that runs on your device, based on the Picovoice engine.

Picovoice is a suite of on-device voice technologies that include:

  • Porcupine: wake-word engine, if you want the device to listen for a specific wake word in order to start the assistant.

  • Cheetah: speech-to-text engine, if you want your voice interactions to be transcribed into free text - either programmatically or when triggered by the wake word. Or:

  • Rhino: intent recognition engine, if you want to extract intents out of your voice commands - for instance, the phrase “set the living room temperature to 20 degrees” could be mapped to the intent with the following parameters: intent: set_temperature, room: living_room, temperature: 20.

  • Leopard: speech-to-text engine aimed at offline transcription of audio files rather than real-time transcription.

  • Orca: text-to-speech engine, if you want to create your custom logic to respond to user’s voice commands and render the responses as audio.

This plugin is a wrapper around the Picovoice engine that allows you to run your custom voice-based conversational flows on your device.

Getting a Picovoice account and access key#

You can get your personal access key by signing up at the Picovoice console. You may be asked to submit a reason for using the service (feel free to mention a personal Platypush integration), and you will receive your personal access key.

If prompted to select the products you want to use, make sure to select the ones from the Picovoice suite that you want to use with this plugin.
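
Once you have the access key, it only needs to be set in your Platypush configuration. The following is a minimal sketch, assuming the default location of the configuration file; the key value is a placeholder:

# Minimal sketch: only access_key is required, all the other parameters
# have defaults (hotword and speech-to-text engines enabled).
assistant.picovoice:
  access_key: YOUR_PICOVOICE_ACCESS_KEY  # placeholder, not a real key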

Hotword detection#

The hotword detection engine is based on Porcupine.

If enabled through the hotword_enabled parameter (default: True), the assistant will listen for a specific wake word before starting the speech-to-text or intent recognition engines. You can specify custom models for your hotword (e.g. on the same device you may use “Alexa” to trigger the speech-to-text engine in English, “Computer” to trigger the speech-to-text engine in Italian, and “Ok Google” to trigger the intent recognition engine).

You can also create your custom hotword models using the Porcupine console.

If hotword_enabled is set to True, you must also specify the keywords parameter with the list of keywords that you want to listen for, and optionally the keyword_paths parameter with the paths to any custom hotword models that you want to use. If hotword_enabled is set to False, then the assistant won’t start listening for speech after the plugin is started, and you will need to programmatically start the conversation by calling the start_conversation() action, or trigger it from the UI.
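
For example, a hotword-focused configuration could look like the following sketch (stock Porcupine keywords, plus a hypothetical custom model path shown commented out):

assistant.picovoice:
  access_key: YOUR_PICOVOICE_ACCESS_KEY  # placeholder
  hotword_enabled: True
  # Stock Porcupine keywords to listen for
  keywords:
    - alexa
    - computer
  # Optional: paths to custom hotword models created on the Porcupine
  # console (placeholder path)
  # keyword_paths:
  #   - ~/models/porcupine/buongiorno.ppn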

When a wake-word is detected, the assistant will emit a platypush.message.event.assistant.HotwordDetectedEvent event that you can use to build your custom logic. For example:

import time

from platypush import when, run
from platypush.message.event.assistant import HotwordDetectedEvent

# Turn on a light for 5 seconds when the hotword "Alexa" is detected
@when(HotwordDetectedEvent, hotword='Alexa')
def on_hotword_detected(event: HotwordDetectedEvent, **context):
    run("light.hue.on", lights=["Living Room"])
    time.sleep(5)
    run("light.hue.off", lights=["Living Room"])

By default, the assistant will start listening for speech after the hotword if either stt_enabled or intent_model_path is set. If you don’t want the assistant to start listening for speech after the hotword is detected (for example because you want to build your own response flows, trigger speech detection with different models depending on the hotword that was used, or just detect hotwords without speech), then you can set the start_conversation_on_hotword parameter to False. In that case, you can programmatically start the conversation by calling the start_conversation() method in your event hooks:

from platypush import when, run
from platypush.message.event.assistant import HotwordDetectedEvent

# Start a conversation using the Italian language model when the
# "Buongiorno" hotword is detected
@when(HotwordDetectedEvent, hotword='Buongiorno')
def on_it_hotword_detected(event: HotwordDetectedEvent, **context):
    event.assistant.start_conversation(model_file='path/to/it.pv')

Speech-to-text#

The speech-to-text engine is based on Cheetah.

If enabled through the stt_enabled parameter (default: True), the assistant will transcribe the voice commands into text when a conversation is started either programmatically through start_conversation() or when the hotword is detected.
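
For reference, a sketch of the relevant configuration, assuming a hypothetical non-English Cheetah model downloaded from the Picovoice console (placeholder path, shown commented out):

assistant.picovoice:
  access_key: YOUR_PICOVOICE_ACCESS_KEY  # placeholder
  stt_enabled: True
  # Optional: language-specific or custom-trained speech model
  # speech_model_path: ~/models/cheetah/it.pv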

It will emit a platypush.message.event.assistant.SpeechRecognizedEvent when some speech is detected, and you can hook to that event to build your custom logic:

from platypush import when, run
from platypush.message.event.assistant import SpeechRecognizedEvent

# Turn on a light when the phrase "turn on the lights" is detected.
# Note that we can leverage regex-based pattern matching to be more
# flexible when matching the phrases. For example, the following hook
# will be matched when the user says "turn on the lights", "turn on
# lights", "lights on", "lights on please", "turn on light" etc.
@when(SpeechRecognizedEvent, phrase='turn on (the)? lights?')
def on_turn_on_lights(event: SpeechRecognizedEvent, **context):
    run("light.hue.on")

You can also leverage context extraction through the ${} syntax on the hook to extract specific tokens from the event that can be passed to your event hook. For example:

from platypush import when, run
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='play ${title} by ${artist}')
def on_play_track_command(
    event: SpeechRecognizedEvent, title: str, artist: str, **context
):
    results = run(
        "music.mopidy.search",
        filter={"title": title, "artist": artist}
    )

    if not results:
        event.assistant.render_response(f"Couldn't find {title} by {artist}")
        return

    run("music.mopidy.play", resource=results[0]["uri"])

Speech-to-intent#

The intent recognition engine is based on Rhino.

Intents are snippets of unstructured transcribed speech that can be matched to structured actions.

Unlike with hotword and speech-to-text detection, you need to provide a custom model for intent detection. You can create your custom model using the Rhino console.

When an intent is detected, the assistant will emit a platypush.message.event.assistant.IntentRecognizedEvent that you can listen to in your event hooks.

For example, you can train a model to control groups of smart lights by defining the following slots on the Rhino console:

  • device_state: The new state of the device (e.g. with on or off as supported values)

  • room: The name of the room associated to the group of lights to be controlled (e.g. living room, kitchen, bedroom)

You can then define a lights_ctrl intent with the following expressions:

  • “turn $device_state:state the lights”

  • “turn $device_state:state the $room:room lights”

  • “turn the lights $device_state:state”

  • “turn the $room:room lights $device_state:state”

  • “turn $room:room lights $device_state:state”

This intent will match any of the following phrases:

  • turn on the lights

  • turn off the lights

  • turn the lights on

  • turn the lights off

  • turn on the living room lights

  • turn off the living room lights

  • turn the living room lights on

  • turn the living room lights off

Any slots matched in the phrase will be reported in the platypush.message.event.assistant.IntentRecognizedEvent event.

Train the model, download the context file, and pass its path to the intent_model_path parameter.
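
Configuration-wise, this boils down to something like the following sketch (the context file path is a placeholder):

assistant.picovoice:
  access_key: YOUR_PICOVOICE_ACCESS_KEY  # placeholder
  # Rhino context file trained on and downloaded from the Rhino console
  intent_model_path: ~/models/rhino/smart_home_en.rhn  # placeholder path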

You can then register a hook to listen to a specific intent:

from platypush import when, run
from platypush.message.event.assistant import IntentRecognizedEvent

@when(IntentRecognizedEvent, intent='lights_ctrl', slots={'state': 'on'})
def on_turn_on_lights(event: IntentRecognizedEvent, **context):
    room = event.slots.get('room')
    if room:
        run("light.hue.on", groups=[room])
    else:
        run("light.hue.on")

Note that if both stt_enabled and intent_model_path are set, then both the speech-to-text and intent recognition engines will run in parallel when a conversation is started.

The intent engine is usually faster, as it has a smaller set of intents to match and doesn’t have to run a full speech-to-text transcription. This means that, if an utterance matches both a speech-to-text phrase and an intent, the platypush.message.event.assistant.IntentRecognizedEvent event is emitted (and not platypush.message.event.assistant.SpeechRecognizedEvent).

This may not always be the case though, so it is good practice to also provide a fallback platypush.message.event.assistant.SpeechRecognizedEvent hook to catch the transcribed text if the speech is not recognized as an intent:

from platypush import when, run
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='turn ${state} (the)? ${room} lights?')
def on_turn_on_lights(event: SpeechRecognizedEvent, phrase, room, **context):
    if room:
        run("light.hue.on", groups=[room])
    else:
        run("light.hue.on")

Text-to-speech#

The text-to-speech engine is based on Orca.

It is not directly implemented by this plugin, but the implementation is provided in the platypush.plugins.tts.picovoice.TtsPicovoicePlugin plugin.

You can however leverage the render_response() action to render some text as speech in response to a user command; that will in turn use the Picovoice TTS plugin to render the response.

For example, the following snippet registers hooks that match recognized speech against a list of predefined commands, fall back to an AI assistant for anything else, and start a follow-up conversation whenever the rendered response ends with a question:

import re
from logging import getLogger

from platypush import when, run
from platypush.message.event.assistant import (
    SpeechRecognizedEvent,
    ResponseEndEvent,
)

logger = getLogger(__name__)

def play_music(*_, **__):
    run("music.mopidy.play")

def stop_music(*_, **__):
    run("music.mopidy.stop")

def ai_assist(event: SpeechRecognizedEvent, **__):
    response = run("openai.get_response", prompt=event.phrase)
    if not response:
        return

    run("assistant.picovoice.render_response", text=response)

# List of commands to match, as pairs of regex patterns and the
# corresponding actions
hooks = (
    (re.compile(r"play (the)?music", re.IGNORECASE), play_music),
    (re.compile(r"stop (the)?music", re.IGNORECASE), stop_music),
    # Fallback to the AI assistant
    (re.compile(r".*"), ai_assist),
)

@when(SpeechRecognizedEvent)
def on_speech_recognized(event, **kwargs):
    for pattern, command in hooks:
        if pattern.search(event.phrase):
            logger.info("Running voice command %s", command.__name__)
            command(event, **kwargs)
            break

@when(ResponseEndEvent)
def on_response_end(event: ResponseEndEvent, **__):
    # Check if the response is a question and start a follow-on turn if so.
    # Note that the ``openai`` plugin by default is configured to keep
    # the past interaction in a context window of ~10 minutes, so you
    # can follow up like in a real conversation.
    if event.assistant and event.response_text and event.response_text.endswith("?"):
        event.assistant.start_conversation()

Configuration#

assistant.picovoice:
  # [Required]
  # Your Picovoice access key. You can get it by signing
  # up at the `Picovoice console <https://console.picovoice.ai/>`.
  access_key:   # type=str

  # [Optional]
  # Enable the wake-word engine (default: True).
  # **Note**: The wake-word engine requires you to add Porcupine to the
  # products available in your Picovoice account.
  # hotword_enabled: True  # type=bool

  # [Optional]
  # Enable the speech-to-text engine (default: True).
  # **Note**: The speech-to-text engine requires you to add Cheetah to
  # the products available in your Picovoice account.
  # stt_enabled: True  # type=bool

  # [Optional]
  # List of keywords to listen for (e.g. ``alexa``, ``ok
  # google``...). This is required if the wake-word engine is enabled.
  # See the `Porcupine keywords repository
  # <https://github.com/Picovoice/porcupine/tree/master/resources/keyword_files>`_
  # for a list of the stock keywords available. If you have a custom
  # model, you can pass its path to the ``keyword_paths`` parameter and
  # its filename (without the path and the platform extension) here.
  # keywords:   # type=Optional[Sequence[str]]

  # [Optional]
  # List of paths to the keyword files to listen for.
  # Custom keyword files can be created using the `Porcupine console
  # <https://console.picovoice.ai/ppn>`_ and downloaded from the
  # console itself.
  # keyword_paths:   # type=Optional[Sequence[str]]

  # [Optional]
  # If you are using a keyword file in a
  # non-English language, you can provide the path to the model file
  # for its language. Model files are available for all the supported
  # languages through the `Porcupine lib repository
  # <https://github.com/Picovoice/porcupine/tree/master/lib/common>`_.
  # keyword_model_path:   # type=Optional[str]

  # [Optional]
  # Path to the speech model file. If you are
  # using a language other than English, you can provide the path to the
  # model file for that language. Model files are available for all the
  # supported languages through the `Cheetah repository
  # <https://github.com/Picovoice/cheetah/tree/master/lib/common>`_.
  # You can also use the `Speech console
  # <https://console.picovoice.ai/cat>`_
  # to train your custom models. You can use a base model and fine-tune
  # it by boosting the detection of your own words and phrases and edit
  # the phonetic representation of the words you want to detect.
  # speech_model_path:   # type=Optional[str]

  # [Optional]
  # Path to the Rhino context model. This is
  # required if you want to use the intent recognition engine through
  # Rhino. The context model is a file that contains a list of intents
  # that can be recognized by the engine. An intent is an action or a
  # class of actions that the assistant can recognize, and it can
  # contain an optional number of slots to model context variables -
  # e.g. temperature, lights group, location, device state etc.
  # You can create your own context model using the `Rhino console
  # <https://console.picovoice.ai/rhn>`_. For example, you can define a
  # context file to control smart home devices by defining the
  # following slots:
  #
  #     - ``device_type``: The device to control (e.g. lights, music)
  #     - ``device_state``: The target state of the device (e.g. on,
  #       off)
  #     - ``location``: The location of the device (e.g. living
  #       room, kitchen, bedroom)
  #     - ``media_type``: The type of media to play (e.g. music, video)
  #     - ``media_state``: The state of the media (e.g. play, pause,
  #       stop)
  #
  # You can then define the following intents:
  #
  #     - ``device_ctrl``: Control a device state. Supported phrases:
  #         - "turn ``$device_state:state`` the ``$location:location``
  #           ``$device_type:device``"
  #         - "turn ``$device_state:state`` the ``$device_type:device``"
  #
  #     - ``media_ctrl``: Control media state. Supported phrases:
  #         - "``$media_state:state`` the ``$media_type:media``"
  #         - "``$media_state:state`` the ``$media_type:media`` in the
  #           ``$location:location``"
  #
  # Then a phrase like "turn on the lights in the living room" would
  # trigger an
  # `IntentRecognizedEvent <https://docs.platypush.tech/platypush/events/assistant.html#platypush.message.event.assistant.IntentRecognizedEvent>`_ with:
  #
  #     .. code-block:: json
  #
  #       {
  #         "intent": "device_ctrl",
  #         "slots": {
  #           "type": "lights",
  #           "state": "on",
  #           "location": "living room"
  #         }
  #       }
  #
  # **Note**: The intent recognition engine requires you to add Rhino
  # to the products available in your Picovoice account.
  # intent_model_path:   # type=Optional[str]

  # [Optional]
  # If set, the assistant will stop listening when
  # no speech is detected for the specified duration (in seconds) after
  # the end of an utterance.
  # endpoint_duration: 0.5  # type=Optional[float]

  # [Optional]
  # Enable automatic punctuation
  # insertion.
  # enable_automatic_punctuation: False  # type=bool

  # [Optional]
  # If set to True (default), a speech
  # detection session will be started when the hotword is detected. If
  # set to False, you may want to start the conversation programmatically
  # by calling the `AssistantPicovoicePlugin.start_conversation <https://docs.platypush.tech/platypush/plugins/assistant.picovoice.html#platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin.start_conversation>`_ method instead, or run any
  # custom hotword detection logic. This can be particularly useful
  # when you want to run the assistant in a push-to-talk mode, or when you
  # want different hotwords to trigger conversations with different models
  # or languages.
  # start_conversation_on_hotword: True  # type=bool

  # [Optional]
  # Maximum number of audio frames to hold in the
  # processing queue. You may want to increase this value if you are
  # running this integration on a slow device and/or the logs report
  # audio frame drops too often. Keep in mind that increasing this value
  # will increase the memory usage of the integration. Also, a higher
  # value may result in higher accuracy at the cost of higher latency.
  # audio_queue_size: 100  # type=int

  # [Optional]
  # Maximum time to wait for some speech to be
  # detected after the hotword is detected. If no speech is detected
  # within this time, the conversation will time out and the plugin will
  # go back into hotword detection mode, if the mode is enabled. Default:
  # 7.5 seconds.
  # conversation_timeout: 7.5  # type=Optional[float]

  # [Optional]
  # Set to True to start the assistant in a muted state. You will
  # need to call the `AssistantPicovoicePlugin.unmute <https://docs.platypush.tech/platypush/plugins/assistant.picovoice.html#platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin.unmute>`_ method to start the assistant listening
  # for commands, or programmatically call the `AssistantPicovoicePlugin.start_conversation <https://docs.platypush.tech/platypush/plugins/assistant.picovoice.html#platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin.start_conversation>`_
  # to start a conversation.
  # muted: False  # type=bool

  # [Optional]
  # If set, the assistant will use this plugin (e.g.
  # ``tts``, ``tts.google`` or ``tts.mimic3``) to render the responses,
  # instead of using the built-in assistant voice.
  # tts_plugin:   # type=Optional[str]

  # [Optional]
  # Optional arguments to be passed to the TTS
  # ``say`` action, if ``tts_plugin`` is set.
  # tts_plugin_args:   # type=Optional[Dict[str, Any]]

  # [Optional]
  # If set, the assistant will play this
  # audio file when it detects speech. The sound file will be played
  # on the default audio output device. If not set, the assistant won't
  # play any sound when it detects speech.
  # conversation_start_sound:   # type=Optional[str]

  # [Optional]
  # If set, the plugin will
  # prevent the default assistant response when a
  # `SpeechRecognizedEvent <https://docs.platypush.tech/platypush/events/assistant.html#platypush.message.event.assistant.SpeechRecognizedEvent>`_
  # matches a user hook with a condition on a ``phrase`` field. This is
  # useful to prevent the assistant from responding with a default "*I'm
  # sorry, I can't help you with that*" when e.g. you say "*play the
  # music*", and you have a hook that matches the phrase "*play the
  # music*" and handles it with a custom action. If set, and you wish
  # the assistant to also provide an answer if an event matches one of
  # your hooks, then you should call the ``render_response`` method
  # in your hook handler. If not set, then the assistant will always try
  # and respond with a default message, even if a speech event matches
  # the phrase of one of your hooks. In this case, if you want to prevent
  # the default response, you should call ``stop_conversation``
  # explicitly from your hook handler. Default: True.
  # stop_conversation_on_speech_match: True  # type=bool

  # [Optional]
  # How often the `RunnablePlugin.loop <https://docs.platypush.tech/platypush/plugins/.html#platypush.plugins.RunnablePlugin.loop>`_ function should be
  # executed (default: 15 seconds). *NOTE*: For back-compatibility
  # reasons, the `poll_seconds` argument is also supported, but it's
  # deprecated.
  # poll_interval: 15  # type=Optional[float]

  # [Optional]
  # How long we should wait for any running
  # threads/processes to stop before exiting (default: 5 seconds).
  # stop_timeout: 5  # type=Optional[float]

  # [Optional]
  # If set to True then the plugin will not monitor
  # for new events. This is useful if you want to run a plugin in
  # stateless mode and only leverage its actions, without triggering any
  # events. Defaults to False.
  # disable_monitor: False  # type=bool

Dependencies#

pip

pip install num2words pvcheetah pvrhino pvleopard pvorca pvporcupine sounddevice

Alpine

apk add ffmpeg

Debian

apt install ffmpeg

Fedora

yum install ffmpeg

Arch Linux

pacman -S ffmpeg python-sounddevice


Module reference#

class platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin(*_, **__)[source]#

Bases: AssistantPlugin, RunnablePlugin

A voice assistant that runs on your device, based on the Picovoice engine.

__init__(access_key: str, hotword_enabled: bool = True, stt_enabled: bool = True, keywords: Sequence[str] | None = None, keyword_paths: Sequence[str] | None = None, keyword_model_path: str | None = None, speech_model_path: str | None = None, intent_model_path: str | None = None, endpoint_duration: float | None = 0.5, enable_automatic_punctuation: bool = False, start_conversation_on_hotword: bool = True, audio_queue_size: int = 100, conversation_timeout: float | None = 7.5, muted: bool = False, **kwargs)[source]#
Parameters:
  • access_key – Your Picovoice access key. You can get it by signing up at the Picovoice console (https://console.picovoice.ai/).

  • hotword_enabled – Enable the wake-word engine (default: True). Note: The wake-word engine requires you to add Porcupine to the products available in your Picovoice account.

  • stt_enabled – Enable the speech-to-text engine (default: True). Note: The speech-to-text engine requires you to add Cheetah to the products available in your Picovoice account.

  • keywords – List of keywords to listen for (e.g. alexa, ok google…). This is required if the wake-word engine is enabled. See the Porcupine keywords repository for a list of the stock keywords available. If you have a custom model, you can pass its path to the keyword_paths parameter and its filename (without the path and the platform extension) here.

  • keyword_paths

    List of paths to the keyword files to listen for. Custom keyword files can be created using the Porcupine console and downloaded from the console itself.

  • keyword_model_path – If you are using a keyword file in a non-English language, you can provide the path to the model file for its language. Model files are available for all the supported languages through the Porcupine lib repository.

  • speech_model_path – Path to the speech model file. If you are using a language other than English, you can provide the path to the model file for that language. Model files are available for all the supported languages through the Cheetah repository. You can also use the Speech console to train your custom models. You can use a base model and fine-tune it by boosting the detection of your own words and phrases and edit the phonetic representation of the words you want to detect.

  • intent_model_path

    Path to the Rhino context model. This is required if you want to use the intent recognition engine through Rhino. The context model is a file that contains a list of intents that can be recognized by the engine. An intent is an action or a class of actions that the assistant can recognize, and it can contain an optional number of slots to model context variables - e.g. temperature, lights group, location, device state etc. You can create your own context model using the Rhino console. For example, you can define a context file to control smart home devices by defining the following slots:

    • device_type: The device to control (e.g. lights, music)

    • device_state: The target state of the device (e.g. on, off)

    • location: The location of the device (e.g. living room, kitchen, bedroom)

    • media_type: The type of media to play (e.g. music, video)

    • media_state: The state of the media (e.g. play, pause, stop)

    You can then define the following intents:

    • device_ctrl: Control a device state. Supported phrases:
      • "turn $device_state:state the $location:location $device_type:device"

      • "turn $device_state:state the $device_type:device"

    • media_ctrl: Control media state. Supported phrases:
      • "$media_state:state the $media_type:media"

      • "$media_state:state the $media_type:media in the $location:location"

    Then a phrase like “turn on the lights in the living room” would trigger a platypush.message.event.assistant.IntentRecognizedEvent with:

    {
      "intent": "device_ctrl",
      "slots": {
        "type": "lights",
        "state": "on",
        "location": "living room"
      }
    }
    

    Note: The intent recognition engine requires you to add Rhino to the products available in your Picovoice account.

  • endpoint_duration – If set, the assistant will stop listening when no speech is detected for the specified duration (in seconds) after the end of an utterance.

  • enable_automatic_punctuation – Enable automatic punctuation insertion.

  • start_conversation_on_hotword – If set to True (default), a speech detection session will be started when the hotword is detected. If set to False, you may want to start the conversation programmatically by calling the start_conversation() method instead, or run any custom hotword detection logic. This can be particularly useful when you want to run the assistant in a push-to-talk mode, or when you want different hotwords to trigger conversations with different models or languages.

  • audio_queue_size – Maximum number of audio frames to hold in the processing queue. You may want to increase this value if you are running this integration on a slow device and/or the logs report audio frame drops too often. Keep in mind that increasing this value will increase the memory usage of the integration. Also, a higher value may result in higher accuracy at the cost of higher latency.

  • conversation_timeout – Maximum time to wait for some speech to be detected after the hotword is detected. If no speech is detected within this time, the conversation will time out and the plugin will go back into hotword detection mode, if the mode is enabled. Default: 7.5 seconds.

  • muted – Set to True to start the assistant in a muted state. You will need to call the unmute() method to start the assistant listening for commands, or programmatically call start_conversation() to start a conversation.

is_detecting(*_, **__) bool#
Returns:

True if the assistant is detecting, False otherwise.

is_muted(*_, **__) bool#
Returns:

True if the microphone is muted, False otherwise.

main()[source]#

Implementation of the main loop of the plugin.

mute(*_, **__)[source]#

Mute the microphone. Alias for set_mic_mute() with muted=True.

pause_detection(*_, **__)#

Put the assistant on pause. No new conversation events will be triggered.

publish_entities(entities: Collection[Any] | None, callback: Callable[[Entity], Any] | None = None, **kwargs) Collection[Entity]#

Publishes a list of entities to the downstream consumers.

It also accepts an optional callback that will be called when each of the entities in the set is flushed to the database.

You usually don’t need to override this method (but you may want to extend transform_entities() instead if your extension doesn’t natively handle Entity objects).

render_response(text: str, *_, with_follow_on_turn: bool | None = None, **__) bool#

Render a response text as audio over the configured TTS plugin.

Parameters:
  • text – Text to render.

  • with_follow_on_turn – If set, the assistant will wait for a follow-up. By default, with_follow_on_turn will be automatically set to true if the text ends with a question mark.

Returns:

True if the assistant is waiting for a follow-up, False otherwise.
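
For instance, a hook could use this action to reply to a recognized phrase. This is only a sketch: the trigger phrase and the rendered text are arbitrary examples, not part of the plugin itself:

from datetime import datetime

from platypush import when, run
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='what time is it')
def on_time_request(event: SpeechRecognizedEvent, **context):
    # Render the current time as speech through the configured TTS engine
    run(
        "assistant.picovoice.render_response",
        text=f"It's {datetime.now().strftime('%H:%M')}",
    )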

resume_detection(*_, **__)#

Resume the assistant hotword detection from a paused state.

say(text: str, *args, **kwargs)[source]#

Proxy to platypush.plugins.tts.picovoice.TtsPicovoicePlugin.say to render some text as speech through the Picovoice TTS engine.

Extra arguments to platypush.plugins.tts.picovoice.TtsPicovoicePlugin.say can be passed over args and kwargs.

Parameters:

text – Text to be rendered as speech.

send_text_query(*_, query: str, **__)[source]#

Send a text query to the assistant.

This is equivalent to saying something to the assistant.

Parameters:

query – Query to be sent.
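
As a sketch, the same mechanism can be used to drive the assistant from a script or procedure (the query string below is just an example):

from platypush import run

# Equivalent to saying "turn on the living room lights" to the assistant
run(
    "assistant.picovoice.send_text_query",
    query="turn on the living room lights",
)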

set_mic_mute(muted: bool)[source]#

Programmatically mute/unmute the microphone.

Parameters:

muted – Set to True or False.

start()#

Start the plugin.

start_conversation(*_, model_file: str | None = None, **__)[source]#

Programmatically start a conversation with the assistant.

Parameters:

model_file – Override the model file to be used to detect speech in this conversation. If not set, the configured speech_model_path will be used.
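
For example, a sketch of how a conversation can be triggered programmatically, optionally overriding the speech model for that conversation only (the model path is a placeholder):

from platypush import run

# Start listening with the configured speech model
run("assistant.picovoice.start_conversation")

# Or override the speech model for this conversation only
run(
    "assistant.picovoice.start_conversation",
    model_file="~/models/cheetah/it.pv",  # placeholder path
)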

status(*_, **__)#
Returns:

The current assistant status:

{
    "last_query": "What time is it?",
    "last_response": "It's 10:30 AM",
    "conversation_running": true,
    "is_muted": false,
    "is_detecting": true
}
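
A sketch of how the status could be used from a script, e.g. to start a conversation only when the assistant is idle and not muted:

from platypush import run

status = run("assistant.picovoice.status")
if not status.get("is_muted") and not status.get("conversation_running"):
    run("assistant.picovoice.start_conversation")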

stop()[source]#

Stop the plugin.

stop_conversation(*_, **__)#

Programmatically stop a conversation.

toggle_mute(*_, **__)[source]#

Toggle the mic mute state.

transcribe(audio_file: str, *_, model_file: str | None = None, **__)[source]#

Transcribe an audio file to text using the Leopard engine.

Parameters:
  • audio_file – Path to the audio file to be transcribed.

  • model_file – Override the model file to be used for the transcription. If not set, the configured speech_model_path will be used.

Returns:

dict

{
  "transcription": "This is a test",
  "words": [
    {
      "word": "this",
      "start": 0.06400000303983688,
      "end": 0.19200000166893005,
      "confidence": 0.9626294374465942
    },
    {
      "word": "is",
      "start": 0.2879999876022339,
      "end": 0.35199999809265137,
      "confidence": 0.9781675934791565
    },
    {
      "word": "a",
      "start": 0.41600000858306885,
      "end": 0.41600000858306885,
      "confidence": 0.9764975309371948
    },
    {
      "word": "test",
      "start": 0.5120000243186951,
      "end": 0.8320000171661377,
      "confidence": 0.9511580467224121
    }
  ]
}
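
For example, a sketch of an offline transcription call (the audio file path is a placeholder):

from platypush import run

result = run(
    "assistant.picovoice.transcribe",
    audio_file="~/recordings/note.wav",  # placeholder path
)

print(result["transcription"])
for word in result["words"]:
    print(word["word"], word["start"], word["end"], word["confidence"])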

transform_entities(entities: Collection[AssistantPlugin], **_)#

This method takes a list of entities in any (plugin-specific) format and converts them into a standardized collection of Entity objects. Since this method is called by publish_entities() before entity updates are published, you may usually want to extend it to pre-process the entities managed by your extension into the standard format before they are stored and published to all the consumers.

unmute(*_, **__)[source]#

Unmute the microphone. Alias for set_mic_mute() with muted=False.

wait_stop(timeout=None)#

Wait until a stop event is received.