assistant.openai#

Description#

A voice assistant based on the OpenAI API.

It requires the platypush.plugins.openai.OpenaiPlugin plugin to be configured with an OpenAI API key.

Hotword detection#

This plugin doesn’t have hotword detection, as OpenAI doesn’t provide an API for that. Instead, the assistant can be started and stopped programmatically through the start_conversation() action.
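
For instance, a minimal sketch (assuming the plugin is enabled, and that these functions are called from your own hooks, procedures or cron jobs) of how to drive a conversation programmatically:

from platypush import run

def start_assistant():
    # Start recording and processing speech through the OpenAI API
    run("assistant.openai.start_conversation")

def stop_assistant():
    # Abort the conversation currently in progress, if any
    run("assistant.openai.stop_conversation")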

If you want to implement hotword detection, you can use a separate plugin such as platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin.

The configuration in this case would be like:

assistant.picovoice:
  access_key: YOUR_PICOVOICE_ACCESS_KEY

  # List of hotwords to listen for
  keywords:
    - alexa
    - computer
    - ok google

  # Disable speech-to-text and intent recognition, only use hotword
  # detection
  stt_enabled: false
  hotword_enabled: true

  conversation_start_sound: /sound/to/play/when/the/conversation/starts.mp3
  # speech_model_path: /mnt/hd/models/picovoice/cheetah/custom-en.pv
  # intent_model_path: /mnt/hd/models/picovoice/rhino/custom-en-x86.rhn

openai:
  api_key: YOUR_OPENAI_API_KEY

  # Customize your assistant's context and knowledge base to your
  # liking
  context:
    - role: system
      content: >
        You are a 16th century noble lady who talks in
        Shakespearean English to her peers.

# Enable the assistant plugin
assistant.openai:

# Enable the text-to-speech plugin
tts.openai:
  # Customize the voice model
  voice: nova

Then you can call start_conversation() when the hotword is detected, i.e. when a platypush.message.event.assistant.HotwordDetectedEvent is triggered:

from platypush import run, when
from platypush.message.event.assistant import HotwordDetectedEvent

@when(HotwordDetectedEvent)
# You can also customize it by running a different assistant logic
# depending on the hotword
# @when(HotwordDetectedEvent, hotword='computer')
def on_hotword_detected():
    run("assistant.openai.start_conversation")

This configuration will:

  1. Start the hotword detection when the application starts.

  2. Start the OpenAI assistant when the hotword is detected.
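
If you also want to react to the assistant's answers (for example, to log them), you can hook the response events as well. Here is a minimal sketch; it assumes that platypush.message.event.assistant.ResponseEvent is triggered by this plugin and that the response text is carried by a response_text event argument, which may differ across platypush versions:

import logging

from platypush import when
from platypush.message.event.assistant import ResponseEvent

logger = logging.getLogger(__name__)

@when(ResponseEvent)
def on_assistant_response(event: ResponseEvent):
    # Log every response rendered by the assistant.
    # response_text is assumed to be the argument carrying the rendered text.
    logger.info("Assistant response: %s", event.args.get("response_text"))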

AI responses#

By default (unless you set stop_conversation_on_speech_match to False), the plugin will:

  1. Process the speech through the OpenAI API (the GPT model to be used is configurable through the model setting of the OpenAI plugin).

  2. Render the response through the configured tts_plugin (default: tts.openai). If tts_plugin is not set, the response is returned as a string instead (see the sketch below).
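
For example, a minimal sketch (using the plugin's send_text_query action, documented in the module reference below, wrapped in a hypothetical ask_assistant helper) that feeds a text query to the assistant and works with the returned string:

from platypush import run

def ask_assistant(question: str) -> str:
    # send_text_query processes the text through the OpenAI plugin and,
    # if tts_plugin is set, also renders the answer as speech; the response
    # text is returned either way.
    return run("assistant.openai.send_text_query", text=question)

# e.g.
# answer = ask_assistant("What's the weather like on Mars?")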

Custom speech processing#

You can create custom hooks on platypush.message.event.assistant.SpeechRecognizedEvent with custom phrase strings or (regex) patterns. For example:

from platypush import run, when
from platypush.message.event.assistant import SpeechRecognizedEvent

# Matches any phrase that contains either "play music" or "play the
# music"
@when(SpeechRecognizedEvent, phrase='play (the)? music')
def play_music():
    run('music.mpd.play')

If at least one custom hook with a non-empty phrase string matches, then the default response will be disabled. If you still want the assistant to say something when the event is handled, you can call event.assistant.render_response from the hook:

from datetime import datetime
from textwrap import dedent
from time import time

from platypush import run, when
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='weather today')
def weather_forecast(event: SpeechRecognizedEvent):
    limit = time() + 24 * 60 * 60  # 24 hours from now
    forecast = [
        weather
        for weather in run("weather.openweathermap.get_forecast")
        if datetime.fromisoformat(weather["time"]).timestamp() < limit
    ]

    min_temp = round(
        min(weather["temperature"] for weather in forecast)
    )
    max_temp = round(
        max(weather["temperature"] for weather in forecast)
    )
    max_wind_gust = round(
        (max(weather["wind_gust"] for weather in forecast)) * 3.6
    )
    summaries = [weather["summary"] for weather in forecast]
    most_common_summary = max(summaries, key=summaries.count)
    avg_cloud_cover = round(
        sum(weather["cloud_cover"] for weather in forecast) / len(forecast)
    )

    event.assistant.render_response(
        dedent(
            f"""
            The forecast for today is: {most_common_summary}, with
            a minimum of {min_temp} and a maximum of {max_temp}
            degrees, wind gust of {max_wind_gust} km/h, and an
            average cloud cover of {avg_cloud_cover}%.
            """
        )
    )

Conversation follow-up#

A conversation will have a follow-up (i.e. the assistant will listen for a phrase after rendering a response) if the response is not empty and ends with a question mark. If you want to force a follow-up even if the response doesn’t end with a question mark, you can call start_conversation() programmatically from your hooks.
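
For example, a minimal sketch (the phrase is hypothetical, and it assumes that event.assistant exposes the plugin's start_conversation(), in line with the render_response usage shown above) that forces a follow-up after a response that doesn't end with a question mark:

from platypush import when
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='add to the shopping list')
def add_to_shopping_list(event: SpeechRecognizedEvent):
    # ... your list-handling logic would go here ...
    event.assistant.render_response("Done. Tell me the next item.")
    # The response doesn't end with a question mark, so no follow-up would
    # start automatically; open the microphone again explicitly.
    event.assistant.start_conversation()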

Configuration#

assistant.openai:
  # [Optional]
  # OpenAI model to use for audio transcription (default:
  # ``whisper-1``).
  # model: whisper-1  # type=str

  # [Optional]
  # Name of the TTS plugin to use for rendering the responses
  # (default: ``tts.openai``).
  # tts_plugin: tts.openai  # type=Optional[str]

  # [Optional]
  # Minimum silence duration in seconds to detect
  # the end of a conversation (default: 1.0 seconds).
  # min_silence_secs: 1.0  # type=float

  # [Optional]
  # Silence threshold in dBFS (default: -22).
  # A value of 0 corresponds to the maximum amplitude, while -120
  # corresponds to silent or nearly silent audio; the higher the value,
  # the more sensitive the silence detection will be.
  # silence_threshold: -22  # type=int

  # [Optional]
  # Recording sample rate in Hz (default: 16000).
  # sample_rate: 16000  # type=int

  # [Optional]
  # Recording frame size in samples (default: 16384).
  # Note that it's important to make sure that ``frame_size`` /
  # ``sample_rate`` isn't smaller than the minimum silence duration,
  # otherwise the silence detection won't work properly.
  # frame_size: 16384  # type=int

  # [Optional]
  # Number of recording channels (default: 1).
  # channels: 1  # type=int

  # [Optional]
  # How long to wait for the
  # conversation to start (i.e. the first non-silent audio frame to be
  # detected) before giving up and stopping the recording (default: 5.0
  # seconds).
  # conversation_start_timeout: 5.0  # type=float

  # [Optional]
  # How many seconds of silence to wait
  # after the last non-silent audio frame before stopping the recording
  # (default: 1.0 seconds).
  # conversation_end_timeout: 1.0  # type=float

  # [Optional]
  # Maximum conversation duration in seconds
  # (default: 15.0 seconds).
  # conversation_max_duration: 15.0  # type=float

  # [Optional]
  # Optional arguments to be passed to the TTS
  # ``say`` action, if ``tts_plugin`` is set.
  # tts_plugin_args:   # type=Optional[Dict[str, Any]]

  # [Optional]
  # If set, the assistant will play this
  # audio file when it detects a speech. The sound file will be played
  # on the default audio output device. If not set, the assistant won't
  # play any sound when it detects a speech.
  # conversation_start_sound:   # type=Optional[str]

  # [Optional]
  # If set, the plugin will
  # prevent the default assistant response when a
  # `SpeechRecognizedEvent <https://docs.platypush.tech/platypush/events/assistant.html#platypush.message.event.assistant.SpeechRecognizedEvent>`_
  # matches a user hook with a condition on a ``phrase`` field. This is
  # useful to prevent the assistant from responding with a default "*I'm
  # sorry, I can't help you with that*" when e.g. you say "*play the
  # music*", and you have a hook that matches the phrase "*play the
  # music*" and handles it with a custom action. If set, and you wish
  # the assistant to also provide an answer if an event matches one of
  # your hooks, then you should call the ``render_response`` method
  # in your hook handler. If not set, then the assistant will always try
  # and respond with a default message, even if a speech event matches
  # the phrase of one of your hooks. In this case, if you want to prevent
  # the default response, you should call ``stop_conversation``
  # explicitly from your hook handler. Default: True.
  # stop_conversation_on_speech_match: True  # type=bool

  # [Optional]
  # How often the `RunnablePlugin.loop <https://docs.platypush.tech/platypush/plugins/.html#platypush.plugins.RunnablePlugin.loop>`_ function should be
  # executed (default: 15 seconds). *NOTE*: For back-compatibility
  # reasons, the `poll_seconds` argument is also supported, but it's
  # deprecated.
  # poll_interval: 15  # type=Optional[float]

  # [Optional]
  # How long we should wait for any running
  # threads/processes to stop before exiting (default: 5 seconds).
  # stop_timeout: 5  # type=Optional[float]

  # [Optional]
  # If set to True then the plugin will not monitor
  # for new events. This is useful if you want to run a plugin in
  # stateless mode and only leverage its actions, without triggering any
  # events. Defaults to False.
  # disable_monitor: False  # type=bool
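
If the defaults work for you, a minimal working setup (assuming the openai plugin holds your API key, as in the example at the top of this page) can be as short as:

openai:
  api_key: YOUR_OPENAI_API_KEY

# Enable the assistant with its default settings (whisper-1 transcription,
# responses rendered through tts.openai)
assistant.openai: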

Dependencies#

pip

pip install sounddevice pydub numpy

Alpine

apk add ffmpeg py3-numpy

Debian

apt install ffmpeg python3-numpy python3-pydub

Fedora

yum install ffmpeg python-numpy

Arch Linux

pacman -S ffmpeg python-numpy python-sounddevice

Triggered events#

Actions#

Module reference#

class platypush.plugins.assistant.openai.AssistantOpenaiPlugin(*_, **__)[source]#

Bases: AssistantPlugin, RunnablePlugin

A voice assistant based on the OpenAI API.

It requires the platypush.plugins.openai.OpenaiPlugin plugin to be configured with an OpenAI API key.

__init__(model: str = 'whisper-1', tts_plugin: str | None = 'tts.openai', min_silence_secs: float = 1.0, silence_threshold: int = -22, sample_rate: int = 16000, frame_size: int = 16384, channels: int = 1, conversation_start_timeout: float = 5.0, conversation_end_timeout: float = 1.0, conversation_max_duration: float = 15.0, **kwargs)[source]#
Parameters:
  • model – OpenAI model to use for audio transcription (default: whisper-1).

  • tts_plugin – Name of the TTS plugin to use for rendering the responses (default: tts.openai).

  • min_silence_secs – Minimum silence duration in seconds to detect the end of a conversation (default: 1.0 seconds).

  • silence_threshold – Silence threshold in dBFS (default: -22). A value of 0 corresponds to the maximum amplitude, while -120 corresponds to silent or nearly silent audio; the higher the value, the more sensitive the silence detection will be.

  • sample_rate – Recording sample rate in Hz (default: 16000).

  • frame_size – Recording frame size in samples (default: 16384). Note that it's important to make sure that frame_size / sample_rate isn't smaller than the minimum silence duration, otherwise the silence detection won't work properly (see the worked example after this list).

  • channels – Number of recording channels (default: 1).

  • conversation_start_timeout – How long to wait for the conversation to start (i.e. the first non-silent audio frame to be detected) before giving up and stopping the recording (default: 5.0 seconds).

  • conversation_end_timeout – How many seconds of silence to wait after the last non-silent audio frame before stopping the recording (default: 1.0 seconds).

  • conversation_max_duration – Maximum conversation duration in seconds (default: 15.0 seconds).
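
As a quick sanity check on the frame_size constraint above: with the defaults, 16384 / 16000 ≈ 1.02 seconds, which is just above the default min_silence_secs of 1.0. If you raise the sample rate, scale the frame size accordingly; for example (hypothetical values):

assistant.openai:
  sample_rate: 44100
  # 65536 / 44100 ≈ 1.49 s, still above min_silence_secs (1.0)
  frame_size: 65536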

is_detecting(*_, **__) → bool#
Returns:

True if the assistant is detecting, False otherwise.

is_muted(*_, **__) → bool#
Returns:

True if the microphone is muted, False otherwise.

main()[source]#

Implementation of the main loop of the plugin.

mute(*_, **__)[source]#

Note

This plugin has no hotword detection, thus no continuous audio detection. Speech processing is done on-demand through the start_conversation() and stop_conversation() methods. Therefore, the mute() and unmute() methods are not implemented.

pause_detection(*_, **__)#

Put the assistant on pause. No new conversation events will be triggered.

publish_entities(entities: Collection[Any] | None, callback: Callable[[Entity], Any] | None = None, **kwargs) → Collection[Entity]#

Publishes a list of entities to the downstream consumers (e.g. the entities database).

It also accepts an optional callback that will be called when each of the entities in the set is flushed to the database.

You usually don't need to override this method (but you may want to extend transform_entities() instead if your extension doesn't natively handle Entity objects).

render_response(text: str, *_, with_follow_on_turn: bool | None = None, **__) → bool#

Render a response text as audio over the configured TTS plugin.

Parameters:
  • text – Text to render.

  • with_follow_on_turn – If set, the assistant will wait for a follow-up. By default, with_follow_on_turn will be automatically set to true if the text ends with a question mark (see the sketch below).

Returns:

True if the assistant is waiting for a follow-up, False otherwise.
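
For instance, a minimal sketch (hypothetical text, invoked through the run helper from a hook or procedure) that keeps the microphone open even though the response isn't phrased as a question:

from platypush import run

# Force a follow-up regardless of how the response text ends
run(
    "assistant.openai.render_response",
    text="Time for your medication. Tell me once you're done.",
    with_follow_on_turn=True,
)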

resume_detection(*_, **__)#

Resume the assistant hotword detection from a paused state.

send_text_query(text: str, *_, **__)[source]#

If the tts_plugin configuration is set, then the assistant will process the given text query through platypush.plugins.openai.OpenaiPlugin.get_response() and render the response through the specified TTS plugin.

Returns:

The response received from platypush.plugins.openai.OpenaiPlugin.get_response().

start()#

Start the plugin.

start_conversation(*_, **__)[source]#

Start a conversation with the assistant. The conversation is stopped automatically:

  • after conversation_max_duration seconds of audio, or

  • after conversation_start_timeout seconds of silence with no audio detected, or

  • after conversation_end_timeout seconds since the last non-silent audio frame, or

  • when the stop_conversation() method is called.

status(*_, **__)#
Returns:

The current assistant status:

{
    "last_query": "What time is it?",
    "last_response": "It's 10:30 AM",
    "conversation_running": true,
    "is_muted": false,
    "is_detecting": true
}

stop()[source]#

Stop the plugin.

stop_conversation(*_, **__)#

Programmatically stops a conversation.

toggle_mute(*_, **__)#

Toggle the mute state of the microphone.

transform_entities(entities: Collection[AssistantPlugin], **_)#

This method takes a list of entities in any (plugin-specific) format and converts them into a standardized collection of Entity objects. Since this method is called by publish_entities() before entity updates are published, you may usually want to extend it to pre-process the entities managed by your extension into the standard format before they are stored and published to all the consumers.

unmute(*_, **__)[source]#

Note

This plugin has no hotword detection, thus no continuous audio detection. Speech processing is done on-demand through the start_conversation() and stop_conversation() methods. Therefore, the mute() and unmute() methods are not implemented.

wait_stop(timeout=None)#

Wait until a stop event is received.