assistant.openai#
Description#
A voice assistant based on the OpenAI API.
It requires the platypush.plugins.openai.OpenaiPlugin plugin to be configured with an OpenAI API key.
Hotword detection#
This plugin doesn’t have hotword detection, as OpenAI doesn’t provide
an API for that. Instead, the assistant can be started and stopped
programmatically through the start_conversation() action.
If you want to implement hotword detection, you can use a separate plugin
such as platypush.plugins.assistant.picovoice.AssistantPicovoicePlugin.
The configuration in this case would be like:
assistant.picovoice:
  access_key: YOUR_PICOVOICE_ACCESS_KEY

  # List of hotwords to listen for
  keywords:
    - alexa
    - computer
    - ok google

  # Disable speech-to-text and intent recognition, only use hotword
  # detection
  stt_enabled: false
  hotword_enabled: true

  conversation_start_sound: /sound/to/play/when/the/conversation/starts.mp3
  # speech_model_path: /mnt/hd/models/picovoice/cheetah/custom-en.pv
  # intent_model_path: /mnt/hd/models/picovoice/rhino/custom-en-x86.rhn

openai:
  api_key: YOUR_OPENAI_API_KEY

  # Customize your assistant's context and knowledge base to your
  # liking
  context:
    - role: system
      content: >
        You are a 16th century noble lady who talks in
        Shakespearean English to her peers.

# Enable the assistant plugin
assistant.openai:

# Enable the text-to-speech plugin
tts.openai:
  # Customize the voice model
  voice: nova
Then you can call start_conversation() when the hotword is detected, i.e. when a
platypush.message.event.assistant.HotwordDetectedEvent is triggered:
from platypush import run, when
from platypush.message.event.assistant import HotwordDetectedEvent

@when(HotwordDetectedEvent)
# You can also customize it by running a different assistant logic
# depending on the hotword
# @when(HotwordDetectedEvent, hotword='computer')
def on_hotword_detected():
    run("assistant.openai.start_conversation")
This configuration will:
- Start the hotword detection when the application starts.
- Start the OpenAI assistant when the hotword is detected.
AI responses#
By default (unless you set stop_conversation_on_speech_match to False), the plugin will:
- Process the speech through the OpenAI API (the GPT model to use is configurable through the model option of the OpenAI plugin).
- Render the response through the configured tts_plugin (default: tts.openai). If tts_plugin is not set, then the response will be returned as a string.
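The flow above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the plugin's code: `handle_speech`, `get_response`, `say` and `hook_matched` are hypothetical names standing in for the OpenAI plugin call, the TTS plugin's say action and the result of hook matching.

```python
from typing import Callable, Optional


def handle_speech(
    phrase: str,
    get_response: Callable[[str], str],
    say: Optional[Callable[[str], None]] = None,
    hook_matched: bool = False,
    stop_conversation_on_speech_match: bool = True,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented default response flow."""
    # A matching user hook suppresses the default response, unless
    # stop_conversation_on_speech_match has been set to False.
    if hook_matched and stop_conversation_on_speech_match:
        return None

    # Step 1: get the AI response for the recognized phrase.
    response = get_response(phrase)

    # Step 2: render it through the TTS callable, if one is configured.
    if say is not None:
        say(response)

    # In both cases the response is also returned as a string.
    return response
```

With `say=None` the function only returns the text, which corresponds to leaving tts_plugin unset.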
Custom speech processing#
You can create custom hooks on
platypush.message.event.assistant.SpeechRecognizedEvent with
custom phrase strings or (regex) patterns. For example:
from platypush import run, when
from platypush.message.event.assistant import SpeechRecognizedEvent

# Matches any phrase that contains either "play music" or "play the
# music"
@when(SpeechRecognizedEvent, phrase='play (the)? music')
def play_music():
    run('music.mpd.play')
If at least one custom hook with a non-empty phrase string is matched,
then the default response will be disabled. If you still want the assistant
to say something when the event is handled, you can call
event.assistant.render_response in the hook:
from datetime import datetime
from textwrap import dedent
from time import time

from platypush import run, when
from platypush.message.event.assistant import SpeechRecognizedEvent

@when(SpeechRecognizedEvent, phrase='weather today')
def weather_forecast(event: SpeechRecognizedEvent):
    limit = time() + 24 * 60 * 60  # 24 hours from now
    forecast = [
        weather
        for weather in run("weather.openweathermap.get_forecast")
        if datetime.fromisoformat(weather["time"]).timestamp() < limit
    ]

    min_temp = round(
        min(weather["temperature"] for weather in forecast)
    )
    max_temp = round(
        max(weather["temperature"] for weather in forecast)
    )
    max_wind_gust = round(
        (max(weather["wind_gust"] for weather in forecast)) * 3.6
    )
    summaries = [weather["summary"] for weather in forecast]
    most_common_summary = max(summaries, key=summaries.count)
    avg_cloud_cover = round(
        sum(weather["cloud_cover"] for weather in forecast) / len(forecast)
    )

    event.assistant.render_response(
        dedent(
            f"""
            The forecast for today is: {most_common_summary}, with
            a minimum of {min_temp} and a maximum of {max_temp}
            degrees, wind gust of {max_wind_gust} km/h, and an
            average cloud cover of {avg_cloud_cover}%.
            """
        )
    )
Conversation follow-up#
A conversation will have a follow-up (i.e. the assistant will listen for a
phrase after rendering a response) if the response is not empty and ends
with a question mark. If you want to force a follow-up even if the response
doesn’t end with a question mark, you can call start_conversation()
programmatically from your hooks.
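The follow-up rule can be written down as a small predicate. This is a sketch of the rule as described here, not the plugin's actual implementation; the function name `wants_follow_up` is hypothetical, and both the whitespace stripping and the idea that an explicit `with_follow_on_turn` value overrides the question-mark check are assumptions.

```python
from typing import Optional


def wants_follow_up(text: str, with_follow_on_turn: Optional[bool] = None) -> bool:
    """Hypothetical helper: should the assistant listen again after responding?"""
    # Assumption: an explicit value takes precedence over the heuristic.
    if with_follow_on_turn is not None:
        return with_follow_on_turn

    # Documented default: non-empty response that ends with a question mark.
    # Assumption: surrounding whitespace is ignored.
    stripped = text.strip()
    return bool(stripped) and stripped.endswith("?")
```

For example, `wants_follow_up("Shall I turn on the lights?")` is true, while `wants_follow_up("Done.")` is false unless a follow-up is forced.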
Configuration#
assistant.openai:
# [Optional]
# OpenAI model to use for audio transcription (default:
# ``whisper-1``).
# model: whisper-1 # type=str
# [Optional]
# Name of the TTS plugin to use for rendering the responses
# (default: ``tts.openai``).
# tts_plugin: tts.openai # type=Optional[str]
# [Optional]
# Minimum silence duration in seconds to detect
# the end of a conversation (default: 1.0 seconds).
# min_silence_secs: 1.0 # type=float
# [Optional]
# Silence threshold in dBFS (default: -22).
# The value of 0 is the maximum amplitude, and -120 is associated with
# silent or nearly silent audio, thus the higher the value, the more
# sensitive the silence detection will be.
# silence_threshold: -22 # type=int
# [Optional]
# Recording sample rate in Hz (default: 16000).
# sample_rate: 16000 # type=int
# [Optional]
# Recording frame size in samples (default: 16384).
# Note that it's important to make sure that ``frame_size`` /
# ``sample_rate`` isn't smaller than the minimum silence duration,
# otherwise the silence detection won't work properly.
# frame_size: 16384 # type=int
# [Optional]
# Number of recording channels (default: 1).
# channels: 1 # type=int
# [Optional]
# How long to wait for the
# conversation to start (i.e. the first non-silent audio frame to be
# detected) before giving up and stopping the recording (default: 5.0
# seconds).
# conversation_start_timeout: 5.0 # type=float
# [Optional]
# How many seconds of silence to wait
# after the last non-silent audio frame before stopping the recording
# (default: 1.0 seconds).
# conversation_end_timeout: 1.0 # type=float
# [Optional]
# Maximum conversation duration in seconds
# (default: 15.0 seconds).
# conversation_max_duration: 15.0 # type=float
# [Optional]
# Optional arguments to be passed to the TTS
# ``say`` action, if ``tts_plugin`` is set.
# tts_plugin_args: # type=Optional[Dict[str, Any]]
# [Optional]
# If set, the assistant will play this
# audio file when it detects speech. The sound file will be played
# on the default audio output device. If not set, the assistant won't
# play any sound when it detects speech.
# conversation_start_sound: # type=Optional[str]
# [Optional]
# If set, the plugin will
# prevent the default assistant response when a
# `SpeechRecognizedEvent <https://docs.platypush.tech/platypush/events/assistant.html#platypush.message.event.assistant.SpeechRecognizedEvent>`_
# matches a user hook with a condition on a ``phrase`` field. This is
# useful to prevent the assistant from responding with a default "*I'm
# sorry, I can't help you with that*" when e.g. you say "*play the
# music*", and you have a hook that matches the phrase "*play the
# music*" and handles it with a custom action. If set, and you wish
# the assistant to also provide an answer if an event matches one of
# your hooks, then you should call the ``render_response`` method
# in your hook handler. If not set, then the assistant will always try
# and respond with a default message, even if a speech event matches
# the phrase of one of your hooks. In this case, if you want to prevent
# the default response, you should call ``stop_conversation``
# explicitly from your hook handler. Default: True.
# stop_conversation_on_speech_match: True # type=bool
# [Optional]
# How often the `RunnablePlugin.loop <https://docs.platypush.tech/platypush/plugins/.html#platypush.plugins.RunnablePlugin.loop>`_ function should be
# executed (default: 15 seconds). *NOTE*: For back-compatibility
# reasons, the `poll_seconds` argument is also supported, but it's
# deprecated.
# poll_interval: 15 # type=Optional[float]
# [Optional]
# How long we should wait for any running
# threads/processes to stop before exiting (default: 5 seconds).
# stop_timeout: 5 # type=Optional[float]
# [Optional]
# If set to True then the plugin will not monitor
# for new events. This is useful if you want to run a plugin in
# stateless mode and only leverage its actions, without triggering any
# events. Defaults to False.
# disable_monitor: False # type=bool
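The silence_threshold option in the configuration above is expressed in dBFS. As a rough illustration of that scale, the sketch below computes the RMS level of a frame of signed 16-bit PCM samples relative to full scale. How the plugin itself measures the level is not stated here, so `dbfs` and `is_silent` are hypothetical helpers and the RMS-based formula is an assumption.

```python
import math
from typing import Sequence


def dbfs(samples: Sequence[int], bits: int = 16) -> float:
    """RMS level of signed PCM samples in dBFS (0 = full scale)."""
    full_scale = 2 ** (bits - 1)
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        # Digital silence has no finite level on a logarithmic scale
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


def is_silent(samples: Sequence[int], threshold: float = -22) -> bool:
    """Assumed rule: a frame below the threshold counts as silence."""
    return dbfs(samples) < threshold
```

Under this rule, raising the threshold towards 0 classifies more frames as silent, which matches the note that higher values make silence detection more sensitive.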
Dependencies#
pip
pip install numpy pydub sounddevice
Alpine
apk add ffmpeg py3-numpy
Debian
apt install ffmpeg python3-numpy python3-pydub
Fedora
yum install ffmpeg python-numpy
Arch Linux
pacman -S ffmpeg python-sounddevice python-numpy
Triggered events#
Actions#
Module reference#
- class platypush.plugins.assistant.openai.AssistantOpenaiPlugin(*_, **__)[source]#
Bases: AssistantPlugin, RunnablePlugin
A voice assistant based on the OpenAI API.
- __init__(model: str = 'whisper-1', tts_plugin: str | None = 'tts.openai', min_silence_secs: float = 1.0, silence_threshold: int = -22, sample_rate: int = 16000, frame_size: int = 16384, channels: int = 1, conversation_start_timeout: float = 5.0, conversation_end_timeout: float = 1.0, conversation_max_duration: float = 15.0, **kwargs)[source]#
- Parameters:
model – OpenAI model to use for audio transcription (default: whisper-1).
tts_plugin – Name of the TTS plugin to use for rendering the responses (default: tts.openai).
min_silence_secs – Minimum silence duration in seconds to detect the end of a conversation (default: 1.0 seconds).
silence_threshold – Silence threshold in dBFS (default: -22). The value of 0 is the maximum amplitude, and -120 is associated with silent or nearly silent audio, thus the higher the value, the more sensitive the silence detection will be.
sample_rate – Recording sample rate in Hz (default: 16000).
frame_size – Recording frame size in samples (default: 16384). Note that it’s important to make sure that frame_size / sample_rate isn’t smaller than the minimum silence duration, otherwise the silence detection won’t work properly.
channels – Number of recording channels (default: 1).
conversation_start_timeout – How long to wait for the conversation to start (i.e. the first non-silent audio frame to be detected) before giving up and stopping the recording (default: 5.0 seconds).
conversation_end_timeout – How many seconds of silence to wait after the last non-silent audio frame before stopping the recording (default: 1.0 seconds).
conversation_max_duration – Maximum conversation duration in seconds (default: 15.0 seconds).
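The constraint between frame_size, sample_rate and min_silence_secs is simple arithmetic: one frame lasts frame_size / sample_rate seconds. The following sketch checks a set of values before they go into the configuration; the helper names are hypothetical and are not part of the plugin.

```python
def frame_duration_secs(frame_size: int, sample_rate: int) -> float:
    """Duration of one recording frame in seconds."""
    return frame_size / sample_rate


def frame_settings_ok(
    frame_size: int = 16384,
    sample_rate: int = 16000,
    min_silence_secs: float = 1.0,
) -> bool:
    """True if a frame is not shorter than the minimum silence duration."""
    return frame_duration_secs(frame_size, sample_rate) >= min_silence_secs
```

With the defaults, a frame lasts 16384 / 16000 = 1.024 seconds, which is just above the default min_silence_secs of 1.0. Reducing frame_size to 4096 at the same sample rate gives 0.256 seconds, which would violate the constraint.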
- mute(*_, **__)[source]#
Note
This plugin has no hotword detection, thus no continuous audio detection. Speech processing is done on-demand through the start_conversation() and stop_conversation() methods. Therefore, the mute() and unmute() methods are not implemented.
- pause_detection(*_, **__)#
Put the assistant on pause. No new conversation events will be triggered.
- publish_entities(entities: Collection[Any] | None, callback: Callable[[Entity], Any] | None = None, **kwargs) Collection[Entity] #
Publishes a list of entities. The downstream consumers include:
- The entity persistence manager
- The web server
- Any consumer subscribed to platypush.message.event.entities.EntityUpdateEvent events (e.g. web clients)
It also accepts an optional callback that will be called when each of the entities in the set is flushed to the database.
You usually don’t need to override this method (but you may want to extend transform_entities() instead if your extension doesn’t natively handle Entity objects).
- render_response(text: str, *_, with_follow_on_turn: bool | None = None, **__) bool #
Render a response text as audio over the configured TTS plugin.
- Parameters:
text – Text to render.
with_follow_on_turn – If set, the assistant will wait for a follow-up. By default, with_follow_on_turn will be automatically set to true if the text ends with a question mark.
- Returns:
True if the assistant is waiting for a follow-up, False otherwise.
- resume_detection(*_, **__)#
Resume the assistant hotword detection from a paused state.
- send_text_query(text: str, *_, **__)[source]#
If the tts_plugin configuration is set, then the assistant will process the given text query through platypush.plugins.openai.OpenaiPlugin.get_response() and render the response through the specified TTS plugin.
- Returns:
The response received from platypush.plugins.openai.OpenaiPlugin.get_response().
- start()#
Start the plugin.
- start_conversation(*_, **__)[source]#
Start a conversation with the assistant. The conversation will be automatically stopped after conversation_max_duration seconds of audio, or after conversation_start_timeout seconds of silence with no audio detected, or after conversation_end_timeout seconds after the last non-silent audio frame has been detected, or when the stop_conversation() method is called.
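The automatic stop conditions listed above can be summarized as a decision function. This is a minimal sketch of the documented rules, not the plugin's recording loop: `stop_reason` and its arguments are hypothetical, elapsed wall-clock time is used as a stand-in for seconds of recorded audio, and an explicit stop_conversation() call is left out.

```python
from typing import Optional


def stop_reason(
    elapsed: float,
    last_sound_at: Optional[float],
    *,
    start_timeout: float = 5.0,
    end_timeout: float = 1.0,
    max_duration: float = 15.0,
) -> Optional[str]:
    """Return why the conversation should stop, or None to keep recording.

    elapsed: seconds since the conversation started.
    last_sound_at: offset (in seconds) of the last non-silent frame,
        or None if no speech has been detected yet.
    """
    if elapsed >= max_duration:
        return "conversation_max_duration"
    if last_sound_at is None:
        # Nothing heard so far: only the start timeout applies
        return "conversation_start_timeout" if elapsed >= start_timeout else None
    if elapsed - last_sound_at >= end_timeout:
        return "conversation_end_timeout"
    return None
```

For instance, with the default timeouts, `stop_reason(5.0, None)` reports the start timeout, while `stop_reason(4.0, 3.5)` returns None because only half a second has passed since the last non-silent frame.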
- status(*_, **__)#
- Returns:
The current assistant status:
{
  "last_query": "What time is it?",
  "last_response": "It's 10:30 AM",
  "conversation_running": true,
  "is_muted": false,
  "is_detecting": true
}
- stop_conversation(*_, **__)#
Programmatically stops a conversation.
- toggle_mute(*_, **__)#
Toggle the mute state of the microphone.
- transform_entities(entities: Collection[AssistantPlugin], **_)#
This method takes a list of entities in any (plugin-specific) format and converts them into a standardized collection of Entity objects. Since this method is called by publish_entities() before entity updates are published, you may usually want to extend it to pre-process the entities managed by your extension into the standard format before they are stored and published to all the consumers.
- unmute(*_, **__)[source]#
Note
This plugin has no hotword detection, thus no continuous audio detection. Speech processing is done on-demand through the start_conversation() and stop_conversation() methods. Therefore, the mute() and unmute() methods are not implemented.
- wait_stop(timeout=None)#
Wait until a stop event is received.