Wake words are special words or phrases that tell a voice assistant that a command is about to be spoken. The device then switches from passive to active listening. Examples are: Hey Google, Hey Siri, or Alexa. Home Assistant supports its own wake words, such as Hey Nabu.
Wake word detection is hard to build, for a few reasons:
- The wake words have to be processed extremely fast: You can’t have a voice assistant start listening 5 seconds after a wake word is spoken.
- There is little room for false positives.
- Wake word processing is based on compute-intensive AI models.
- Voice satellite hardware generally does not have a lot of computing power, so wake word engines need hardware experts to optimize the models to run smoothly.
To avoid being limited to specific hardware, wake word detection is done inside Home Assistant. Voice satellite devices constantly sample the audio in your room for voice. When a satellite detects voice, it sends the audio to Home Assistant, which checks whether the wake word was said and handles the command that followed it.
This means any device that streams audio can be turned into a voice satellite, even if it isn’t powerful enough to run wake word detection locally. It also allows our developer community to experiment with wake word models without having to shrink the model to run on a low-powered voice satellite device.
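To make the split between satellite and server concrete, here is a minimal sketch of the flow described above. It is not Home Assistant's implementation: the energy threshold is a simple stand-in for real voice detection, and `satellite_stream`, `server_detect`, and the `score_chunk` callback are hypothetical names. The audio is assumed to be 16-bit little-endian mono PCM.

```python
import math
import struct
from typing import Callable, Iterable, Iterator

SAMPLE_WIDTH = 2  # bytes per sample; assumes 16-bit PCM


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit little-endian mono PCM."""
    count = len(chunk) // SAMPLE_WIDTH
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * SAMPLE_WIDTH])
    return math.sqrt(sum(s * s for s in samples) / count)


def satellite_stream(
    chunks: Iterable[bytes], threshold: float = 500.0
) -> Iterator[bytes]:
    """Satellite side: forward only chunks loud enough to plausibly be voice.

    The threshold is an illustrative value, not a tuned one.
    """
    for chunk in chunks:
        if rms(chunk) >= threshold:
            yield chunk


def server_detect(
    stream: Iterable[bytes],
    score_chunk: Callable[[bytes], float],
    trigger: float = 0.5,
) -> bool:
    """Server side: run the expensive wake word model on forwarded audio only.

    score_chunk is a placeholder for a wake word model that returns a
    score between 0 and 1 for each chunk.
    """
    return any(score_chunk(chunk) >= trigger for chunk in stream)
```

The point of the design is that the satellite only needs the cheap check in `satellite_stream`, while everything model-related lives behind `score_chunk` on the server, where it can be swapped without touching the device.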
Overview of the wake word architecture
The quality of the captured audio differs between devices. A speakerphone with multiple microphones and audio processing chips captures voice very cleanly. A device with a single microphone and no post-processing? Not so much. We compensate for poor audio quality with audio post-processing inside Home Assistant, and users can improve accuracy further by using better speech-to-text models, like the one included with Home Assistant Cloud.
Each satellite requires ongoing resources inside Home Assistant while it’s streaming audio. Currently, users can have 5 voice satellites streaming audio at the same time without overwhelming a Raspberry Pi 4. To scale up, we’ve updated the Wyoming protocol to allow users to run wake word detection on an external server.
Home Assistant’s wake words are built on a new project called openWakeWord by David Scripka. The project is accurate enough for real-world use, runs on commodity hardware, and lets anyone train a basic model of their own wake word.
Users can pick per configured voice assistant what wake word to listen for
openWakeWord was created with four goals in mind:
- Be fast enough for real-world usage.
- Be accurate enough for real-world usage.
- Have a simple model architecture and inference process.
- Require little to no manual data collection to train new models.
openWakeWord is built around an open source audio embedding model trained by Google and fine-tuned using the text-to-speech system Piper. Piper generates many thousands of audio clips for each wake word, creating variations of different speakers. These audio clips are then augmented to sound as if they were spoken in multiple kinds of rooms, at specific distances from a microphone, and with varying speeds. Finally, the clips are mixed with background noise like music, environmental sounds, and conversation before being fed into the training process to generate the wake word model.
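The last augmentation step, mixing a clip with background noise, can be sketched in a few lines. This is an illustration of the general technique and not openWakeWord's actual training code: `mix_at_snr` and `augment` are hypothetical helpers, clips are plain lists of float samples, and the 0 to 20 dB signal-to-noise range is an assumption chosen for the example.

```python
import math
import random
from typing import List


def power(signal: List[float]) -> float:
    """Mean power of a non-empty signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clip: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to clip, scaled so the clip-to-noise power ratio is snr_db."""
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Loop the noise so it covers the whole clip, then trim to length.
    repeats = len(clip) // len(noise) + 1
    looped = (noise * repeats)[: len(clip)]
    # Scale the looped noise to the power implied by the requested SNR.
    target_noise_power = power(clip) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(looped))
    return [c + scale * n for c, n in zip(clip, looped)]


def augment(
    clip: List[float], noises: List[List[float]], rng: random.Random
) -> List[float]:
    """Mix the clip with a randomly chosen noise at a random SNR.

    The 0-20 dB range is an assumed example, not a documented setting.
    """
    noise = rng.choice(noises)
    snr_db = rng.uniform(0.0, 20.0)
    return mix_at_snr(clip, noise, snr_db)
```

Running `augment` many times over the same generated clip yields many distinct training examples, which is how a few thousand synthetic clips can cover a wide range of listening conditions.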
Overview of the openWakeWord training pipeline.
openWakeWord currently only works for English wake words. This is because other languages still lack models with many different speakers. Similar models for other languages can be trained as more multi-speaker models per language become available.
If you’re not running Home Assistant OS, openWakeWord is also available as a Docker container. Once the container is running, add the Wyoming integration and point it at the container’s IP address and port (typically 10400).
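If the Wyoming integration cannot connect, a quick check from another machine can tell you whether the container's port is reachable at all. The helper below is a hypothetical convenience, not part of Home Assistant or the Wyoming project. It only tests that a TCP connection can be opened and does not verify that the service on the other end speaks the Wyoming protocol.

```python
import socket


def wyoming_port_open(host: str, port: int = 10400, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `wyoming_port_open("192.168.1.50")` checks the default port on a container host at that address (the IP here is a placeholder for your own).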
Home Assistant ships with defaults but allows users to configure each part of their voice assistants. This also applies to wake words.
You can add other wake word engines as an integration or run them as a standalone program that communicates with Home Assistant via the Wyoming protocol.
How wake words integrate into Home Assistant
As an example, we’re also making the Porcupine (v1) wake word engine available. It supports 29 wake words across English, French, Spanish, and German. The wake words include Computer, Framboise, Manzana, and Stachelschwein.
To try wake words today, follow the guide to the $13 voice assistant.