Speak

Use this action to turn text into speech and play it on a media player. Each text-to-speech entity represents one speech provider, so you pick the entity for the voice you want, then pick the media player that plays the sound.

Using this action from the user interface

If you prefer building automations and scripts visually, Home Assistant walks you through this action step by step. You pick what to target, tweak a few options, and save. No YAML knowledge required.

To speak a message from an automation or a script:

Go to Settings > Automations & scenes.
Open an existing automation or script, or select Create automation > Create new automation.
If you’re setting up a new automation, add a trigger in the When section. Scripts don’t need a trigger. They run when something else calls them.
In the Then do section, select Add action.
Select what you want to control. Under By target (see Targets), select the text-to-speech entity you want to use.
From the actions shown for that target, select Speak.
Select the Media player entity to play the message on, enter the Message, and set any other options you need.
Select Save.

Options in the UI

Media player entity

The media player to play the message on.

Message

The text you want to convert into speech.

Cache (Optional)

Store this message locally so that when the same text is requested again, the output can be produced more quickly.

Language (Optional)

The language to speak the message in, using the format required by the text-to-speech entity.

Options (Optional)

Additional settings specific to the text-to-speech entity, such as voice or audio format.

Using this action in YAML

If you work directly in YAML, or you want to know exactly what Home Assistant does under the hood, this section has the technical reference. It lists the field names you use in YAML, their types, and which ones are required.

In YAML, refer to this action as tts.speak. A basic example looks like this:

Action

action: tts.speak
target:
  entity_id: tts.example
data:
  media_player_entity_id: media_player.kitchen
  message: "May the force be with you."

This speaks a message on media_player.kitchen using the tts.example entity.

Options in YAML

media_player_entity_id string Required

The media player to play the message on.

message string Required

The text you want to convert into speech.

cache boolean

Store this message locally so that when the same text is requested again, the output can be produced more quickly.

language string

The language to speak the message in, using the format required by the text-to-speech entity.

options map

Additional settings specific to the text-to-speech entity, such as voice or audio format.

The options setting can include preferred audio settings, along with any other settings the text-to-speech entity supports, such as voice or speed. Check the documentation of your text-to-speech integration for the settings it accepts.
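As a sketch of how the optional fields fit together, the example below adds cache, language, and options to the basic action. The language code and the voice key are illustrative assumptions, not values every text-to-speech entity accepts, so check your integration's documentation before copying them.

```yaml
action: tts.speak
target:
  entity_id: tts.example
data:
  media_player_entity_id: media_player.kitchen
  message: "Dinner is ready."
  # Keep the generated audio so the same text plays faster next time.
  cache: true
  # Assumption: the required language format depends on the entity.
  language: "en-US"
  options:
    # Hypothetical key and value: your integration may use a different name.
    voice: "example_voice"
```

Only media_player_entity_id and message are required. Leave out any optional field you don't need.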

Targets of the action

This action requires a target. The target is the object of the action. You can point the action at a single entity, a device, an area, a floor, or a label, and Home Assistant will run the action on every matching tts entity behind that target.

Entity: one specific tts entity, such as tts.living_room.
Device: every tts entity that belongs to a device.
Area: every tts entity in a room or area.
Floor: every tts entity on a floor.
Label: every tts entity that shares a label.

You can also select different target types in one action. For example, you can add a specific entity and an area as targets in the same action to run the action on both of them at once.
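For illustration, a target that combines one specific entity with an area might look like the sketch below. The area ID kitchen is a hypothetical name, so replace it with an area from your own setup.

```yaml
action: tts.speak
target:
  # One specific tts entity...
  entity_id: tts.living_room
  # ...plus every tts entity in this area (hypothetical area ID).
  area_id: kitchen
data:
  media_player_entity_id: media_player.kitchen
  message: "May the force be with you."
```

The same target block also accepts device_id, floor_id, and label_id for the other target types.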

Good to know

Caching stores the spoken result so the same message plays faster next time. For more details, see the cache section.
If a media player cannot play the audio format a provider produces, set preferred audio settings in options to convert it.
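A minimal sketch of the audio format tip, assuming the preferred audio settings use a key named preferred_format and that your media player handles MP3. Confirm both against the documentation of your text-to-speech integration and media player.

```yaml
action: tts.speak
target:
  entity_id: tts.example
data:
  media_player_entity_id: media_player.kitchen
  message: "May the force be with you."
  options:
    # Assumed key name and value: ask for MP3 output
    # so the media player can play the result.
    preferred_format: mp3
```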

Try it yourself

Ready to test this? Open Developer tools > Actions, search for this action, fill in the fields, and select Perform action. You see what happens on your actual entities without writing a line of YAML.

More examples

Real scenarios where this action shows up in automations and scripts. Copy any example and adapt it to your setup.

Tip

You don’t need to edit YAML to use these examples. Copy a YAML snippet from this page, open the automation editor in Home Assistant, and press Ctrl+V (or Cmd+V on Mac). Home Assistant automatically converts the pasted YAML into the visual editor format, whether it’s a full automation, a single trigger, a condition, or an action.

Automation: announce the weather every morning

Each morning at 07:00, speak a weather announcement on the kitchen media player. The message in this example is fixed text.

Trigger: Time: 07:00
Action: Speak
- Target: Text-to-speech entity
- Media player entity: Kitchen
- Message: Good morning. Today’s weather is sunny.

Automation

alias: "Speak the weather every morning"
triggers:
  - trigger: time
    at: "07:00:00"
actions:
  - action: tts.speak
    target:
      entity_id: tts.example
    data:
      media_player_entity_id: media_player.kitchen
      message: "Good morning. Today's weather is sunny."
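
To announce the actual conditions instead of a fixed sentence, the message can use a template. This variant is a sketch that assumes a weather entity named weather.home exists in your setup. That entity name is hypothetical.

```yaml
alias: "Speak the current weather every morning"
triggers:
  - trigger: time
    at: "07:00:00"
actions:
  - action: tts.speak
    target:
      entity_id: tts.example
    data:
      media_player_entity_id: media_player.kitchen
      # The template inserts the state of the (hypothetical) weather entity.
      message: "Good morning. Today's weather is {{ states('weather.home') }}."
```

The state of a weather entity is a short condition name, so some values may sound better if you map them to friendlier wording in the template.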

Still stuck?

The Home Assistant community is quick to help: join Discord for real-time chat, post on the community forum with the action you’re calling and what you expected to happen, or share on our subreddit /r/homeassistant.

Tip

AI assistants like ChatGPT or Claude can also explain actions or suggest the right one when you describe what you want in plain language.

Related actions

These actions work well alongside this one:

Say a TTS message: Says a message on a media player using a legacy text-to-speech platform.
Clear TTS cache: Removes all cached text-to-speech files and clears the memory.
