Amazon Polly


The amazon_polly text-to-speech platform uses Amazon Polly to create the spoken output. Polly is a paid service from Amazon Web Services. There is a free tier for the first 12 months, after which usage is charged per million characters.

Setup

To get the required credentials, read the AWS General Reference on Security Credentials. Also check the boto3 documentation about profiles, and the AWS Regions and Endpoints reference for the available regions.

Available voices are listed in the Amazon Documentation.

Configuration

To get started, add the following lines to your configuration.yaml:

# Example configuration.yaml entry
tts:
  - platform: amazon_polly
    aws_access_key_id: AWS_ACCESS_KEY_ID
    aws_secret_access_key: AWS_SECRET_ACCESS_KEY
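If you prefer to keep the keys out of configuration.yaml, the profile_name variable described below can be used instead. The following is a minimal sketch, assuming a profile named polly_tts (a hypothetical name) already exists in the AWS credentials file available to Home Assistant, as described in the boto3 documentation about profiles:

# Example configuration.yaml entry using a credentials profile
tts:
  - platform: amazon_polly
    # polly_tts is a hypothetical profile name; do not combine it with
    # aws_access_key_id or aws_secret_access_key
    profile_name: polly_tts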

Configuration Variables

aws_access_key_id string (Optional)

Your AWS Access Key ID. If provided, you must also provide an aws_secret_access_key and must not provide a profile_name.

aws_secret_access_key string (Optional)

Your AWS Secret Access Key. If provided, you must also provide an aws_access_key_id and must not provide a profile_name.

profile_name string (Optional)

A credentials profile name. If provided, you must not provide an aws_access_key_id or an aws_secret_access_key.

region_name string | list (Optional, default: us-east-1)

The region identifier to connect to.

text_type string (Optional, default: text)

Whether to interpret messages as text or as ssml by default.

voice string (Optional)

The voice name or ID to use for generated speech by default.

output_format string (Optional, default: mp3)

Override the default output format. One of mp3, ogg_vorbis, or pcm.

sample_rate string (Optional, default: 22050 for mp3 and ogg_vorbis, 16000 for pcm)

Override the default sample rate. Possible values are 8000, 16000, 22050, and 24000; the pcm format only supports 8000 and 16000.

engine string (Optional, default: standard)

Override the default engine. Either standard or neural. See the Amazon documentation for compatible regions and voices.
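The optional variables above can be combined in a single entry. The following is a sketch, not a recommended setup: it assumes the Joanna voice (used here only as an illustration) is available with the neural engine in your chosen region, which you should confirm in the Amazon documentation before using it.

# Example configuration.yaml entry with optional settings
tts:
  - platform: amazon_polly
    aws_access_key_id: AWS_ACCESS_KEY_ID
    aws_secret_access_key: AWS_SECRET_ACCESS_KEY
    region_name: us-east-1
    # Illustrative voice; it must support the selected engine in this region
    voice: Joanna
    engine: neural
    output_format: mp3
    # sample_rate is a string; 24000 is a valid rate for mp3
    sample_rate: "24000"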

Usage

Say to all media_player device entities:

- service: tts.amazon_polly_say
  data:
    message: "<speak>Hello from Amazon Polly</speak>"

or

- service: tts.amazon_polly_say
  data:
    message: >
      <speak>
          Hello from Amazon Polly
      </speak>

Say to the media_player.living_room device entity:

- service: tts.amazon_polly_say
  target:
    entity_id: media_player.living_room
  data:
    message: >
      <speak>
          Hello from Amazon Polly
      </speak>

Say with break:

- service: tts.amazon_polly_say
  data:
    message: >
      <speak>
          Hello from
          <break time=".9s" />
          Amazon Polly
      </speak>

Advanced usage

Amazon Polly supports accented bilingual voices, and you may want the voice you like to speak more slowly or more quickly. Amazon Polly lets you adjust the speaking rate with SSML tags. To use them, first enable SSML in the configuration:

  - platform: amazon_polly
    ...
    text_type: ssml
    ...

Note: After this change, all TTS input, both new and existing, must be enclosed in <speak></speak> tags. For example, to use SSML in an automation:

service: tts.amazon_polly_say
data:
  cache: true
  entity_id: media_player.mpd
  message: >-
    <speak><prosody rate="75%">나는 <prosody rate="75%">천천히</prosody> <lang
    xml:lang="fr-FR">parle</lang>.하고 있다식기세척!</prosody></speak>
  language: ko-KR
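If you only want to slow down an English message, a simpler variant uses a named prosody rate instead of a percentage. This is a sketch assuming the SSML-enabled configuration above and a media_player.living_room entity; adjust the entity and wording to your setup.

service: tts.amazon_polly_say
target:
  entity_id: media_player.living_room
data:
  message: >
    <speak>
      <prosody rate="slow">Hello from Amazon Polly</prosody>
    </speak>

Other named rates include x-slow, medium, fast, and x-fast.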