Microsoft Text-to-Speech

The microsoft text-to-speech platform uses the Microsoft Text-to-Speech engine to read text aloud with natural-sounding voices. This component uses the Bing Speech API, which is part of the Cognitive Services offering. You will need an API key, which is free. You can use your Azure subscription or get an API key on the Cognitive Services site.


To enable text-to-speech with Microsoft, add the following lines to your configuration.yaml:

# Example configuration.yaml entry
tts:
  - platform: microsoft
    api_key: YOUR_API_KEY

Configuration Variables


api_key
(string)(Required) Your API key.

language
(string)(Optional) The language to use. Accepted values are listed in the Bing Speech API documentation. If you set the language to anything other than the default, you must also specify a matching voice type.

Default value: en-us

gender
(string)(Optional) The gender to use for the voice. Accepted values are Female and Male.

Default value: Female

type
(string)(Optional) The voice type to use. Accepted values are listed under the service name mapping in the Bing Speech API documentation.

Default value: ZiraRUS

rate
(integer)(Optional) Change the speaking rate, as a percentage. Example values: 25, 50.

Default value: 0

volume
(integer)(Optional) Change the output volume, as a percentage. Example values: -20, 70.

Default value: 0

pitch
(string)(Optional) Change the pitch of the output. Example value: high.

Default value: default

contour
(string)(Optional) Change the pitch contour of the output, in percentages. This overrides the pitch setting. See the W3C SSML specification for details. Example value: (0, 0) (100, 100).
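
Because a non-default language needs a matching voice type, the language, gender, and type settings are usually changed together. The following is a minimal sketch only: it assumes that de-de is an accepted language and that Hedda is listed as a female voice for it in the service name mapping, so verify both against the Bing Speech API documentation before use.

```yaml
# Sketch: non-default language with a matching gender and voice type.
# Assumption: "Hedda" is a Female de-de voice in the service name mapping.
tts:
  - platform: microsoft
    api_key: YOUR_API_KEY
    language: de-de
    gender: Female
    type: Hedda
```

If the voice type does not match the language and gender, the service may reject the request, so it is worth testing with a short message first.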

Full configuration example

A full configuration sample including optional variables:

# Example configuration.yaml entry
tts:
  - platform: microsoft
    api_key: YOUR_API_KEY
    language: en-gb
    gender: Male
    type: George, Apollo
    rate: 20
    volume: -50
    pitch: high
    contour: (0, 0) (100, 100)
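
Once the platform is configured, speech is typically triggered through a service call. The sketch below assumes that the platform registers a service named tts.microsoft_say, following the usual tts.PLATFORM_say naming for text-to-speech platforms; the script name announce_dinner and the entity media_player.living_room are hypothetical placeholders.

```yaml
# Sketch: a script that speaks a message on a media player.
# Assumptions: the service is exposed as tts.microsoft_say;
# announce_dinner and media_player.living_room are placeholder names.
script:
  announce_dinner:
    sequence:
      - service: tts.microsoft_say
        data:
          entity_id: media_player.living_room
          message: "Dinner is ready."
```

Replace the entity ID with a media player from your own setup; the available services can be checked in the developer tools of your installation.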