Speak Sentence

Text Content

text: The text to speak. Cannot be blank. Can be a mixture of plain text and SSML tags (see the list of supported tags below).


Attributes of the speaker may be changed using these values. The default speaker is a female speaker with locale en_US.

To change the gender or locale of the speaker, change the appropriate attributes and a new voice will be selected that matches the new gender and locale.

To choose a specific voice by name, use the voice attribute.

voice: Selects the voice of the speaker. Consult the voice column in the table below for valid values. If the voice attribute is present, gender and locale are ignored.

gender: Selects the gender of the speaker. Valid values are "male" or "female". Default: "female".

locale: Selects the locale of the speaker. Consult the locale column in the table below for valid values. Default: "en_US".
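As a hedged sketch of how these attributes combine, the two separate fragments below use only values named on this page: the "male" gender, the default "en_US" locale, and the "julie" voice that appears in the example at the end of this page. The spoken text is purely illustrative.

```xml
<!-- Fragment 1: select by gender and locale; a matching voice is chosen automatically. -->
<SpeakSentence gender="male" locale="en_US">
    Your appointment is confirmed.
</SpeakSentence>

<!-- Fragment 2: select by name; gender and locale are ignored because voice is present. -->
<SpeakSentence voice="julie" gender="male" locale="en_US">
    Your appointment is confirmed.
</SpeakSentence>
```

In the second fragment the gender and locale attributes have no effect, so in practice they would simply be left out.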

Supported Voices

The table below maps each supported voice name to its locale, its gender, and the corresponding voice of our current TTS provider (Amazon Polly). To give our customers the best possible experience, we reserve the right to change TTS providers in the future.

voice | locale | gender | Provider Name (AWS Polly)

Supported SSML Tags

List of supported SSML tags.

Full details about SSML tags can be found at:


<break>

Adds a pause to the speech. Specify the duration of the pause with either the strength or the time attribute.


  • strength: (optional) accepted values are: none, x-weak, weak, medium (default), strong or x-strong
  • time: (optional) the duration of the pause in seconds or milliseconds (e.g. 1s or 1000ms) with a maximum value of 10 seconds
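A minimal sketch of both attributes in use, assuming the tag described above is the standard SSML `<break>` element. The "julie" voice is taken from the example at the end of this page; the sentence text is illustrative.

```xml
<SpeakSentence voice="julie">
    <!-- Explicit duration: half a second, well under the 10 second maximum. -->
    Please hold <break time="500ms"/> while we connect your call.
    <!-- Relative strength instead of a fixed duration. -->
    <break strength="strong"/>
    Thank you for waiting.
</SpeakSentence>
```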


<emphasis>

The contained text is spoken with emphasis.


  • level: (optional) defines the strength of emphasis to be applied, accepted values are: strong, moderate or reduced
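For illustration, the fragment below applies two of the accepted levels, assuming the tag described above is the standard SSML `<emphasis>` element. The wording of the message is made up for the example.

```xml
<SpeakSentence voice="julie">
    Your payment is <emphasis level="strong">overdue</emphasis>.
    <!-- "reduced" de-emphasizes the contained text instead. -->
    <emphasis level="reduced">Terms and conditions apply.</emphasis>
</SpeakSentence>
```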


<lang>

Specifies the natural language of the contained text. For example, a voice with an en_US locale speaking text marked as es-MX will sound like an American English speaker attempting to speak Spanish.


  • xml:lang: specifies the language, accepted values are:
    • da-DK: Danish
    • nl-NL: Dutch
    • en-AU: English, Australian
    • en-GB: English, British
    • en-IN: English, Indian
    • en-US: English, US
    • fr-FR: French
    • fr-CA: French, Canadian
    • hi-IN: Hindi
    • de-DE: German
    • is-IS: Icelandic
    • it-IT: Italian
    • ja-JP: Japanese
    • ko-KR: Korean
    • nb-NO: Norwegian
    • pl-PL: Polish
    • pt-BR: Portuguese, Brazilian
    • pt-PT: Portuguese, European
    • ro-RO: Romanian
    • ru-RU: Russian
    • es-ES: Spanish, European
    • es-MX: Spanish, Mexican
    • es-US: Spanish, US
    • sv-SE: Swedish
    • tr-TR: Turkish
    • cy-GB: Welsh


<p>

Adds a pause between paragraphs.


<phoneme>

Uses a phonetic pronunciation for the contained text.
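This page does not list the attributes for this tag, so the sketch below falls back on the standard SSML `<phoneme>` element, whose specification defines the `alphabet` and `ph` attributes. Whether alphabets other than IPA are accepted here is not stated and should be treated as an open question.

```xml
<SpeakSentence voice="julie">
    <!-- ph holds the IPA transcription to use instead of the default pronunciation. -->
    You say <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
</SpeakSentence>
```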



<prosody>

Controls the volume, rate and pitch of the speech.


  • pitch: (optional) changes the pitch, accepted values: x-low, low, medium, high, x-high, default or a relative change in % (e.g. -15% or 20%)
  • rate: (optional) changes the speaking rate, accepted values: x-slow, slow, medium, fast, x-fast or any positive percentage (e.g. 50% for a speaking rate of half the default rate or 200% for a speaking rate twice the default rate)
  • volume: (optional) changes the volume, accepted values: silent, x-soft, soft, medium, loud, x-loud, default or the volume in dB (e.g. +1dB or -6dB)
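The three attributes can be combined on one element. The fragment below is a sketch, assuming the tag described above is the standard SSML `<prosody>` element; every value shown comes from the accepted values listed above.

```xml
<SpeakSentence voice="julie">
    <!-- Slower, slightly lower and quieter than the default delivery. -->
    <prosody rate="slow" pitch="-15%" volume="-6dB">
        Please listen carefully, as our menu options have changed.
    </prosody>
    <!-- A percentage rate: 200% is twice the default speaking rate. -->
    <prosody rate="200%">Message and data rates may apply.</prosody>
</SpeakSentence>
```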


<s>

Adds a pause between lines or sentences.
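Paragraph and sentence pauses are usually used together. The sketch below assumes the two descriptions on this page correspond to the standard SSML `<p>` and `<s>` elements, and nests sentences inside paragraphs as the SSML specification allows.

```xml
<SpeakSentence voice="julie">
    <p>
        <s>Your order has shipped.</s>
        <s>It should arrive on Friday.</s>
    </p>
    <p>
        <s>Thank you for shopping with us.</s>
    </p>
</SpeakSentence>
```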


<say-as>

Indicates how to interpret the contained text.
More information at:


  • interpret-as: accepted values:
    • date: the contained text is a Gregorian calendar date; the format attribute must also be specified (see below)
    • time: the contained text is a time in minutes and seconds (e.g. 1'20")
    • telephone: the contained text is a 7-digit or 10-digit telephone number (e.g. 2025551212)
    • characters: the contained text is spoken as a series of alphanumeric characters
    • cardinal: the contained text is an integer or decimal number and is spoken as a cardinal number
    • ordinal: the contained text is an integer and is spoken as an ordinal number
  • format: (optional in general, required when interpret-as="date") specifies the date format, accepted values:
    • mdy: Month-day-year
    • dmy: Day-month-year
    • ymd: Year-month-day
    • md: Month-day
    • dm: Day-month
    • ym: Year-month
    • my: Month-year
    • d: Day
    • m: Month
    • y: Year
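To make the interpret-as values concrete, here is a hedged sketch, assuming the tag described above is the standard SSML `<say-as>` element. The telephone number is the one given in the list above; the other values are illustrative. The separator accepted inside date strings is not stated on this page, so the hyphenated form is an assumption.

```xml
<SpeakSentence voice="julie">
    <!-- format="mdy" tells the engine to read the digits as month, day, year. -->
    Your appointment is on <say-as interpret-as="date" format="mdy">09-21-2025</say-as>.
    Call us at <say-as interpret-as="telephone">2025551212</say-as>.
    <!-- Spelled out one character at a time rather than read as a word. -->
    Your confirmation code is <say-as interpret-as="characters">AB12</say-as>.
    You are caller number <say-as interpret-as="ordinal">3</say-as>.
</SpeakSentence>
```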


<sub>

The text in the alias attribute replaces the contained text for pronunciation.


  • alias: the string to be spoken in place of the contained text
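A short sketch of a substitution, assuming the tag described above is the standard SSML `<sub>` element. The abbreviation and its expansion are illustrative.

```xml
<SpeakSentence voice="julie">
    <!-- "W3C" appears in the markup, but the alias text is what gets spoken. -->
    This standard is maintained by the <sub alias="World Wide Web Consortium">W3C</sub>.
</SpeakSentence>
```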

Webhooks Received

No webhooks are received after the <SpeakSentence> verb is executed.


Speak Sentence

<?xml version="1.0" encoding="UTF-8"?>
<SpeakSentence voice="julie">
    This is a test.
</SpeakSentence>

Speak Sentence with SSML

<?xml version="1.0" encoding="UTF-8"?>
<SpeakSentence voice="jorge">
    Hello, you have reached the home of <lang xml:lang="es-MX">Antonio Mendoza</lang>.
    Please leave a message.
</SpeakSentence>