Speak Sentence

Text Content

Name	Description
text	The text to speak. Cannot be blank. Can be a mixture of plain text and SSML tags (see below for a list of supported tags).
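The text is embedded in an XML document, so any literal `&`, `<` or `>` in the plain-text parts must be escaped. SSML tags, in contrast, must pass through unchanged. The sketch below shows one way to do that in Python using only the standard library. The `Ssml` marker class and the `build_text` helper are hypothetical and not part of any Bandwidth SDK.

```python
from xml.sax.saxutils import escape


class Ssml(str):
    """Marks a string as trusted SSML markup that must not be escaped (hypothetical helper)."""


def build_text(*parts: str) -> str:
    """Concatenate plain text (escaped) and Ssml fragments (kept verbatim)."""
    out = []
    for part in parts:
        if isinstance(part, Ssml):
            out.append(part)          # trusted markup, inserted unchanged
        else:
            out.append(escape(part))  # plain text: &, <, > become entities
    return "".join(out)


text = build_text("Thanks for calling Smith & Sons. ", Ssml('<break time="1s"/>'), "Goodbye.")
print(text)  # Thanks for calling Smith &amp; Sons. <break time="1s"/>Goodbye.
```

Marking trusted markup explicitly, rather than escaping everything or nothing, keeps caller-supplied text such as names from accidentally breaking the BXML.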

Attributes

The attributes below change the speaker. The default speaker is a female speaker with locale en_US.

To change the gender or locale of the speaker, change the appropriate attributes and a new voice will be selected that matches the new gender and locale.

To choose a specific voice by name, use the voice attribute.

Attribute	Description
voice	Selects the voice of the speaker. Consult the `voice` column in the table below for valid values. If the `voice` attribute is present, `gender` and `locale` are ignored.
gender	Selects the gender of the speaker. Valid values are `"male"` or `"female"`. Default `"female"`. If no voice of the chosen gender exists for the locale, a voice of the opposite gender is used.
locale	Selects the locale of the speaker. Consult the `locale` column in the table below for valid values. Default `"en_US"`.
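You can mirror the precedence rule above when generating BXML yourself. This is a minimal sketch using Python's `xml.etree.ElementTree` rather than a Bandwidth SDK. The `speak_sentence` function is hypothetical. It drops `gender` and `locale` whenever `voice` is given, since the service would ignore them anyway. ElementTree escapes the text it receives, so this version only suits plain text without SSML tags.

```python
import xml.etree.ElementTree as ET


def speak_sentence(text, voice=None, gender=None, locale=None):
    """Return a <Response> BXML string with one <SpeakSentence> verb (hypothetical helper)."""
    response = ET.Element("Response")
    verb = ET.SubElement(response, "SpeakSentence")
    if voice is not None:
        # voice takes precedence: gender and locale would be ignored anyway
        verb.set("voice", voice)
    else:
        if gender is not None:
            verb.set("gender", gender)  # "male" or "female"
        if locale is not None:
            verb.set("locale", locale)  # e.g. "en_US", "fr"
    verb.text = text  # ElementTree escapes &, <, > in the text
    return ET.tostring(response, encoding="unicode")


print(speak_sentence("Bonjour.", gender="male", locale="fr"))
# <Response><SpeakSentence gender="male" locale="fr">Bonjour.</SpeakSentence></Response>
```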

Supported Voices

The table below lists our supported voices and maps each voice name to the corresponding voice of our current TTS provider (Amazon Polly). Please note that, in order to provide our customers the best experience possible, we reserve the right to change TTS providers in the future.

voice	locale	gender	Provider Name (AWS Polly)	Voice Type
julie	en_US	female	Joanna	Standard
kate	en_US	female	Kendra	Standard
susan	en_US	female	Kimberly	Standard
dave	en_US	male	Matthew	Standard
paul	en_US	male	Matthew	Standard
bridget	en_UK	female	Amy	Standard
simon	en_UK	male	Brian	Standard
katrin	de	female	Marlene	Standard
stefan	de	male	Hans	Standard
esperanza	es	female	Conchita	Standard
violeta	es	female	Lucia	Standard
jorge	es	male	Enrique	Standard
rosa	es_MX	female	Mia	Standard
jolie	fr	female	Celine	Standard
bernard	fr	male	Mathieu	Standard
paola	it	female	Carla	Standard
luca	it	male	Giorgio	Standard
masako	ja	female	Mizuki	Standard
kenji	ja	male	Takumi	Standard
nadiya	ru	female	Tatyana	Standard
anatoli	ru	male	Maxim	Standard
zeina	arb	female	Zeina	Standard
zhiyu	cmn-CN	female	Zhiyu	Standard
ruth	en_US	female	Ruth	Enhanced*
stephen	en_US	male	Stephen	Enhanced*
lupe	es_US	female	Lupe	Enhanced*
pedro	es_US	male	Pedro	Enhanced*
gabrielle	fr_CA	female	Gabrielle	Enhanced*
liam	fr_CA	male	Liam	Enhanced*
salli	en_US	female	Salli	Standard
salli_enh	en_US	female	Salli	Enhanced*
chantal	fr_CA	female	Chantal	Standard
miguel	es_US	male	Miguel	Standard
joey	en_US	male	Joey	Standard
joey_enh	en_US	male	Joey	Enhanced*
penelope	es_US	female	Penelope	Standard
russell	en_AU	male	Russell	Standard
emma	en_GB	female	Emma	Standard
emma_enh	en_GB	female	Emma	Enhanced*
nicole	en_AU	female	Nicole	Standard
raveena	en_IN	female	Raveena	Standard
mads	da_DK	male	Mads	Standard
justin	en_US	male	Justin	Enhanced*
ivy	en_US	female	Ivy	Standard
ivy_enh	en_US	female	Ivy	Enhanced*
carmen	ro_RO	female	Carmen	Standard
naja	da_DK	female	Naja	Standard
ruben	nl_NL	male	Ruben	Standard
geraint	en_GB_WLS	male	Geraint	Standard

* Please note that the “Enhanced” voices are based on Amazon Polly Neural voices, which provide more natural-sounding speech but incur additional costs.
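The sketch below picks a voice from a few rows of the table above. It illustrates how `gender`, `locale` and the opposite-gender fallback interact. It is an assumption-laden illustration, not the service's algorithm. When several voices match (for example, the female en_US voices), the service's actual choice is not documented here, so this sketch simply takes the first match in table order. The `pick_voice` name is hypothetical.

```python
# A few rows copied from the Supported Voices table: voice -> (locale, gender)
VOICES = {
    "julie": ("en_US", "female"),
    "dave": ("en_US", "male"),
    "katrin": ("de", "female"),
    "stefan": ("de", "male"),
    "zeina": ("arb", "female"),
    "zhiyu": ("cmn-CN", "female"),
}


def pick_voice(locale="en_US", gender="female"):
    """Return the first voice matching locale and gender, falling back to the
    opposite gender when the locale has no voice of the requested gender."""
    same_locale = [(v, g) for v, (loc, g) in VOICES.items() if loc == locale]
    if not same_locale:
        raise ValueError(f"unsupported locale: {locale}")
    for voice, g in same_locale:
        if g == gender:
            return voice
    return same_locale[0][0]  # no match: opposite-gender fallback


print(pick_voice("de", "male"))   # stefan
print(pick_voice("arb", "male"))  # zeina (the table has no male Arabic voice)
```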

Supported SSML Tags

List of supported SSML tags.

Full details about SSML tags can be found at: https://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/

`<break>`

Adds a pause to the speech. You can specify the duration of the pause by using either the strength or the time attributes.

Attributes:

strength: (optional) accepted values are: none, x-weak, weak, medium (default), strong or x-strong
time: (optional) the duration of the pause in seconds or milliseconds (e.g. 1s or 1000ms) with a maximum value of 10 seconds
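The sketch below is a `<break>` builder that enforces the documented values: the six strength keywords and a maximum of 10 seconds. The `break_tag` name is hypothetical. It accepts exactly one of `strength` or a time in milliseconds, which is a simplification chosen for this sketch.

```python
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_BREAK_MS = 10_000  # documented maximum pause of 10 seconds


def break_tag(strength=None, time_ms=None):
    """Return a <break/> tag using either a strength keyword or a time in ms."""
    if (strength is None) == (time_ms is None):
        raise ValueError("pass exactly one of strength or time_ms")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"invalid strength: {strength}")
        return f'<break strength="{strength}"/>'
    if not 0 <= time_ms <= MAX_BREAK_MS:
        raise ValueError("time must be between 0 and 10000 ms")
    return f'<break time="{time_ms}ms"/>'


print(break_tag(time_ms=1500))         # <break time="1500ms"/>
print(break_tag(strength="x-strong"))  # <break strength="x-strong"/>
```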

`<emphasis>`

The contained text is spoken with emphasis.

Attributes:

level: (optional) defines the strength of emphasis to be applied, accepted values are: strong, moderate or reduced

note

<emphasis> is supported only by Standard voices; it is not supported by Neural (Enhanced) voices.
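The sketch below wraps text in `<emphasis>`. Following the note above, it refuses to do so for a voice listed as Enhanced. The `ENHANCED` set is copied from the "Enhanced*" rows of the Supported Voices table. The `emphasis` helper name is hypothetical.

```python
from xml.sax.saxutils import escape

# Voices marked "Enhanced*" (Neural) in the Supported Voices table
ENHANCED = {"ruth", "stephen", "lupe", "pedro", "gabrielle", "liam",
            "salli_enh", "joey_enh", "emma_enh", "justin", "ivy_enh"}
LEVELS = {"strong", "moderate", "reduced"}


def emphasis(text, voice, level=None):
    """Wrap plain text in <emphasis>, rejecting Enhanced (Neural) voices."""
    if voice in ENHANCED:
        raise ValueError(f"<emphasis> is not supported by Enhanced voice {voice!r}")
    if level is None:
        return f"<emphasis>{escape(text)}</emphasis>"
    if level not in LEVELS:
        raise ValueError(f"invalid emphasis level: {level}")
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


print(emphasis("really", "julie", "strong"))  # <emphasis level="strong">really</emphasis>
```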

`<lang>`

Specifies the natural language of the content. If you use a voice with an en_US locale and an es-MX lang, it will sound like an English-speaking American attempting to speak Spanish.

Attributes:

xml:lang: specifies the language, accepted values are:
- arb: Arabic
- cmn-CN: Chinese, Mandarin
- da-DK: Danish
- nl-NL: Dutch
- en-AU: English, Australian
- en-GB: English, British
- en-IN: English, Indian
- en-US: English, US
- fr-FR: French
- fr-CA: French, Canadian
- hi-IN: Hindi
- de-DE: German
- is-IS: Icelandic
- it-IT: Italian
- ja-JP: Japanese
- ko-KR: Korean
- nb-NO: Norwegian
- pl-PL: Polish
- pt-BR: Portuguese, Brazilian
- pt-PT: Portuguese, European
- ro-RO: Romanian
- ru-RU: Russian
- es-ES: Spanish, European
- es-MX: Spanish, Mexican
- es-US: Spanish, US
- sv-SE: Swedish
- tr-TR: Turkish
- cy-GB: Welsh
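The sketch below wraps a phrase in `<lang>` after checking the code against the list above. That list uses hyphenated codes such as `es-MX`, unlike the underscore-style `locale` values in the voice table. The `lang` helper name is hypothetical.

```python
from xml.sax.saxutils import escape

# xml:lang values accepted according to the list above
LANGS = {
    "arb", "cmn-CN", "da-DK", "nl-NL", "en-AU", "en-GB", "en-IN", "en-US",
    "fr-FR", "fr-CA", "hi-IN", "de-DE", "is-IS", "it-IT", "ja-JP", "ko-KR",
    "nb-NO", "pl-PL", "pt-BR", "pt-PT", "ro-RO", "ru-RU", "es-ES", "es-MX",
    "es-US", "sv-SE", "tr-TR", "cy-GB",
}


def lang(text, code):
    """Wrap plain text in <lang xml:lang="..."> after validating the code."""
    if code not in LANGS:
        raise ValueError(f"unsupported xml:lang value: {code}")
    return f'<lang xml:lang="{code}">{escape(text)}</lang>'


print(lang("Antonio Mendoza", "es-MX"))  # <lang xml:lang="es-MX">Antonio Mendoza</lang>
```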

`<p>`

Adds a pause between paragraphs.

`<phoneme>`

Uses phonetic pronunciation for the contained text.

Attributes:

ph: the pronunciation, written in International Phonetic Alphabet (IPA) symbols. For more information, see: http://www.internationalphoneticalphabet.org
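The sketch below builds a `<phoneme>` element. It uses `quoteattr` so the IPA string is safely quoted as an attribute value. Only the `ph` attribute documented on this page is used. The `phoneme` helper name and the sample transcription are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ipa):
    """Wrap text in <phoneme>, giving its IPA pronunciation in ph."""
    return f"<phoneme ph={quoteattr(ipa)}>{escape(text)}</phoneme>"


print(phoneme("pecan", "pɪˈkɑːn"))  # <phoneme ph="pɪˈkɑːn">pecan</phoneme>
```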

`<prosody>`

Controls the volume, rate and pitch of the speech.

Attributes:

pitch: (optional) changes the pitch, accepted values: x-low, low, medium, high, x-high, default or a relative change in % (e.g. -15% or 20%)
rate: (optional) changes the speaking rate, accepted values: x-slow, slow, medium, fast, x-fast or any positive percentage (e.g. 50% for a speaking rate of half the default rate or 200% for a speaking rate twice the default rate)
volume: (optional) changes the volume, accepted values: silent, x-soft, soft, medium, loud, x-loud, default or the volume in dB (e.g. +1dB or -6dB)

note

All <prosody> attributes are supported by Standard voices. Neural voices support the volume and rate attributes but do not support the pitch attribute.
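The sketch below validates `<prosody>` values against the keywords and formats listed above, and rejects `pitch` for Enhanced (Neural) voices as the note describes. Some of its rules are assumptions read off the examples on this page rather than confirmed provider rules:

* It requires a sign on dB volumes.
* It makes the sign optional on pitch percentages.
* It demands a positive rate percentage.

The `prosody` name and its `enhanced` flag are hypothetical.

```python
import re
from xml.sax.saxutils import escape

NAMED = {
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}
PATTERNS = {  # numeric forms, inferred from the examples on this page
    "pitch": r"[+-]?\d+(\.\d+)?%",
    "rate": r"\d+(\.\d+)?%",
    "volume": r"[+-]\d+(\.\d+)?dB",
}


def prosody(text, pitch=None, rate=None, volume=None, enhanced=False):
    """Build a <prosody> element; enhanced=True marks an Enhanced (Neural) voice."""
    if enhanced and pitch is not None:
        raise ValueError("Enhanced (Neural) voices do not support pitch")
    attrs = []
    for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume)):
        if value is None:
            continue
        if value not in NAMED[name] and not re.fullmatch(PATTERNS[name], value):
            raise ValueError(f"invalid {name}: {value}")
        if name == "rate" and value.endswith("%") and float(value[:-1]) <= 0:
            raise ValueError("rate percentage must be positive")
        attrs.append(f'{name}="{value}"')
    if not attrs:
        raise ValueError("at least one of pitch, rate or volume is required")
    return f"<prosody {' '.join(attrs)}>{escape(text)}</prosody>"


print(prosody("slowly", rate="50%", volume="-6dB"))
# <prosody rate="50%" volume="-6dB">slowly</prosody>
```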

`<s>`

Adds a pause between lines or sentences.
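`<p>` and `<s>` are typically combined, with sentences nested inside paragraphs. The sketch below renders a list of paragraphs, each a list of plain-text sentences, into that structure. The `paragraphs` helper name is hypothetical.

```python
from xml.sax.saxutils import escape


def paragraphs(*paras):
    """Render each paragraph (a list of sentences) as <p><s>...</s>...</p>."""
    return "".join(
        "<p>" + "".join(f"<s>{escape(s)}</s>" for s in sentences) + "</p>"
        for sentences in paras
    )


print(paragraphs(["Welcome.", "Your order has shipped."], ["Goodbye."]))
# <p><s>Welcome.</s><s>Your order has shipped.</s></p><p><s>Goodbye.</s></p>
```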

`<say-as>`

Indicates how to interpret the text.
More information at: https://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/

Attributes:

interpret-as: accepted values:
- date: the contained text is a Gregorian calendar date; the format attribute must also be specified (see below)
- time: the contained text is a time in minutes and seconds (e.g. 1'20")
- telephone: the contained text is a 7-digit or 10-digit telephone number (e.g. 2025551212)
- characters: the contained text should be spoken as a series of alphanumeric characters
- cardinal: the contained text is an integral or decimal number and should be spoken as a cardinal number
- ordinal: the contained text is an integral number and should be spoken as an ordinal number
format: (required when interpret-as="date", otherwise not used) specifies the date format, accepted values:
- mdy: Month-day-year
- dmy: Day-month-year
- ymd: Year-month-day
- md: Month-day
- dm: Day-month
- ym: Year-month
- my: Month-year
- d: Day
- m: Month
- y: Year
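The sketch below builds `<say-as>` and enforces the pairing described above: `format` is required with `interpret-as="date"` and rejected otherwise. The `say_as` name is hypothetical. The sample date text only illustrates the `mdy` order and is not a documented input format.

```python
from xml.sax.saxutils import escape

INTERPRET_AS = {"date", "time", "telephone", "characters", "cardinal", "ordinal"}
DATE_FORMATS = {"mdy", "dmy", "ymd", "md", "dm", "ym", "my", "d", "m", "y"}


def say_as(text, interpret_as, fmt=None):
    """Build a <say-as> element, requiring a format only for dates."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"invalid interpret-as: {interpret_as}")
    if interpret_as == "date":
        if fmt not in DATE_FORMATS:
            raise ValueError('interpret-as="date" requires a valid format')
        return f'<say-as interpret-as="date" format="{fmt}">{escape(text)}</say-as>'
    if fmt is not None:
        raise ValueError('format is only used with interpret-as="date"')
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


print(say_as("2025551212", "telephone"))
# <say-as interpret-as="telephone">2025551212</say-as>
```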

note

The characters value of the <say-as> interpret-as attribute is not currently supported by Neural voices. If your SSML uses it and is processed with a Neural voice, the affected text is synthesized using a Standard voice instead. Even though a Standard voice is used for that part of the synthesis, billing is still based on the Neural voice.
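Before sending SSML with an Enhanced (Neural) voice, it can help to scan for the features this page says those voices handle differently: `<emphasis>`, `<prosody pitch>` and `say-as` with `characters`. This is a minimal sketch using `xml.etree.ElementTree`. The `neural_warnings` name and the `<speak>` wrapper element (added only so the fragment parses as a single XML document) are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET


def neural_warnings(ssml):
    """List features in an SSML fragment that this page says Neural voices
    do not support (or handle with a Standard-voice fallback)."""
    root = ET.fromstring(f"<speak>{ssml}</speak>")  # wrapper gives one root element
    warnings = []
    for el in root.iter():  # document order
        if el.tag == "emphasis":
            warnings.append("<emphasis> is not supported")
        elif el.tag == "prosody" and "pitch" in el.attrib:
            warnings.append("<prosody pitch> is not supported")
        elif el.tag == "say-as" and el.get("interpret-as") == "characters":
            warnings.append('say-as "characters" falls back to a Standard voice')
    return warnings


print(neural_warnings('Call <say-as interpret-as="characters">ABC</say-as> <emphasis>now</emphasis>.'))
```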

`<sub>`

The text in the alias attribute replaces the contained text for pronunciation.

Attributes:

alias: the string to be spoken
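The sketch below builds a `<sub>` element. The displayed text goes in the element body and the spoken replacement goes in `alias`, which is quoted with `quoteattr`. The `sub` helper name is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(text, alias):
    """Speak alias in place of the contained text."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


print(sub("W3C", "World Wide Web Consortium"))
# <sub alias="World Wide Web Consortium">W3C</sub>
```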

Webhooks Received

No webhooks are sent after the <SpeakSentence> verb is executed.

Examples

Speak Sentence

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
   <SpeakSentence voice="julie">
      This is a test.
   </SpeakSentence>
</Response>
```

```java
SpeakSentence speakSentence = new SpeakSentence("This is a test.").builder()
        .voice(TtsVoice.JULIE)
        .build();

Response response = new Response()
        .with(speakSentence);

System.out.println(response.toBXML());
```

```csharp
SpeakSentence speakSentence = new SpeakSentence
{
    Sentence = "This is a test.",
    Voice = "julie"
};

Response response = new Response();
response.Add(speakSentence);

Console.WriteLine(response.ToBXML());
```

```ruby
speak_sentence = Bandwidth::Bxml::SpeakSentence.new('This is a test.', { voice: 'julie' })
response = Bandwidth::Bxml::Response.new([speak_sentence])

p response.to_bxml
```

```javascript
const speakSentence = new Bxml.SpeakSentence('This is a test.', {
    voice: 'julie'
});
const response = new Bxml.Response(speakSentence);

console.log(response.toBxml());
```

```python
speak_sentence = SpeakSentence(
    text="This is a test.",
    voice="julie"
)

response = Response()
response.add_verb(speak_sentence)

print(response.to_bxml())
```

```php
$speakSentence = new BandwidthLib\Voice\Bxml\SpeakSentence("This is a test.");
$speakSentence->voice("julie");

$response = new BandwidthLib\Voice\Bxml\Response();
$response->addVerb($speakSentence);

echo $response->toBxml();
```

Speak Sentence with SSML

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <SpeakSentence voice="jorge">
        Hello, you have reached the home of <lang xml:lang="es-MX">Antonio Mendoza</lang>.
        Please leave a message.
    </SpeakSentence>
</Response>
```

```java
SpeakSentence speakSentence = new SpeakSentence("Hello, you have reached the home of <lang xml:lang=\"es-MX\">Antonio Mendoza</lang>. Please leave a message.").builder()
        .voice(TtsVoice.JORGE)
        .build();

Response response = new Response()
        .with(speakSentence);

System.out.println(response.toBXML());
```

```csharp
Response response = new Response();

SpeakSentence speakSentence = new SpeakSentence
{
    Sentence = "Hello, you have reached the home of <lang xml:lang=\"es-MX\">Antonio Mendoza</lang>. Please leave a message.",
    Voice = "jorge"
};

response.Add(speakSentence);

Console.WriteLine(response.ToBXML());
```

```ruby
speak_sentence = Bandwidth::Bxml::SpeakSentence.new('Hello, you have reached the home of <lang xml:lang="es-MX">Antonio Mendoza</lang>. Please leave a message.', {
  voice: 'jorge'
})
response = Bandwidth::Bxml::Response.new([speak_sentence])

p response.to_bxml
```

```javascript
const speakSentence = new Bxml.SpeakSentence(
    `Hello, you have reached the home of <lang xml:lang="es-MX">Antonio Mendoza</lang>. Please leave a message.`,
    {
        voice: 'jorge'
    }
);
const response = new Bxml.Response(speakSentence);

console.log(response.toBxml());
```

```python
speak_sentence = SpeakSentence(
    text='Hello, you have reached the home of <lang xml:lang="es-MX">Antonio Mendoza</lang>. Please leave a message.',
    voice="jorge"
)

response = Response()
response.add_verb(speak_sentence)

print(response.to_bxml())
```

```php
$speakSentence = new BandwidthLib\Voice\Bxml\SpeakSentence('Hello, you have reached the home of <lang xml:lang="es-MX">Antonio Mendoza</lang>. Please leave a message.');
$speakSentence->voice("jorge");

$response = new BandwidthLib\Voice\Bxml\Response();
$response->addVerb($speakSentence);

echo $response->toBxml();
```
