Start Transcription

The StartTranscription verb allows a segment of a call to be transcribed, and optionally for the live transcription to be sent off to another destination for additional processing. The transcription will continue until the call ends or the <StopTranscription> verb is used. When a destination is specified, live transcription updates for one or both sides (tracks) of the call will be sent to the specified destination. A total of 4 concurrent track transcriptions are allowed on a call. A <StartTranscription> request that uses both tracks will count as 2 of the permitted 4 concurrent track transcriptions.

A call has only two tracks, which are named after the direction of the media from the perspective of the Programmable Voice platform:

inbound: media received by Programmable Voice from the call executing the BXML;
outbound: media sent by Programmable Voice to the call executing the BXML.

Note that this has no correlation to the direction of the call itself. For example, if either an inbound or outbound call is being transcribed and executes a <SpeakSentence>, the inbound track will be the callee's audio and the outbound track will be the text-to-speech audio.

Text Content

There is no text content available to be set for the <StartTranscription> verb.

Attributes

Attribute	Description
name	(optional) A name to refer to this transcription by. This name can only contain alphanumeric characters, underscores, and dashes, and it has a maximum length of 255 characters. Used when sending `<StopTranscription>`. If not provided, it will default to the generated transcription id as sent in the `Real-Time Transcription Started` webhook.
tracks	(optional) The part of the call to send a transcription from. `inbound`, `outbound` or `both`. Default is `inbound`.
transcriptionEventUrl	(optional) URL to send the associated Webhook events to during this real-time transcription's lifetime. Does not accept BXML. May be a relative URL.
transcriptionEventMethod	(optional) The HTTP method to use for the request to `transcriptionEventUrl`. GET or POST. Default value is POST.
username	(optional) The username to send in the HTTP request to `transcriptionEventUrl`. If specified, the `transcriptionEventUrl` must be TLS-encrypted (i.e., `https`).
password	(optional) The password to send in the HTTP request to `transcriptionEventUrl`. If specified, the `transcriptionEventUrl` must be TLS-encrypted (i.e., `https`).
destination	(optional) A websocket URI to send the transcription to. A transcription of the specified tracks will be sent via websocket to this URL as a series of JSON messages. See below for more details on the websocket packet format.
stabilized	(optional) Whether to send transcription update events to the specified `destination` only after they have become stable. Defaults to `true`.
detectLanguage	(optional) A boolean value to indicate that the recording may not be in English, and the transcription service will need to detect the dominant language the recording is in and transcribe accordingly. Default is `false`. Supported Languages: - `en-US`: English, US - `pt-PT`: Portuguese - `de-DE`: German - `it-IT`: Italian - `es-ES`: Spanish - `fr-FR`: French - `nl-NL`: Dutch - `ro-RO`: Romanian
preferredLanguages	(optional) A comma-separated list of language locales in which the transcription should be performed. Defaults to `en-US` if not specified. Requires `detectLanguage` to be `false`. Supported Languages: - `en-US`: English, US - `en-GB`: English, British - `pt-PT`: Portuguese - `pt-BR`: Portuguese, Brazilian - `de-DE`: German - `de-CH`: German, Swiss - `it-IT`: Italian - `es-ES`: Spanish US - `es-US`: Spanish, US - `fr-FR`: French - `fr-CA`: French, Canadian - `nl-NL`: Dutch - `ro-RO`: Romanian Note: You can select only one dialect per language. For example, you cannot choose both `en-US` and `en-GB` as preferredLanguages.

If the destination and transcriptionEventUrl attributes are specified, then the Real-Time Transcription Started, Real-Time Transcription Rejected and Real-Time Transcription Stopped events will be sent to the URL when the transcription starts, if there is an error starting the transcription and when the transcription ends respectively. BXML returned in response to this callback will be ignored. If the transcriptionEventUrl attribute is specified, then the Real-Time Transcription Available event will be sent once the transcription has ended providing a URL from where the transcription can be downloaded. BXML returned in response to this callback will be ignored.

note

While multiple real-time transcriptions for the same call are allowed, each real-time transcription MUST have a unique name. Attempting to start a real-time transcription on the same call with the name of an already existing real-time transcription will result in a Real-Time Transcription Rejected event.

Webhooks Received

Webhooks	Can reply with more BXML
Real-Time Transcription Started	No
Real-Time Transcription Rejected	No
Real-Time Transcription Stopped	No
Real-Time Transcription Available	No

Nested Tags

You may specify up to 12 <CustomParam/> elements nested within a <StartTranscription> tag. These elements define optional user specified parameters that will be sent to the destination URL when the real-time transcription is first started.

CustomParam Attributes

Attribute	Description
name	(required) The name of this parameter, up to 256 characters.
value	(required) The value of this parameter, up to 2048 characters.

Websocket Packet Format

If a destination is specified, it will be sent JSON messages for the duration of the real-time transcription. There will be an initial start message when the connection is first established. This will be followed by zero or more transcription messages containing transcription updates for the tracks being transcribed. Finally, when a real-time transcription is stopped, a stop message will be sent.

Start and Stop Message Parameters

Parameter	Description
eventType	What type of message this is, one of `start`, or `stop`
metadata	Details about the real-time transcription this message is for. See further details below.
customParams	(optional) (`start` message only) If any `<CustomParam/>` elements were specified in the `<StartTranscription>` request, they will be copied here as a map of `name : value` pairs

Metadata Parameters

Parameter	Description
accountId	The user account associated with the call
callId	The call id associated with the real-time transcription
realTimeTranscriptionId	The unique id of the real-time transcription
transcriptionName	The user supplied name of the real-time transcription
tracks	A list of one or more tracks being transcribed in real-time
tracks.name	The name of the track being transcribed, will be used to identify which transcription updates belong to which track
stabilized	Whether transcription updates will be sent only after they have become stable or not
detectLanguage	A boolean value to indicate that the recording may not be in English, and the transcription service will need to detect the dominant language the recording is in and transcribe accordingly
preferredLanguages	A list of language locales in which the transcription should be performed

Transcription Message Parameters

Parameter	Description
eventType	Will always be `transcription`
track	The name of the track this transcription update is for, will be one of the names specified in the `start` message
startTime	The time at which this segment started
endTime	The time at which this segment ended
isPartial	Indicates if the segment is complete
language	The detected language of the segment
transcript	The transcription of this segment as a flattened string
items	The list of items making up this segment
items.content	A word or punctuation
items.startTime	The time at which this item started
items.endTime	The time at which this item ended
items.confidence	The confidence score associated with a word or phrase in your transcript.
items.stable	Indicates whether the specified item is stable (true) or if it may change when the segment is complete (false).
items.type	Either `PRONUNCIATION` or `PUNCTUATION`

Examples

Transcribe an Active Call

XML
Ruby
NodeJS
Python

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <SpeakSentence voice="bridget">This call is being streamed to a live studio audience and transcribed.</SpeakSentence>
    <StartTranscription name="live_audience" tracks="both" destination="wss://live-studio-audience.myapp.example.com" transcriptionEventUrl="https://myapp.example.com/noBXML">
        <CustomParam name="custom_param_1" value="value_1"/>
        <CustomParam name="custom_param_2" value="value_2"/>
    </StartTranscription>
</Response>

speak_sentence = Bandwidth::Bxml::SpeakSentence('This call is being streamed to a live studio audience and transcribed.', {
  voice: 'bridget'
})
custom_param_1 = Bandwidth::Bxml::CustomParam({name: 'custom_param_1', value: 'value_1' })
custom_param_2 = Bandwidth::Bxml::CustomParam({name: 'custom_param_2', value: 'value_2' })
start_transcription = Bandwidth::Bxml::StartTranscription([custom_param_1, custom_param_2], {
  name: 'live_audience',
  tracks: 'both',
  destination: 'wss://live-studio-audience.myapp.example.com',
  transcription_event_url: 'https://myapp.example.com/noBXML'
})
response = Bandwidth::Bxml::Response.new([speak_sentence, start_transcription])

p response.to_bxml

const speakSentence = new Bxml.SpeakSentence(
    'This call is being streamed to a live studio audience and transcribed.',
    {
        voice: 'bridget'
    }
);
const customParam1 = new Bxml.CustomParam({
    name: 'custom_param_1',
    value: 'value_1'
});
const customParam2 = new Bxml.CustomParam({
    name: 'custom_param_2',
    value: 'value_2'
});
const startTranscription = new Bxml.StartTranscription(
    {
        name='live_audience',
        tracks='both',
        destination='wss://live-studio-audience.myapp.example.com',
        transcription_event_url='https://myapp.example.com/noBXML'
    },
    [customParam1, customParam2]
);
const response = new Bxml.Response([speakSentence, startTranscription]);

console.log(response.toBxml());

speak_sentence = SpeakSentence(text='This call is being streamed to a live studio audience and transcribed.', voice='bridget')
custom_param_1 = CustomParam(name='custom_param_1', value='value_1')
custom_param_2 = CustomParam(name='custom_param_2', value='value_2')
start_transcription = StartTranscription(
    name='live_audience',
    tracks='both',
    destination='wss://live-studio-audience.myapp.example.com',
    transcription_event_url='https://myapp.example.com/noBXML',
    custom_params=[custom_param_1, custom_param_2]
)
response = Response()
response.add_verb(speak_sentence)
response.add_verb(start_transcription)

print(response.to_bxml())

A `start` Websocket Message

{
    "eventType": "start",
    "metadata": {
        "accountId": "5555555",
        "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
        "realTimeTranscriptionId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
        "transcriptionName": "live_audience",
        "tracks": [
            {
                "name": "inbound"
            },
            {
                "name": "outbound"
            }
        ]
    },
    "customParams": {
        "foo": "bar",
        "foos": "bars"
    }
}

A `transcription` Websocket Message

{
    "eventType": "transcription",
    "track": "inbound",
    "startTime": "2023-03-31T20:05.101Z",
    "endTime": "2023-03-31T20:07.493Z",
    "isPartial": false,
    "language": "en-US",
    "transcript": "hello world!",
    "items": [
        {
            "content": "hello",
            "startTime": "2023-03-31T20:05.101Z",
            "endTime": "2023-03-31T20:06.285Z",
            "confidence": 0.9,
            "stable": true,
            "type": "PRONUNCIATION"
        },
        {
            "content": "world",
            "startTime": "2023-03-31T20:06.984Z",
            "endTime": "2023-03-31T20:07.493Z",
            "confidence": 0.6,
            "stable": true,
            "type": "PRONUNCIATION"
        },
        {
            "content": "!",
            "startTime": "2023-03-31T20:07.493Z",
            "endTime": "2023-03-31T20:07.493Z",
            "confidence": 0.9,
            "stable": false,
            "type": "PUNCTUATION"
        }
    ]
}

A `stop` Websocket Message

{
    "eventType": "stop",
    "metadata": {
        "accountId": "5555555",
        "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
        "realTimeTranscriptionId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
        "transcriptionName": "live_audience",
        "tracks": [
            {
                "name": "inbound"
            },
            {
                "name": "outbound"
            }
        ]
    }
}

Text Content​

Attributes​

Webhooks Received​

Nested Tags​

CustomParam Attributes​

Websocket Packet Format​

Start and Stop Message Parameters​

Metadata Parameters​

Transcription Message Parameters​

Examples​

Transcribe an Active Call​

A start Websocket Message​

A transcription Websocket Message​

A stop Websocket Message​