
Start Stream

The <StartStream> verb allows audio from a segment of a call to be sent to another destination for additional processing.

When used on a call, audio from one or both sides (tracks) of the call will be sent to the specified destination. The stream will continue until the call ends or the <StopStream> verb is used. A total of 4 concurrent track streams are allowed on a call. A <StartStream> request that uses both tracks will count as 2 concurrent streams.

A call has only two tracks, which are named after the direction of the media from the perspective of the Programmable Voice platform:

  • inbound: media received by Programmable Voice from the call executing the BXML;
  • outbound: media sent by Programmable Voice to the call executing the BXML.

Note that this has no correlation to the direction of the call itself. For example, regardless of whether the call being streamed is inbound or outbound, if it executes a <SpeakSentence>, the outbound track will be the text-to-speech audio and the inbound track will be the other party's audio.

Execution Behavior

<StartStream> is non-blocking — BXML execution continues to the next verb immediately after the stream is established. The stream itself runs in the background for the duration of the call.

This means that if <StartStream> is the last verb in your BXML response, the call will have no further instructions to execute and will end, closing the WebSocket connection almost immediately after it was opened.

For unidirectional streams this is usually not a problem, because you typically have other verbs after <StartStream> that keep the call alive (e.g., <SpeakSentence>, <Gather>, <Bridge>, etc.) and the stream runs in the background alongside them.

For bidirectional streams, where your service needs to send audio back over the WebSocket, you must keep the call alive while the stream is active. See Bidirectional Streams below for details.

Stream Modes

<StartStream> supports two modes via the mode attribute:

  • unidirectional (default) — Audio flows one way: from the call to your WebSocket server. Your server can listen and process but cannot send audio back to the call. Best for real-time transcription, analytics, or logging.

  • bidirectional — Audio flows both ways: from the call to your server, and your server can send audio back to the call via the same WebSocket. Best for AI-powered voice agents, real-time translation, or any use case where your service needs to "speak" into the call.

Text Content

There is no text content available to be set for the <StartStream> verb.

Attributes

Attribute | Description
name | (optional) A name to refer to this stream by. Used when sending <StopStream>. If not provided, it will default to the generated stream ID as sent in the Media Stream Started webhook.
mode | (optional) The mode to use for the stream: unidirectional or bidirectional. Specifies whether the audio being streamed over the WebSocket is bidirectional (the service can both read and write audio over the WebSocket) or unidirectional (one-way, read-only). Default is unidirectional.
tracks | (optional) The part of the call to send a stream from: inbound, outbound, or both. Default is inbound.
destination | (required) A WebSocket URI to send the stream to. The audio from the specified tracks will be sent via WebSocket to this URL as base64-encoded PCMU/G711 audio. See below for more details on the WebSocket packet format.
destinationUsername | (optional) The username to send in the Authorization header of the initial WebSocket connection to the destination URL.
destinationPassword | (optional) The password to send in the Authorization header of the initial WebSocket connection to the destination URL.
streamEventUrl | (optional) URL to send the associated webhook events to during this stream's lifetime. Does not accept BXML. May be a relative URL.
streamEventMethod | (optional) The HTTP method to use for the request to streamEventUrl: GET or POST. Default value is POST.
username | (optional) The username to send in the HTTP request to streamEventUrl. If specified, the URLs must be TLS-encrypted (i.e., https).
password | (optional) The password to send in the HTTP request to streamEventUrl. If specified, the URLs must be TLS-encrypted (i.e., https).

If the streamEventUrl attribute is specified, the Media Stream Started, Media Stream Rejected, and Media Stream Stopped events will be sent to that URL when the stream starts, when there is an error starting the stream, and when the stream ends, respectively. BXML returned in response to these callbacks will be ignored.

note

While multiple streams for the same call are allowed, each stream MUST have a unique name. Attempting to start a stream on the same call with the name of an already existing stream will result in a Media Stream Rejected event.

Webhooks Received

Webhook | Can reply with more BXML
Media Stream Started | No
Media Stream Rejected | No
Media Stream Stopped | No

Nested Tags

You may specify up to 12 <StreamParam/> elements nested within a <StartStream> tag. These elements define optional user-specified parameters that will be sent to the destination URL when the stream is first started.

StreamParam Attributes

Attribute | Description
name | (required) The name of this parameter, up to 256 characters.
value | (required) The value of this parameter, up to 2048 characters.

Unidirectional Streams

Unidirectional mode (mode="unidirectional", the default) sends a read-only audio stream from the call to your WebSocket server. Your server receives audio but cannot send audio back.

Since <StartStream> is non-blocking, the stream runs in the background while the call continues executing subsequent BXML verbs. This makes unidirectional streams straightforward — just place <StartStream> before whatever verbs should be streamed.

Bidirectional Streams

Bidirectional mode (mode="bidirectional") opens a two-way audio stream: your server receives audio from the call and can send audio back to the call over the same WebSocket connection.

Keeping the Call Alive

Because <StartStream> is non-blocking, you must ensure the call stays alive while your bidirectional stream is active. If the BXML execution runs out of verbs, the call ends and the WebSocket closes.

There are two approaches:

<StopStream> with wait="true" (recommended) — Place a <StopStream name="..." wait="true"/> after your <StartStream>, where the name matches the one given to <StartStream>. This holds the call open until the WebSocket connection is closed (either by your server or by a subsequent BXML response). Your server has full control over when the stream ends.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com" streamEventUrl="https://myapp.example.com/events">
        <StreamParam name="call_context" value="support_queue" />
    </StartStream>
    <StopStream name="ai_agent" wait="true"/>
</Response>

<Pause> (alternative) — Place a <Pause length="..."/> after <StartStream> to keep the call alive for a fixed duration. This is less elegant but caps the maximum connection time, which can be useful as a cost protection mechanism if you want to ensure streams don't run indefinitely.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com"/>
    <!-- Keep the call alive for up to 10 minutes -->
    <Pause length="600"/>
</Response>
caution

If using <Pause>, the stream will end when the pause duration expires regardless of whether your server is still processing. Use <StopStream wait="true"/> if your service needs to control the stream duration dynamically.

Sending Audio to the Call

In bidirectional mode, your WebSocket server can send JSON messages back over the connection to play audio into the call or clear buffered audio.

Play Audio Event

Send a playAudio event to play audio into the call:

{
    "eventType": "playAudio",
    "media": {
        "contentType": "audio/pcmu",
        "payload": "<base64-encoded-audio>"
    }
}

The media.contentType field describes the format of the audio. Supported values:

  • audio/pcmu (8-bit, 8kHz, mono, μ-law format)
  • audio/pcm (supports rate=8000, rate=16000, rate=24000; channels=1; bit-depth=16; endian=little; encoding=signed)

Any other combination or unsupported parameters will be rejected.

Example content type values:

  • audio/pcmu
  • audio/pcm
  • audio/pcm;rate=8000
  • audio/pcm;rate=16000
  • audio/pcm;rate=24000
  • audio/pcm;rate=16000;channels=1;bit-depth=16;endian=little;encoding=signed

If audio/pcm is sent, it will be automatically resampled to 8kHz μ-law (audio/pcmu) for downstream processing. Only mono (single-channel), 16-bit, little-endian, signed PCM is accepted for PCM input.

When audio/pcm is used without additional parameters, the defaults are rate=8000, channels=1, bit-depth=16, endian=little, encoding=signed.
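As a sketch, a bidirectional service could assemble a playAudio message like this. The field names (eventType, media.contentType, media.payload) follow the message shape documented above; the helper name and sample bytes are illustrative.

```python
import base64
import json

def build_play_audio(pcm_bytes: bytes, rate: int = 8000) -> str:
    """Wrap raw 16-bit mono little-endian signed PCM in a playAudio event.

    rate may be 8000, 16000, or 24000; other values are rejected by the
    platform. The payload must be base64-encoded.
    """
    message = {
        "eventType": "playAudio",
        "media": {
            "contentType": f"audio/pcm;rate={rate}",
            "payload": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }
    return json.dumps(message)
```

The resulting string would then be sent as a text frame over the open WebSocket connection.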

note

If possible, it is recommended to use audio/pcmu for the best performance and compatibility. The audio data will not be resampled or re-encoded, and will be sent directly to the destination as-is.

Clear Event

All audio sent to the server will be buffered until it is transmitted. You can send a clear event to discard any untransmitted audio currently buffered. This is useful for barge-in scenarios where the caller interrupts and you want to stop the current playback immediately.

{
    "eventType": "clear"
}

Behavior:

  • All buffered, untransmitted audio bytes will be skipped.
  • New audio sent after the clear event will be processed as usual.
  • No error will occur if the buffer is already empty.
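A minimal barge-in helper might look like the following sketch. Here send stands in for whatever function your WebSocket library uses to transmit one text frame (an assumption, not part of the API).

```python
import json

def barge_in(send, next_audio_message=None):
    """Discard queued playback, then optionally start new audio.

    send: any callable that transmits one text frame over the WebSocket.
    """
    # Drop all buffered, untransmitted audio; a no-op if the buffer is empty.
    send(json.dumps({"eventType": "clear"}))
    # Audio sent after the clear event is processed as usual.
    if next_audio_message is not None:
        send(next_audio_message)
```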

WebSocket Packet Format

At the destination end, the WebSocket will receive messages containing JSON for the duration of the stream. There will be an initial start message when the connection is first established. This will be followed by zero or more media messages containing the encoded audio for the tracks being streamed. Each media message includes a per-track sequenceNumber (as a string) that starts at "1" and increments by 1 for each subsequent message on that track. Finally, when a stream is stopped, a stop message will be sent.
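This message flow can be handled with a simple dispatcher at the destination end. The sketch below uses only the documented fields (eventType, metadata, track, sequenceNumber); the returned strings are illustrative.

```python
import json

def dispatch(raw: str) -> str:
    """Route one incoming WebSocket text frame by its eventType."""
    msg = json.loads(raw)
    kind = msg["eventType"]
    if kind == "start":
        tracks = [t["name"] for t in msg["metadata"]["tracks"]]
        return f"stream {msg['metadata']['streamName']} started, tracks: {tracks}"
    if kind == "media":
        # msg["payload"] holds base64-encoded audio for this track
        return f"media on {msg['track']} seq {msg['sequenceNumber']}"
    if kind == "stop":
        return f"stream {msg['metadata']['streamName']} stopped"
    return f"unhandled event: {kind}"
```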

Start and Stop Message Parameters

Parameter | Description
eventType | What type of message this is: start or stop
metadata | Details about the stream this message is for. See further details below.
streamParams | (optional; start message only) If any <StreamParam/> elements were specified in the <StartStream> request, they will be copied here as a map of name : value pairs

Metadata Parameters

Parameter | Description
accountId | The account associated with the call
callId | The call ID associated with the stream
streamId | The unique ID of the stream
streamName | The user-supplied name of the stream
tracks | A list of one or more tracks being sent in this stream
tracks.name | The name of the track being sent; used to identify which media messages belong to which track
tracks.mediaFormat | The format the media will take for this track
tracks.mediaFormat.encoding | The encoding of the media for this track; currently only PCMU is supported
tracks.mediaFormat.sampleRate | The sample rate of the media for this track; currently only 8000 is supported

Media Message Parameters

Parameter | Description
eventType | Will always be media
track | The name of the track this media packet is for; will be one of the names specified in the start message
payload | A base64-encoded string of the actual media. The encoding of the media itself is as specified in the start message
sequenceNumber | A string containing a monotonically increasing sequence number for this track. The first media message on a track has the value "1", and the value increments by 1 for each subsequent message.
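Because sequenceNumber increases by 1 per track, a receiver can detect dropped or out-of-order messages with a small per-track counter. The class below is a sketch, not part of any SDK.

```python
class SequenceTracker:
    """Track the last-seen sequence number per track to spot gaps."""

    def __init__(self):
        self._last = {}  # track name -> last sequence number seen (as int)

    def check(self, track: str, sequence_number: str) -> bool:
        """Return True if this message is the expected next one for its track."""
        seq = int(sequence_number)  # sequenceNumber arrives as a string
        expected = self._last.get(track, 0) + 1
        self._last[track] = seq
        return seq == expected
```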

Examples

Stream Both Legs of A Call

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <SpeakSentence voice="bridget">This call is being streamed to a live studio audience.</SpeakSentence>
    <StartStream name="live_audience" tracks="both" destination="wss://live-studio-audience.myapp.example.com" streamEventUrl="https://myapp.example.com/noBXML">
        <StreamParam name="internal_id" value="call_ABC" />
    </StartStream>
    <SpeakSentence voice="bridget">This will now be streamed to the destination as well as played to the call participants.</SpeakSentence>
</Response>

Bidirectional Stream with AI Agent

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <SpeakSentence voice="bridget">Please hold while I connect you to our AI assistant.</SpeakSentence>
    <StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com" streamEventUrl="https://myapp.example.com/events">
        <StreamParam name="caller_id" value="+15551234567" />
    </StartStream>
    <StopStream name="ai_agent" wait="true"/>
</Response>

In this example, <StopStream wait="true"/> keeps the call alive until the WebSocket connection is closed by the AI agent service. The AI agent can send audio back to the caller via playAudio events over the WebSocket.

A start message sent by the Bandwidth API over the WebSocket

{
    "eventType": "start",
    "metadata": {
        "accountId": "5555555",
        "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
        "streamId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
        "streamName": "live_audience",
        "tracks": [
            {
                "name": "inbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            },
            {
                "name": "outbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            }
        ]
    },
    "streamParams": {
        "foo": "bar",
        "foos": "bars"
    }
}

A media message sent by the Bandwidth API over the WebSocket

{
    "eventType": "media",
    "track": "inbound",
    "payload": "3Ob2dV1NRUpSXfTy69bHvbzD09PL0trpaWZMTV5PT05DRUpNYeLyb+jc1tPW3tfN1/r4cFZd5PxXXGjo2M/M0NTU0Nvi31ZFTFhLQERKT19safHd18zIycjHyc3Z4+7s609GSktMS1hmVFBm3eZk2tB4ffJ17/5r5dLb5uLd1c3UdmZnc/jt3eH9a3H06dvV3WNPYXxjS0BJT05VXm53+A==",
    "sequenceNumber": "1"
}

A stop message sent by the Bandwidth API over the WebSocket

{
    "eventType": "stop",
    "metadata": {
        "accountId": "5555555",
        "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
        "streamId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
        "streamName": "live_audience",
        "tracks": [
            {
                "name": "inbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            },
            {
                "name": "outbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            }
        ]
    }
}

A playAudio message that could be sent to the Bandwidth API over the WebSocket

{
    "eventType": "playAudio",
    "media": {
        "contentType": "audio/pcm",
        "payload": "3Ob2dV1NRUpSXfTy69bHvbzD09PL0trpaWZMTV5PT05DRUpNYeLyb+jc1tPW3tfN1/r4cFZd5PxXXGjo2M/M0NTU0Nvi31ZFTFhLQERKT19safHd18zIycjHyc3Z4+7s609GSktMS1hmVFBm3eZk2tB4ffJ17/5r5dLb5uLd1c3UdmZnc/jt3eH9a3H06dvV3WNPYXxjS0BJT05VXm53+A=="
    }
}

An example clear message that could be sent to the Bandwidth API over the WebSocket

{
    "eventType": "clear"
}