Start Stream
The <StartStream> verb allows audio from a call to be streamed to another destination for additional processing.
When used on a call, audio from one or both sides (tracks) of the call will be sent to the specified destination. The stream will continue until the call ends or the <StopStream> verb is used. A total of 4 concurrent track streams are allowed on a call. A <StartStream> request that uses both tracks will count as 2 concurrent streams.
A call has only two tracks, which are named after the direction of the media from the perspective of the Programmable Voice platform:
- inbound: media received by Programmable Voice from the call executing the BXML
- outbound: media sent by Programmable Voice to the call executing the BXML
Note that this has no correlation to the direction of the call itself. For example, if either an inbound or outbound call is being streamed and executes a <SpeakSentence>, the inbound track will be the callee's audio and the outbound track will be the text-to-speech audio.
Execution Behavior
<StartStream> is non-blocking — BXML execution continues to the next verb immediately after the stream is established. The stream itself runs in the background for the duration of the call.
This means that if <StartStream> is the last verb in your BXML response, the call will have no further instructions to execute and will end, closing the WebSocket connection almost immediately after it was opened.
For unidirectional streams this is usually not a problem, because you typically have other verbs after <StartStream> that keep the call alive (e.g., <SpeakSentence>, <Gather>, <Bridge>, etc.) and the stream runs in the background alongside them.
For bidirectional streams, where your service needs to send audio back over the WebSocket, you must keep the call alive while the stream is active. See Bidirectional Streams below for details.
Stream Modes
<StartStream> supports two modes via the mode attribute:
- unidirectional (default) — Audio flows one way: from the call to your WebSocket server. Your server can listen and process but cannot send audio back to the call. Best for real-time transcription, analytics, or logging.
- bidirectional — Audio flows both ways: from the call to your server, and your server can send audio back to the call via the same WebSocket. Best for AI-powered voice agents, real-time translation, or any use case where your service needs to "speak" into the call.
Text Content
There is no text content available to be set for the <StartStream> verb.
Attributes
| Attribute | Description |
|---|---|
| name | (optional) A name to refer to this stream by. Used when sending <StopStream>. If not provided, it will default to the generated stream id as sent in the Media Stream Started webhook. |
| mode | (optional) The mode to use for the stream. unidirectional or bidirectional. Specifies whether the audio being streamed over the WebSocket is bidirectional (the service can both read and write audio over the WebSocket) or unidirectional (one-way, read-only). Default is unidirectional. |
| tracks | (optional) The part of the call to send a stream from. inbound, outbound or both. Default is inbound. |
| destination | (required) A websocket URI to send the stream to. The audio from the specified tracks will be sent via websocket to this URL as base64-encoded PCMU/G711 audio. See below for more details on the websocket packet format. |
| destinationUsername | (optional) The username to send in the Authorization header of the initial websocket connection to the destination URL. |
| destinationPassword | (optional) The password to send in the Authorization header of the initial websocket connection to the destination URL. |
| streamEventUrl | (optional) URL to send the associated Webhook events to during this stream's lifetime. Does not accept BXML. May be a relative URL. |
| streamEventMethod | (optional) The HTTP method to use for the request to streamEventUrl. GET or POST. Default value is POST. |
| username | (optional) The username to send in the HTTP request to streamEventUrl. If specified, the URLs must be TLS-encrypted (i.e., https). |
| password | (optional) The password to send in the HTTP request to streamEventUrl. If specified, the URLs must be TLS-encrypted (i.e., https). |
If the streamEventUrl attribute is specified, then the Media Stream Started, Media Stream Rejected, and Media Stream Stopped events will be sent to that URL when the stream starts, when there is an error starting the stream, and when the stream ends, respectively. BXML returned in response to this callback will be ignored.
While multiple streams for the same call are allowed, each stream MUST have a unique name. Attempting to start a stream on the same call with the name of an already existing stream will result in a Media Stream Rejected event.
Webhooks Received
| Webhooks | Can reply with more BXML |
|---|---|
| Media Stream Started | No |
| Media Stream Rejected | No |
| Media Stream Stopped | No |
Nested Tags
You may specify up to 12 <StreamParam/> elements nested within a <StartStream> tag. These elements define optional user-specified parameters that will be sent to the destination in the start message when the stream is first established.
StreamParam Attributes
| Attribute | Description |
|---|---|
| name | (required) The name of this parameter, up to 256 characters. |
| value | (required) The value of this parameter, up to 2048 characters. |
Unidirectional Streams
Unidirectional mode (mode="unidirectional", the default) sends a read-only audio stream from the call to your WebSocket server. Your server receives audio but cannot send audio back.
Since <StartStream> is non-blocking, the stream runs in the background while the call continues executing subsequent BXML verbs. This makes unidirectional streams straightforward — just place <StartStream> before whatever verbs should be streamed.
Bidirectional Streams
Bidirectional mode (mode="bidirectional") opens a two-way audio stream: your server receives audio from the call and can send audio back to the call over the same WebSocket connection.
Keeping the Call Alive
Because <StartStream> is non-blocking, you must ensure the call stays alive while your bidirectional stream is active. If the BXML execution runs out of verbs, the call ends and the WebSocket closes.
There are two approaches:
<StopStream> with wait="true" (recommended) — Place a <StopStream name="..." wait="true"/> after your <StartStream>, where the name matches the one given to <StartStream>. This holds the call open until the WebSocket connection is closed (either by your server or by a subsequent BXML response). Your server has full control over when the stream ends.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com" streamEventUrl="https://myapp.example.com/events">
<StreamParam name="call_context" value="support_queue" />
</StartStream>
<StopStream name="ai_agent" wait="true"/>
</Response>
<Pause> (alternative) — Place a <Pause length="..."/> after <StartStream> to keep the call alive for a fixed duration. This is less elegant but caps the maximum connection time, which can be useful as a cost protection mechanism if you want to ensure streams don't run indefinitely.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com"/>
<!-- Keep the call alive for up to 10 minutes -->
<Pause length="600"/>
</Response>
If using <Pause>, the stream will end when the pause duration expires regardless of whether your server is still processing. Use <StopStream wait="true"/> if your service needs to control the stream duration dynamically.
Sending Audio to the Call
In bidirectional mode, your WebSocket server can send JSON messages back over the connection to play audio into the call or clear buffered audio.
Play Audio Event
Send a playAudio event to play audio into the call:
{
"eventType": "playAudio",
"media": {
"contentType": "audio/pcmu",
"payload": "<base64-encoded-audio>"
}
}
The media.contentType field describes the format of the audio. Supported values:
- audio/pcmu (8-bit, 8 kHz, mono, μ-law format)
- audio/pcm (supports rate=8000, rate=16000, rate=24000; channels=1; bit-depth=16; endian=little; encoding=signed)
Any other combination or unsupported parameters will be rejected.
Example content type values:
- audio/pcmu
- audio/pcm
- audio/pcm;rate=8000
- audio/pcm;rate=16000
- audio/pcm;rate=24000
- audio/pcm;rate=16000;channels=1;bit-depth=16;endian=little;encoding=signed
If audio/pcm is sent, it will be automatically resampled to 8kHz μ-law (audio/pcmu) for downstream processing. Only mono (single-channel), 16-bit, little-endian, signed PCM is accepted for PCM input.
When audio/pcm is used without additional parameters, the defaults are rate=8000, channels=1, bit-depth=16, endian=little, encoding=signed.
If possible, it is recommended to use audio/pcmu for the best performance and compatibility. The audio data will not be resampled or re-encoded, and will be sent directly to the destination as-is.
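The playAudio message can be sketched in code. The helper below is hypothetical (not part of any Bandwidth SDK); it wraps raw audio bytes in the JSON structure shown above and rejects content types outside a few of the example values listed earlier (the fully parameterized audio/pcm forms are also valid per the documentation but are omitted here for brevity).

```python
import base64
import json

# A few of the example contentType values listed above; this is not the
# exhaustive set of valid parameter combinations.
SUPPORTED_CONTENT_TYPES = {
    "audio/pcmu",
    "audio/pcm",
    "audio/pcm;rate=8000",
    "audio/pcm;rate=16000",
    "audio/pcm;rate=24000",
}


def build_play_audio(audio_bytes: bytes, content_type: str = "audio/pcmu") -> str:
    """Return the playAudio JSON text to send over the bidirectional WebSocket."""
    if content_type not in SUPPORTED_CONTENT_TYPES:
        raise ValueError(f"unsupported contentType: {content_type}")
    return json.dumps({
        "eventType": "playAudio",
        "media": {
            "contentType": content_type,
            # Audio is always carried as base64 text inside the JSON message.
            "payload": base64.b64encode(audio_bytes).decode("ascii"),
        },
    })
```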
Clear Event
All audio sent to the server will be buffered until it is transmitted. You can send a clear event to discard any untransmitted audio currently buffered. This is useful for barge-in scenarios where the caller interrupts and you want to stop the current playback immediately.
{
"eventType": "clear"
}
Behavior:
- All buffered, untransmitted audio bytes will be skipped.
- New audio sent after the clear event will be processed as usual.
- No error will occur if the buffer is already empty.
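A typical barge-in sequence can be sketched as follows: when the caller interrupts, send clear first to discard the buffered prompt, then send the new audio. The helper name is hypothetical, and the ordering is the point of the example.

```python
import base64
import json


def barge_in_messages(new_audio: bytes) -> list:
    """Messages to send, in order, when the caller interrupts playback.

    Hypothetical helper, not an official API: returns a clear event followed
    by a playAudio event carrying the replacement prompt.
    """
    clear = json.dumps({"eventType": "clear"})
    play = json.dumps({
        "eventType": "playAudio",
        "media": {
            "contentType": "audio/pcmu",
            "payload": base64.b64encode(new_audio).decode("ascii"),
        },
    })
    # Order matters: clear first so the old prompt stops immediately,
    # then the new audio starts from an empty buffer.
    return [clear, play]
```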
Websocket Packet Format
At the destination end, the websocket will receive messages containing JSON for the duration of the stream. There will be an initial start message when the connection is first established. This will be followed by zero or more media messages containing the encoded audio for the tracks being streamed. Each media message includes a per-track sequenceNumber (as a string) that starts at "1" and increments by 1 for each subsequent message on that track. Finally, when a stream is stopped, a stop message will be sent.
Start and Stop Message Parameters
| Parameter | Description |
|---|---|
| eventType | What type of message this is, one of start or stop |
| metadata | Details about the stream this message is for. See further details below. |
| streamParams | (optional) (start message only) If any <StreamParam/> elements were specified in the <StartStream> request, they will be copied here as a map of name : value pairs |
Metadata Parameters
| Parameter | Description |
|---|---|
| accountId | The user account associated with the call |
| callId | The call id associated with the stream |
| streamId | The unique id of the stream |
| streamName | The user supplied name of the stream |
| tracks | A list of one or more tracks being sent in this stream |
| tracks.name | The name of the track being sent, will be used to identify which media messages belong to which track |
| tracks.mediaFormat | The format the media will take for this track |
| tracks.mediaFormat.encoding | The encoding of the media for this track; currently only PCMU is supported |
| tracks.mediaFormat.sampleRate | The sample rate of the media for this track, currently only 8000 is supported |
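On the receiving side, the metadata fields above can be pulled out of the start message to learn each track's media format before any media messages arrive. This is a sketch using the field names from the table, not an official SDK helper.

```python
import json


def track_formats(start_message: str) -> dict:
    """Map each track name to its (encoding, sampleRate) pair.

    Sketch only: parses the start message's metadata block using the field
    names documented in the tables above.
    """
    metadata = json.loads(start_message)["metadata"]
    return {
        track["name"]: (track["mediaFormat"]["encoding"],
                        track["mediaFormat"]["sampleRate"])
        for track in metadata["tracks"]
    }
```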
Media Message Parameters
| Parameter | Description |
|---|---|
| eventType | Will always be media |
| track | The name of the track this media packet is for, will be one of the names specified in the start message |
| payload | A base64 encoded string of actual media. The encoding of the media itself is as specified in the start message |
| sequenceNumber | A string containing a monotonically increasing sequence number for this track. The first media message on a track has the value "1" and increments by 1. |
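Because each track's sequenceNumber increments by 1, a receiver can use it to detect dropped or out-of-order media messages. The class below is an illustrative sketch (not an SDK class) that tracks the expected next number independently per track.

```python
import json


class SequenceTracker:
    """Detect dropped or out-of-order media messages per track.

    Illustrative sketch: sequenceNumber arrives as a string, starts at "1"
    on each track, and increments by 1 per message on that track.
    """

    def __init__(self):
        self._last = {}  # track name -> last sequence number seen

    def check(self, media_message: str) -> bool:
        """Return True if this message is the expected next one for its track."""
        msg = json.loads(media_message)
        track = msg["track"]
        seq = int(msg["sequenceNumber"])  # string in the JSON, numeric meaning
        expected = self._last.get(track, 0) + 1
        self._last[track] = seq
        return seq == expected
```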
Examples
Stream Both Legs of a Call
- XML
- Java
- C#
- Ruby
- NodeJS
- Python
- PHP
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<SpeakSentence voice="bridget">This call is being streamed to a live studio audience.</SpeakSentence>
<StartStream name="live_audience" tracks="both" destination="wss://live-studio-audience.myapp.example.com" streamEventUrl="https://myapp.example.com/noBXML">
<StreamParam name="internal_id" value="call_ABC" />
</StartStream>
<SpeakSentence voice="bridget">This will now be streamed to the destination as well as played to the call participants.</SpeakSentence>
</Response>
SpeakSentence speakSentenceStart = new SpeakSentence("This call is being streamed to a live studio audience.").builder()
.voice(TtsVoice.BRIDGET)
.build();
StreamParam streamParam = new StreamParam().builder()
.name("internal_id")
.value("call_ABC")
.build();
StartStream startStream = new StartStream().builder()
.name("live_audience")
.tracks("both")
.destination("wss://live-studio-audience.myapp.example.com")
.streamEventUrl("https://myapp.example.com/noBXML")
.streamParams(List.of(streamParam))
.build();
SpeakSentence speakSentenceEnd = new SpeakSentence("This will now be streamed to the destination as well as played to the call participants.").builder()
.voice(TtsVoice.BRIDGET)
.build();
Response response = new Response()
.withVerbs(speakSentenceStart, startStream, speakSentenceEnd);
System.out.println(response.toBXML());
SpeakSentence speakSentenceStart = new SpeakSentence
{
Sentence = "This call is being streamed to a live studio audience.",
Voice = "bridget"
};
StreamParam streamParam = new StreamParam
{
Name = "internal_id",
Value = "call_ABC",
};
StartStream startStream = new StartStream
{
Name = "live_audience",
Tracks = "both",
Destination = "wss://live-studio-audience.myapp.example.com",
StreamEventUrl = "https://myapp.example.com/noBXML",
    StreamParams = new StreamParam[] { streamParam }
};
SpeakSentence speakSentenceEnd = new SpeakSentence
{
Sentence = "This will now be streamed to the destination as well as played to the call participants.",
Voice = "bridget"
};
Response response = new Response();
response.Add(speakSentenceStart);
response.Add(startStream);
response.Add(speakSentenceEnd);
Console.WriteLine(response.ToBXML());
speak_sentence_start = Bandwidth::Bxml::SpeakSentence.new('This call is being streamed to a live studio audience.', {
voice: 'bridget'
})
stream_param = Bandwidth::Bxml::StreamParam.new({
name: 'internal_id',
value: 'call_ABC'
})
start_stream = Bandwidth::Bxml::StartStream.new([stream_param], {
name: 'live_audience',
tracks: 'both',
destination: 'wss://live-studio-audience.myapp.example.com',
  stream_events_url: 'https://myapp.example.com/noBXML'
})
speak_sentence_end = Bandwidth::Bxml::SpeakSentence.new('This will now be streamed to the destination as well as played to the call participants.', {
voice: 'bridget'
})
response = Bandwidth::Bxml::Response.new([speak_sentence_start, start_stream, speak_sentence_end])
p response.to_bxml
const speakSentenceStart = new Bxml.SpeakSentence(
'This call is being streamed to a live studio audience.',
{
voice: 'bridget'
}
);
const streamParam = new Bxml.StreamParam({
name: 'internal_id',
value: 'call_ABC'
});
const startStream = new Bxml.StartStream(
{
name: 'live_audience',
tracks: 'both',
destination: 'wss://live-studio-audience.myapp.example.com',
streamEventUrl: 'https://myapp.example.com/noBXML'
},
[streamParam]
);
const speakSentenceEnd = new Bxml.SpeakSentence(
'This will now be streamed to the destination as well as played to the call participants.',
{
voice: 'bridget'
}
);
const response = new Bxml.Response([
speakSentenceStart,
startStream,
speakSentenceEnd
]);
console.log(response.toBxml());
speak_sentence_start = SpeakSentence(
text="This call is being streamed to a live studio audience.",
voice="bridget"
)
stream_param = StreamParam(
name="internal_id",
value="call_ABC"
)
start_stream = StartStream(
name="live_audience",
tracks="both",
destination="wss://live-studio-audience.myapp.example.com",
stream_events_url="https://myapp.example.com/noBXML",
stream_params=[stream_param]
)
speak_sentence_end = SpeakSentence(
text="This will now be streamed to the destination as well as played to the call participants.",
voice="bridget"
)
response = Response()
response.add_verb(speak_sentence_start)
response.add_verb(start_stream)
response.add_verb(speak_sentence_end)
print(response.to_bxml())
$speakSentenceStart = new BandwidthLib\Voice\Bxml\SpeakSentence("This call is being streamed to a live studio audience.");
$speakSentenceStart->voice("bridget");
$streamParam = new BandwidthLib\Voice\Bxml\StreamParam();
$streamParam->name("internal_id");
$streamParam->value("call_ABC");
$startStream = new BandwidthLib\Voice\Bxml\StartStream("wss://live-studio-audience.myapp.example.com");
$startStream->name("live_audience");
$startStream->tracks("both");
$startStream->streamEventUrl("https://myapp.example.com/noBXML");
$startStream->streamParams(array($streamParam));
$speakSentenceEnd = new BandwidthLib\Voice\Bxml\SpeakSentence("This will now be streamed to the destination as well as played to the call participants.");
$speakSentenceEnd->voice("bridget");
$response = new BandwidthLib\Voice\Bxml\Response();
$response->addVerb($speakSentenceStart);
$response->addVerb($startStream);
$response->addVerb($speakSentenceEnd);
echo $response->toBxml();
Bidirectional Stream with AI Agent
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<SpeakSentence voice="bridget">Please hold while I connect you to our AI assistant.</SpeakSentence>
<StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com" streamEventUrl="https://myapp.example.com/events">
<StreamParam name="caller_id" value="+15551234567" />
</StartStream>
<StopStream name="ai_agent" wait="true"/>
</Response>
In this example, <StopStream wait="true"/> keeps the call alive until the WebSocket connection is closed by the AI agent service. The AI agent can send audio back to the caller via playAudio events over the WebSocket.
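The service side of such a bidirectional stream can be sketched as a per-message handler. The function below is illustrative only: it simply echoes the caller's inbound audio back as playAudio events, whereas a real AI agent would feed the audio to speech recognition and synthesize replies.

```python
import json
from typing import Optional


def handle_message(message: str) -> Optional[str]:
    """Echo bot sketch for a bidirectional stream (not an official API).

    Given one WebSocket text message from the platform, return the JSON text
    to send back, or None if no reply is needed.
    """
    msg = json.loads(message)
    if msg.get("eventType") != "media" or msg.get("track") != "inbound":
        return None  # ignore start/stop events and outbound media
    # Inbound media arrives as base64-encoded PCMU, so it can be sent back
    # unchanged with a matching contentType.
    return json.dumps({
        "eventType": "playAudio",
        "media": {"contentType": "audio/pcmu", "payload": msg["payload"]},
    })
```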
A start Websocket Message sent by the Bandwidth API over the WebSocket
{
"eventType": "start",
"metadata": {
"accountId": "5555555",
"callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
"streamId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
"streamName": "live_audience",
"tracks": [
{
"name": "inbound",
"mediaFormat": {
"encoding": "PCMU",
"sampleRate": 8000
}
},
{
"name": "outbound",
"mediaFormat": {
"encoding": "PCMU",
"sampleRate": 8000
}
}
]
},
"streamParams": {
"foo": "bar",
"foos": "bars"
}
}
A media Websocket Message sent by the Bandwidth API over the WebSocket
{
"eventType": "media",
"track": "inbound",
"payload": "3Ob2dV1NRUpSXfTy69bHvbzD09PL0trpaWZMTV5PT05DRUpNYeLyb+jc1tPW3tfN1/r4cFZd5PxXXGjo2M/M0NTU0Nvi31ZFTFhLQERKT19safHd18zIycjHyc3Z4+7s609GSktMS1hmVFBm3eZk2tB4ffJ17/5r5dLb5uLd1c3UdmZnc/jt3eH9a3H06dvV3WNPYXxjS0BJT05VXm53+A==",
"sequenceNumber": "1"
}
A stop Websocket Message sent by the Bandwidth API over the WebSocket
{
"eventType": "stop",
"metadata": {
"accountId": "5555555",
"callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
"streamId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
"streamName": "live_audience",
"tracks": [
{
"name": "inbound",
"mediaFormat": {
"encoding": "PCMU",
"sampleRate": 8000
}
},
{
"name": "outbound",
"mediaFormat": {
"encoding": "PCMU",
"sampleRate": 8000
}
}
]
}
}
A playAudio Websocket Message that could be sent to the Bandwidth API over the WebSocket
{
"eventType": "playAudio",
"media": {
"contentType": "audio/pcm",
"payload": "3Ob2dV1NRUpSXfTy69bHvbzD09PL0trpaWZMTV5PT05DRUpNYeLyb+jc1tPW3tfN1/r4cFZd5PxXXGjo2M/M0NTU0Nvi31ZFTFhLQERKT19safHd18zIycjHyc3Z4+7s609GSktMS1hmVFBm3eZk2tB4ffJ17/5r5dLb5uLd1c3UdmZnc/jt3eH9a3H06dvV3WNPYXxjS0BJT05VXm53+A=="
}
}
Example clear event message that could be sent to the Bandwidth API over the WebSocket
{
"eventType": "clear"
}