
Start Stream

The <StartStream> verb allows audio from a segment of a call to be sent to another destination for additional processing.

When used on a call, audio from one or both sides (tracks) of the call will be sent to the specified destination. The stream will continue until the call ends or the <StopStream> verb is used. A total of 4 concurrent track streams are allowed on a call. A <StartStream> request that uses both tracks will count as 2 concurrent streams.

A call has only two tracks, which are named after the direction of the media from the perspective of the Programmable Voice platform:

  • inbound: media received by Programmable Voice from the call executing the BXML;
  • outbound: media sent by Programmable Voice to the call executing the BXML.

Note that this has no correlation to the direction of the call itself. For example, regardless of whether the call being streamed is inbound or outbound, if it executes a <SpeakSentence>, the outbound track will be the text-to-speech audio and the inbound track will be the other party's audio.

Execution Behavior

<StartStream> is non-blocking — BXML execution continues to the next verb immediately after the stream is established. The stream itself runs in the background for the duration of the call.

This means that if <StartStream> is the last verb in your BXML response, the call will have no further instructions to execute and will end, closing the WebSocket connection almost immediately after it was opened.

For unidirectional streams this is usually not a problem, because you typically have other verbs after <StartStream> that keep the call alive (e.g., <SpeakSentence>, <Gather>, <Bridge>, etc.) and the stream runs in the background alongside them.

For bidirectional streams, where your service needs to send audio back over the WebSocket, you must keep the call alive while the stream is active. See Bidirectional Streams below for details.

Stream Modes

<StartStream> supports two modes via the mode attribute:

  • unidirectional (default) — Audio flows one way: from the call to your WebSocket server. Your server can listen and process but cannot send audio back to the call. Best for real-time transcription, analytics, or logging.

  • bidirectional — Audio flows both ways: from the call to your server, and your server can send audio back to the call via the same WebSocket. Best for AI-powered voice agents, real-time translation, or any use case where your service needs to "speak" into the call.

Text Content

There is no text content available to be set for the <StartStream> verb.

Attributes

Attribute | Description
name | (optional) A name to refer to this stream by. Used when sending <StopStream>. If not provided, it will default to the generated stream ID as sent in the Media Stream Started webhook.
mode | (optional) The mode to use for the stream: unidirectional or bidirectional. Specifies whether the audio being streamed over the WebSocket is bidirectional (the service can both read and write audio over the WebSocket) or unidirectional (one-way, read-only). Default is unidirectional.
tracks | (optional) The part of the call to send a stream from: inbound, outbound, or both. Default is inbound.
destination | (required) A WebSocket URI to send the stream to. The audio from the specified tracks will be sent via WebSocket to this URL as base64-encoded PCMU/G711 audio. See below for more details on the WebSocket packet format.
destinationUsername | (optional) The username to send in the Authorization header of the initial WebSocket connection to the destination URL.
destinationPassword | (optional) The password to send in the Authorization header of the initial WebSocket connection to the destination URL.
streamEventUrl | (optional) URL to send the associated webhook events to during this stream's lifetime. Does not accept BXML. May be a relative URL.
streamEventMethod | (optional) The HTTP method to use for the request to streamEventUrl: GET or POST. Default value is POST.
username | (optional) The username to send in the HTTP request to streamEventUrl. If specified, the URLs must be TLS-encrypted (i.e., https).
password | (optional) The password to send in the HTTP request to streamEventUrl. If specified, the URLs must be TLS-encrypted (i.e., https).

If the streamEventUrl attribute is specified, the Media Stream Started, Media Stream Rejected, and Media Stream Stopped events will be sent to that URL when the stream starts, when there is an error starting the stream, and when the stream ends, respectively. BXML returned in response to these callbacks will be ignored.

note

While multiple streams for the same call are allowed, each stream MUST have a unique name. Attempting to start a stream on the same call with the name of an already existing stream will result in a Media Stream Rejected event.

Webhooks Received

Webhook | Can reply with more BXML
Media Stream Started | No
Media Stream Rejected | No
Media Stream Stopped | No

Nested Tags

You may specify up to 12 <StreamParam/> elements nested within a <StartStream> tag. These elements define optional user-specified parameters that will be sent to the destination URL when the stream is first started.

StreamParam Attributes

Attribute | Description
name | (required) The name of this parameter, up to 256 characters.
value | (required) The value of this parameter, up to 2048 characters.

Unidirectional Streams

Unidirectional mode (mode="unidirectional", the default) sends a read-only audio stream from the call to your WebSocket server. Your server receives audio but cannot send audio back.

Since <StartStream> is non-blocking, the stream runs in the background while the call continues executing subsequent BXML verbs. This makes unidirectional streams straightforward — just place <StartStream> before whatever verbs should be streamed.

Bidirectional Streams

Bidirectional mode (mode="bidirectional") opens a two-way audio stream: your server receives audio from the call and can send audio back to the call over the same WebSocket connection.

Keeping the Call Alive

Because <StartStream> is non-blocking, you must ensure the call stays alive while your bidirectional stream is active. If the BXML execution runs out of verbs, the call ends and the WebSocket closes.

There are two approaches:

<StopStream> with wait="true" (recommended) — Place a <StopStream name="..." wait="true"/> after your <StartStream>, where the name matches the one given to <StartStream>. This holds the call open until the WebSocket connection is closed (either by your server or by a subsequent BXML response). Your server has full control over when the stream ends.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com" streamEventUrl="https://myapp.example.com/events">
        <StreamParam name="call_context" value="support_queue" />
    </StartStream>
    <StopStream name="ai_agent" wait="true"/>
</Response>

<Pause> (alternative) — Place a <Pause length="..."/> after <StartStream> to keep the call alive for a fixed duration. This is less elegant but caps the maximum connection time, which can be useful as a cost protection mechanism if you want to ensure streams don't run indefinitely.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com"/>
    <!-- Keep the call alive for up to 10 minutes -->
    <Pause length="600"/>
</Response>
caution

If using <Pause>, the stream will end when the pause duration expires regardless of whether your server is still processing. Use <StopStream wait="true"/> if your service needs to control the stream duration dynamically.

Sending Audio to the Call

In bidirectional mode, your WebSocket server can send JSON messages back over the connection to play audio into the call or clear buffered audio.

Play Audio Event

Send a playAudio event to play audio into the call:

{
    "eventType": "playAudio",
    "media": {
        "contentType": "audio/pcmu",
        "payload": "<base64-encoded-audio>"
    }
}

The media.contentType field describes the format of the audio. Supported values:

  • audio/pcmu (8-bit, 8kHz, mono, μ-law format)
  • audio/pcm (supports rate=8000, rate=16000, rate=24000; channels=1; bit-depth=16; endian=little; encoding=signed)

Any other combination or unsupported parameters will be rejected.

Example content type values:

  • audio/pcmu
  • audio/pcm
  • audio/pcm;rate=8000
  • audio/pcm;rate=16000
  • audio/pcm;rate=24000
  • audio/pcm;rate=16000;channels=1;bit-depth=16;endian=little;encoding=signed

If audio/pcm is sent, it will be automatically resampled to 8kHz μ-law (audio/pcmu) for downstream processing. Only mono (single-channel), 16-bit, little-endian, signed PCM is accepted for PCM input.

When audio/pcm is used without additional parameters, the defaults are rate=8000, channels=1, bit-depth=16, endian=little, encoding=signed.
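As a sketch, a bidirectional service could assemble a playAudio message like this. The field names (eventType, media.contentType, media.payload) follow the message shape documented above; the helper name and sample bytes are illustrative.

```python
import base64
import json

def build_play_audio(pcm_bytes: bytes, rate: int = 8000) -> str:
    """Wrap raw 16-bit mono little-endian signed PCM in a playAudio event.

    rate may be 8000, 16000, or 24000; other values are rejected by the
    platform. The payload must be base64-encoded.
    """
    message = {
        "eventType": "playAudio",
        "media": {
            "contentType": f"audio/pcm;rate={rate}",
            "payload": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }
    return json.dumps(message)
```

The resulting string would then be sent as a text frame over the open WebSocket connection.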

note

If possible, it is recommended to use audio/pcmu for the best performance and compatibility. The audio data will not be resampled or re-encoded, and will be sent directly to the destination as-is.

Clear Event

All audio sent to the server will be buffered until it is transmitted. You can send a clear event to discard any untransmitted audio currently buffered. This is useful for barge-in scenarios where the caller interrupts and you want to stop the current playback immediately.

{
    "eventType": "clear"
}

Behavior:

  • All buffered, untransmitted audio bytes will be skipped.
  • New audio sent after the clear event will be processed as usual.
  • No error will occur if the buffer is already empty.
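A minimal barge-in helper might look like the following sketch. Here send stands in for whatever function your WebSocket library uses to transmit one text frame (an assumption, not part of the API).

```python
import json

def barge_in(send, next_audio_message=None):
    """Discard queued playback, then optionally start new audio.

    send: any callable that transmits one text frame over the WebSocket.
    """
    # Drop all buffered, untransmitted audio; a no-op if the buffer is empty.
    send(json.dumps({"eventType": "clear"}))
    # Audio sent after the clear event is processed as usual.
    if next_audio_message is not None:
        send(next_audio_message)
```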

WebSocket Packet Format

At the destination end, the WebSocket will receive messages containing JSON for the duration of the stream. There will be an initial start message when the connection is first established. This will be followed by zero or more media messages containing the encoded audio for the tracks being streamed. Each media message includes a per-track sequenceNumber (as a string) that starts at "1" and increments by 1 for each subsequent message on that track. Finally, when a stream is stopped, a stop message will be sent.
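This message flow can be handled with a simple dispatcher at the destination end. The sketch below uses only the documented fields (eventType, metadata, track, sequenceNumber); the returned strings are illustrative.

```python
import json

def dispatch(raw: str) -> str:
    """Route one incoming WebSocket text frame by its eventType."""
    msg = json.loads(raw)
    kind = msg["eventType"]
    if kind == "start":
        tracks = [t["name"] for t in msg["metadata"]["tracks"]]
        return f"stream {msg['metadata']['streamName']} started, tracks: {tracks}"
    if kind == "media":
        # msg["payload"] holds base64-encoded audio for this track
        return f"media on {msg['track']} seq {msg['sequenceNumber']}"
    if kind == "stop":
        return f"stream {msg['metadata']['streamName']} stopped"
    return f"unhandled event: {kind}"
```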

Start and Stop Message Parameters

Parameter | Description
eventType | What type of message this is: start or stop
metadata | Details about the stream this message is for. See further details below.
streamParams | (optional; start message only) If any <StreamParam/> elements were specified in the <StartStream> request, they will be copied here as a map of name : value pairs

Metadata Parameters

Parameter | Description
accountId | The account associated with the call
callId | The call ID associated with the stream
streamId | The unique ID of the stream
streamName | The user-supplied name of the stream
tracks | A list of one or more tracks being sent in this stream
tracks.name | The name of the track being sent; used to identify which media messages belong to which track
tracks.mediaFormat | The format the media will take for this track
tracks.mediaFormat.encoding | The encoding of the media for this track; currently only PCMU is supported
tracks.mediaFormat.sampleRate | The sample rate of the media for this track; currently only 8000 is supported

Media Message Parameters

Parameter | Description
eventType | Will always be media
track | The name of the track this media packet is for; will be one of the names specified in the start message
payload | A base64-encoded string of the actual media. The encoding of the media itself is as specified in the start message
sequenceNumber | A string containing a monotonically increasing sequence number for this track. The first media message on a track has the value "1", and the value increments by 1 for each subsequent message.
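Because sequenceNumber increases by 1 per track, a receiver can detect dropped or out-of-order messages with a small per-track counter. The class below is a sketch, not part of any SDK.

```python
class SequenceTracker:
    """Track the last-seen sequence number per track to spot gaps."""

    def __init__(self):
        self._last = {}  # track name -> last sequence number seen (as int)

    def check(self, track: str, sequence_number: str) -> bool:
        """Return True if this message is the expected next one for its track."""
        seq = int(sequence_number)  # sequenceNumber arrives as a string
        expected = self._last.get(track, 0) + 1
        self._last[track] = seq
        return seq == expected
```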

Examples

Stream Both Legs of A Call

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <SpeakSentence voice="bridget">This call is being streamed to a live studio audience.</SpeakSentence>
    <StartStream name="live_audience" tracks="both" destination="wss://live-studio-audience.myapp.example.com" streamEventUrl="https://myapp.example.com/noBXML">
        <StreamParam name="internal_id" value="call_ABC" />
    </StartStream>
    <SpeakSentence voice="bridget">This will now be streamed to the destination as well as played to the call participants.</SpeakSentence>
</Response>

Bidirectional Stream with AI Agent

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <SpeakSentence voice="bridget">Please hold while I connect you to our AI assistant.</SpeakSentence>
    <StartStream name="ai_agent" mode="bidirectional" destination="wss://ai-agent.myapp.example.com" streamEventUrl="https://myapp.example.com/events">
        <StreamParam name="caller_id" value="+15551234567" />
    </StartStream>
    <StopStream name="ai_agent" wait="true"/>
</Response>

In this example, <StopStream wait="true"/> keeps the call alive until the WebSocket connection is closed by the AI agent service. The AI agent can send audio back to the caller via playAudio events over the WebSocket.

A start message sent by the Bandwidth API over the WebSocket

{
    "eventType": "start",
    "metadata": {
        "accountId": "5555555",
        "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
        "streamId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
        "streamName": "live_audience",
        "tracks": [
            {
                "name": "inbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            },
            {
                "name": "outbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            }
        ]
    },
    "streamParams": {
        "foo": "bar",
        "foos": "bars"
    }
}

A media message sent by the Bandwidth API over the WebSocket

{
    "eventType": "media",
    "track": "inbound",
    "payload": "3Ob2dV1NRUpSXfTy69bHvbzD09PL0trpaWZMTV5PT05DRUpNYeLyb+jc1tPW3tfN1/r4cFZd5PxXXGjo2M/M0NTU0Nvi31ZFTFhLQERKT19safHd18zIycjHyc3Z4+7s609GSktMS1hmVFBm3eZk2tB4ffJ17/5r5dLb5uLd1c3UdmZnc/jt3eH9a3H06dvV3WNPYXxjS0BJT05VXm53+A==",
    "sequenceNumber": "1"
}

A stop message sent by the Bandwidth API over the WebSocket

{
    "eventType": "stop",
    "metadata": {
        "accountId": "5555555",
        "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
        "streamId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
        "streamName": "live_audience",
        "tracks": [
            {
                "name": "inbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            },
            {
                "name": "outbound",
                "mediaFormat": {
                    "encoding": "PCMU",
                    "sampleRate": 8000
                }
            }
        ]
    }
}

A playAudio message that could be sent to the Bandwidth API over the WebSocket

{
    "eventType": "playAudio",
    "media": {
        "contentType": "audio/pcm",
        "payload": "3Ob2dV1NRUpSXfTy69bHvbzD09PL0trpaWZMTV5PT05DRUpNYeLyb+jc1tPW3tfN1/r4cFZd5PxXXGjo2M/M0NTU0Nvi31ZFTFhLQERKT19safHd18zIycjHyc3Z4+7s609GSktMS1hmVFBm3eZk2tB4ffJ17/5r5dLb5uLd1c3UdmZnc/jt3eH9a3H06dvV3WNPYXxjS0BJT05VXm53+A=="
    }
}

An example clear message that could be sent to the Bandwidth API over the WebSocket

{
    "eventType": "clear"
}