Pipecat Integration
This guide will walk you through building a real-time voice AI agent on Bandwidth's Voice Network using Pipecat, the open-source Python framework for voice AI pipelines. Pipecat orchestrates the STT → LLM → TTS pipeline; Bandwidth provides the carrier-grade telephony layer that brings real PSTN callers into it.
The integration is published as the pipecat-bandwidth library, which provides a BandwidthFrameSerializer that plugs directly into Pipecat's FastAPIWebsocketTransport. You can swap STT, LLM, and TTS providers (Deepgram, OpenAI, Cartesia, ElevenLabs, etc.) without touching the telephony layer.
This integration uses Bandwidth Programmable Voice with Media Streaming. Your application receives μ-law audio frames directly over a bidirectional WebSocket and feeds them into a Pipecat pipeline — no SIP-to-SIP routing required.
What you'll need
- A Bandwidth Programmable Voice account with:
  - A purchased phone number assigned to a Voice Application
  - API credentials (OAuth 2.0 client ID + secret)
  - Don't have an account yet? Try Bandwidth Build for free: get a real phone number and 3000 credits to start building immediately.
  - If you have a full Bandwidth App account but haven't set it up yet, check out our Account Setup guide.
- Python 3.11+
- uv (or pip if you prefer)
- A publicly accessible URL for your application (e.g., using ngrok)
- API keys for the AI services in your pipeline. The example uses:
  - Deepgram (speech-to-text)
  - OpenAI (LLM)
  - Cartesia (text-to-speech)
- The reference example: Bandwidth/pipecat-bandwidth/examples/bandwidth-chatbot
Call Flow
Before we dive in, let's walk through what an inbound call flow looks like with this integration.
This flow demonstrates a basic AI agent answering an inbound call. Let's break it down:
- A user calls your Bandwidth number.
- Bandwidth POSTs a Basic-Auth-protected webhook to your application with the inbound call event, including callId and accountId.
- Your application mints a one-time correlation token bound to those server-trusted IDs and responds with a <StartStream> BXML pointing at wss://<your-host>/ws/{token}.
- Bandwidth opens a WebSocket to your application. Your app validates the token, recovers the trusted callId / accountId, and constructs a BandwidthFrameSerializer.
- The serializer decodes Bandwidth's μ-law audio into Pipecat audio frames. The pipeline transcribes, generates a response, synthesizes speech, and streams it back over the same WebSocket.
- The caller hears the AI agent. Interruptions emit a clear event so the bot stops talking immediately when the caller speaks.
- When the pipeline ends, the serializer automatically hangs up the call via the Bandwidth Voice API using the trusted callId.
The BandwidthFrameSerializer's auto-hang-up path uses your operator OAuth credentials to terminate the call by callId. If that ID came from an unauthenticated WebSocket frame, anyone who can reach your /ws endpoint could feed an arbitrary callId and trigger a hang-up against a live call in your account.
The fix is to trust only the authenticated webhook body for callId and accountId and bind them to the WebSocket connection via a server-issued correlation token — exactly what the example below does. Never derive these IDs from the WebSocket start event metadata.
Let's Build It!
For convenience, we provide a complete sample application: Bandwidth/pipecat-bandwidth/examples/bandwidth-chatbot.
It's a single-file FastAPI server that:
- Returns a <StartStream> BXML response on Bandwidth's voice webhook (Basic Auth).
- Accepts the bidirectional WebSocket and validates a one-time correlation token.
- Runs a Deepgram (STT) → OpenAI (LLM) → Cartesia (TTS) Pipecat pipeline with Silero VAD for turn detection.
The following sections walk through the example to help you understand how it works.
Set Up the Environment
Clone the repository and move into the example directory:
git clone https://github.com/Bandwidth/pipecat-bandwidth
cd pipecat-bandwidth/examples/bandwidth-chatbot
uv sync
cp env.example .env
Fill in your .env:
# Public hostname for the BXML StartStream destination (no scheme)
PROXY_HOST=
# Optional: change the local port the bot listens on
PORT=7860
# Bandwidth API credentials (OAuth 2.0 client_credentials)
# Required for auto hang-up on EndFrame / CancelFrame.
BANDWIDTH_CLIENT_ID=
BANDWIDTH_CLIENT_SECRET=
# Webhook Basic Auth. Set the same username/password in your Bandwidth
# voice application's webhook configuration.
BANDWIDTH_WEBHOOK_USERNAME=
BANDWIDTH_WEBHOOK_PASSWORD=
# AI service keys
OPENAI_API_KEY=
DEEPGRAM_API_KEY=
CARTESIA_API_KEY=
Run the bot:
uv run python bot.py
A successful startup will log something like:
INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
INFO: Application startup complete.
Handle the Inbound Voice Webhook
When a user calls your Bandwidth number, Bandwidth POSTs to your inbound voice webhook. The handler authenticates the request, extracts the trusted callId / accountId from the body, mints a one-time correlation token, and returns a <StartStream> BXML.
@app.post("/")
async def inbound_call(request: Request) -> HTMLResponse:
    """Bandwidth voice webhook. Basic Auth required; trust the body's IDs."""
    _verify_webhook_auth(request)
    body = await request.json()
    call_id = body.get("callId")
    account_id = body.get("accountId")
    if not call_id or not account_id:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Webhook body missing callId or accountId",
        )
    token = await _issue_token(str(call_id), str(account_id))
    bxml = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <StartStream destination="wss://{PROXY_HOST}/ws/{token}" mode="bidirectional" tracks="inbound"/>
    <Pause duration="86400"/>
</Response>"""
    return HTMLResponse(content=bxml, media_type="application/xml")
A few things to note:
- _verify_webhook_auth enforces HTTP Basic Auth using BANDWIDTH_WEBHOOK_USERNAME / BANDWIDTH_WEBHOOK_PASSWORD. If these are unset, the example refuses to serve the webhook (it returns 503) instead of accepting unauthenticated requests; silent acceptance would defeat the point of the trust chain.
- callId and accountId are read only from the authenticated webhook body.
- _issue_token mints a 32-byte URL-safe token with a short TTL (60 seconds) and stores (call_id, account_id, expires_at) server-side, keyed by the token.
- The BXML uses <StartStream mode="bidirectional" tracks="inbound"/> to open a two-way audio WebSocket. The <Pause duration="86400"/> keeps the call leg alive while the WebSocket session runs.
Accept the WebSocket and Wire Up Pipecat
Bandwidth opens a WebSocket to /ws/{token}. The handler validates the token, recovers the trusted IDs, constructs a BandwidthFrameSerializer, and runs the Pipecat pipeline.
@app.websocket("/ws/{token}")
async def websocket_endpoint(websocket: WebSocket, token: str) -> None:
    """Validate the correlation token, then run the bot pipeline."""
    trusted = await _consume_token(token)
    if trusted is None:
        await websocket.close(code=1008)  # 1008 = policy violation
        return
    call_id, account_id = trusted
    await websocket.accept()
    # The first frame from Bandwidth is the "start" event. We use streamId
    # from it (it's just a wire-protocol identifier). callId and accountId
    # come from the trusted token mapping; we deliberately ignore whatever
    # the WS metadata claims.
    first = await websocket.receive_text()
    start_event = json.loads(first)
    stream_id = start_event.get("metadata", {}).get("streamId", "")
    serializer = BandwidthFrameSerializer(
        stream_id=stream_id,
        call_id=call_id,
        account_id=account_id,
        client_id=os.getenv("BANDWIDTH_CLIENT_ID"),
        client_secret=os.getenv("BANDWIDTH_CLIENT_SECRET"),
    )
    transport = FastAPIWebsocketTransport(
        websocket=websocket,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            add_wav_header=False,
            serializer=serializer,
        ),
    )
    await run_bot(transport)
Key points:
- _consume_token pops the token from server-side state, so each token is single-use and expires after 60 seconds. An invalid or expired token is rejected: the handler closes the connection with code 1008 (policy violation) without ever accepting it.
- The Bandwidth start event's streamId is just a wire-protocol identifier, so using it is safe. callId and accountId are never read from the WebSocket frame.
- BandwidthFrameSerializer handles the protocol details: μ-law decoding inbound, audio encoding outbound (μ-law by default, or PCM at 8/16/24 kHz), interruption signaling via clear, and auto hang-up via the Voice API on EndFrame / CancelFrame.
- The serializer plugs into a standard Pipecat FastAPIWebsocketTransport, so there's no Bandwidth-specific transport class to learn.
Build the Pipecat Pipeline
With the transport ready, the bot pipeline is plain Pipecat. Swap any of the services without touching the telephony layer.
async def run_bot(transport: FastAPIWebsocketTransport) -> None:
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAILLMService.Settings(
            system_instruction=(
                "You are a helpful assistant on a phone call. Keep responses "
                "concise and conversational. Avoid special characters since "
                "your output is converted to audio."
            ),
        ),
    )
    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )
    pipeline = Pipeline([
        transport.input(),
        stt,
        user_aggregator,
        llm,
        tts,
        transport.output(),
        assistant_aggregator,
    ])
    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            audio_in_sample_rate=8000,
            audio_out_sample_rate=8000,
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        context.add_message({"role": "user", "content": "Please introduce yourself to the caller."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        await task.cancel()

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)
Notable details:
- Silero VAD is wired into the user aggregator for turn detection.
- 8 kHz audio is the Bandwidth default for μ-law. For higher TTS quality, set outbound_encoding="PCM" and outbound_pcm_sample_rate=24000 on BandwidthFrameSerializer.InputParams, and bump audio_out_sample_rate to match.
- on_client_connected is a clean hook for seeding the conversation with a greeting prompt as soon as the WebSocket is fully connected.
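Switching to PCM changes both the rate you must tell the pipeline about and how many bytes go back over the WebSocket. The helper below is a hypothetical sketch for keeping the two in step. It assumes PCM means 16-bit linear samples, uses the 8/16/24 kHz rates listed for the serializer, and treats "ulaw" as an illustrative label for the default encoding rather than the library's actual constant.

```python
PCM_RATES_HZ = (8000, 16000, 24000)  # PCM rates listed for BandwidthFrameSerializer


def outbound_audio_settings(encoding: str, pcm_sample_rate: int = 24000) -> dict[str, int]:
    """Hypothetical helper: pipeline audio_out_sample_rate plus outbound byte rate.

    "ulaw" is an illustrative label for the serializer's default; "PCM" matches
    the outbound_encoding value in this guide. PCM is assumed to be 16-bit.
    """
    if encoding == "ulaw":
        # mu-law telephony audio: 8 kHz, one byte per sample.
        return {"audio_out_sample_rate": 8000, "bytes_per_second": 8000}
    if encoding == "PCM":
        if pcm_sample_rate not in PCM_RATES_HZ:
            raise ValueError(f"unsupported PCM sample rate: {pcm_sample_rate}")
        # 16-bit linear PCM: two bytes per sample.
        return {
            "audio_out_sample_rate": pcm_sample_rate,
            "bytes_per_second": pcm_sample_rate * 2,
        }
    raise ValueError(f"unknown encoding: {encoding!r}")
```

outbound_audio_settings("PCM")["audio_out_sample_rate"] is the value to pass to PipelineParams. Note that 24 kHz PCM sends 48,000 bytes per second back to Bandwidth versus 8,000 for μ-law, a sixfold increase per call that is worth keeping in mind at scale.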
Connect to a Public URL
Bandwidth needs a publicly reachable HTTPS host to deliver the inbound webhook and open the media-stream WebSocket. In a separate terminal:
ngrok http 7860
Copy the ngrok-free.app hostname (no scheme) into .env as PROXY_HOST and restart the bot so the BXML response references the correct destination.
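Because PROXY_HOST is interpolated directly into wss://{PROXY_HOST}/ws/{token}, pasting the full https:// URL from ngrok produces a broken destination that Bandwidth can't connect to. A small guard like the hypothetical stream_destination below (not part of the sample) fails fast on that mistake and tolerates a stray trailing slash.

```python
def stream_destination(proxy_host: str, token: str) -> str:
    """Hypothetical helper: build the wss:// StartStream destination from PROXY_HOST.

    Raises ValueError if PROXY_HOST is empty or looks like a URL rather than a bare host.
    """
    host = proxy_host.strip().rstrip("/")
    if not host:
        raise ValueError("PROXY_HOST is empty")
    if "://" in host:
        raise ValueError(f"PROXY_HOST must not include a scheme: {proxy_host!r}")
    if "/" in host:
        raise ValueError(f"PROXY_HOST must not include a path: {proxy_host!r}")
    return f"wss://{host}/ws/{token}"
```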
Configure your Bandwidth Voice Application
Finally, point your Bandwidth Voice Application at your public webhook URL.
- Log in to the Bandwidth App and open your Voice Application (or create a new one).
- Under Call initiated, select POST as the callback method and set the Callback URL to https://<your-public-host>/.
- Tick Use a callback username and password and enter the same BANDWIDTH_WEBHOOK_USERNAME / BANDWIDTH_WEBHOOK_PASSWORD values you put in .env. Bandwidth will send these in a Basic Auth header on every inbound call event.
  - The example bot enforces Basic Auth on the webhook so it can trust the body's callId / accountId (the security note under Call Flow explains why). Bandwidth itself doesn't require this; if you'd rather verify Bandwidth's webhook signature or IP-allowlist their egress ranges instead, swap out _verify_webhook_auth and skip this checkbox.
- Click Save.
- Make sure the application is linked to a Voice Configuration Package and that the package is assigned to your Bandwidth phone number.
Test the Integration
Call your Bandwidth phone number. You should hear the bot introduce itself and you can start a conversation. When you hang up — or the bot ends the call — the serializer terminates the call leg via the Bandwidth Voice API using the trusted callId.
Hand off to a Human Agent
Sometimes the AI agent isn't enough — the caller wants a person, the bot decides the conversation is out of scope, or you want a "press 0 for a human" escape hatch. Because the FastAPI app sits between Bandwidth and the AI services, handoff is straightforward: give the LLM a transfer_to_human tool, and when it fires, replace the active call's BXML with a <Transfer> verb via Bandwidth's Update Call BXML endpoint. Bandwidth tears down the media stream, dials the human, and bridges the legs — your Pipecat pipeline exits cleanly on its own.
Define the tool
Pipecat's OpenAILLMService accepts an OpenAI-style tools list in its Settings. Register a single function tool that the model can call when it decides the conversation should escalate:
TRANSFER_TOOL = {
    "type": "function",
    "function": {
        "name": "transfer_to_human",
        "description": (
            "Transfer the caller to a live human agent. Call this when the "
            "caller asks to speak to a person, says they want a human, or "
            "when you cannot help them and an agent should take over."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Short reason for the transfer, for logging.",
                },
            },
            "required": ["reason"],
        },
    },
}
llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAILLMService.Settings(
        system_instruction=(
            "You are a helpful assistant on a phone call. Keep responses "
            "concise and conversational. Avoid special characters since "
            "your output is converted to audio. If the caller asks for a "
            "human, call the transfer_to_human tool; don't promise a "
            "transfer in text, actually call the tool."
        ),
        tools=[TRANSFER_TOOL],
    ),
)
Handle the tool call
When the LLM calls transfer_to_human, your handler does the actual work — PUTing fresh BXML to Bandwidth that redirects the call leg into a <Transfer>. The OAuth credentials and trusted callId / accountId are already in scope from the WebSocket handler — pass them in when you register the handler against the LLMService:
import functools
import os

import aiohttp
from pipecat.services.llm_service import FunctionCallParams

HUMAN_AGENT_NUMBER = os.getenv("HUMAN_AGENT_NUMBER", "+19195554321")
OAUTH_TOKEN_URL = "https://api.bandwidth.com/api/v1/oauth2/token"
VOICE_API_BASE_URL = "https://voice.bandwidth.com/api/v2"


async def transfer_to_human(
    params: FunctionCallParams,
    *,
    call_id: str,
    account_id: str,
) -> None:
    """Replace the active call's BXML with a <Transfer> to a human agent."""
    bxml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Bxml>"
        f"<Transfer><PhoneNumber>{HUMAN_AGENT_NUMBER}</PhoneNumber></Transfer>"
        "</Bxml>"
    )
    async with aiohttp.ClientSession() as session:
        # Exchange the client credentials for a short-lived OAuth access token.
        token_auth = aiohttp.BasicAuth(
            os.environ["BANDWIDTH_CLIENT_ID"],
            os.environ["BANDWIDTH_CLIENT_SECRET"],
        )
        async with session.post(
            OAUTH_TOKEN_URL,
            auth=token_auth,
            data={"grant_type": "client_credentials"},
        ) as token_response:
            token_response.raise_for_status()
            access_token = (await token_response.json())["access_token"]
        # Update Call BXML: replace the live call's instructions with the <Transfer>.
        endpoint = f"{VOICE_API_BASE_URL}/accounts/{account_id}/calls/{call_id}/bxml"
        headers = {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/xml",
        }
        async with session.put(endpoint, headers=headers, data=bxml) as response:
            response.raise_for_status()
    # Acknowledge the tool call so the LLM knows the transfer is in flight.
    # The pipeline will end naturally when Bandwidth closes the media stream.
    await params.result_callback({"status": "transferring"})


# After constructing `llm` (call_id / account_id must be passed down from the
# WebSocket handler, which holds the trusted token mapping):
llm.register_function(
    "transfer_to_human",
    functools.partial(transfer_to_human, call_id=call_id, account_id=account_id),
)
Notable details:
- The endpoint is PUT https://voice.bandwidth.com/api/v2/accounts/{accountId}/calls/{callId}/bxml with an application/xml body (Bandwidth's Update Call BXML operation). The body's root element is <Bxml> (not <Response>) when used with this endpoint.
- call_id and account_id are bound at registration time via functools.partial, so they come from the same trusted token mapping the rest of the bot uses. The LLM never gets to influence which call gets transferred.
- OAuth uses the same BANDWIDTH_CLIENT_ID / BANDWIDTH_CLIENT_SECRET already configured for auto hang-up; the only new env var is HUMAN_AGENT_NUMBER.
- The response is 204 No Content on success. There's no need to drain the body, but raise_for_status() will surface auth or call-state errors loudly.
- params.result_callback returns a short status payload so the LLM doesn't hallucinate a confirmation message. In practice the caller hears Bandwidth's transfer behavior take over almost immediately.
- <Transfer> accepts the full set of attributes documented in the BXML reference: set transferCallerId, attach a transferCompleteUrl, or list multiple <PhoneNumber> entries to ring a hunt group.
What happens to the pipeline?
When Bandwidth processes the new BXML, the existing <StartStream> is interrupted: the bidirectional WebSocket closes, the FastAPIWebsocketTransport fires on_client_disconnected, and that handler calls task.cancel(), which pushes a CancelFrame through the pipeline. The BandwidthFrameSerializer's auto-hang-up logic still runs, but the call is already redirecting: the Voice API returns 404 for the hang-up POST, and the serializer logs it at debug level and moves on. No special teardown is required; the same shutdown path that ends any normal call handles this cleanly.
Production Considerations
- Verify the webhook signature. Bandwidth signs voice webhooks; verify the signature in addition to (or instead of) Basic Auth for defense in depth.
- IP-allowlist Bandwidth's egress ranges at your ingress.
- Back the token store with Redis or similar if you run more than one worker — the in-memory store in the example is single-process only.
- DTMF. Bandwidth doesn't deliver DTMF over the media stream. Capture it with the BXML <Gather> verb on a separate webhook handler.
- Higher-fidelity audio. μ-law at 8 kHz is the default. For noticeably better TTS quality, switch the serializer to PCM at 24 kHz.
Next Steps
- Swap pipeline components. Try a different LLM, STT, or TTS provider — the telephony layer doesn't change.
- Add more tools. Beyond transfer_to_human, give the LLM tools to look up a customer, schedule a callback, end the call, or anything else your business logic needs.
- Explore the library. Read the pipecat-bandwidth source to see the protocol details, or consult the Pipecat Community Integrations page.