Pipecat Integration

This guide will walk you through building a real-time voice AI agent on Bandwidth's Voice Network using Pipecat, the open-source Python framework for voice AI pipelines. Pipecat orchestrates the STT → LLM → TTS pipeline; Bandwidth provides the carrier-grade telephony layer that brings real PSTN callers into it.

The integration is published as the pipecat-bandwidth library, which provides a BandwidthFrameSerializer that plugs directly into Pipecat's FastAPIWebsocketTransport. You can swap STT, LLM, and TTS providers (Deepgram, OpenAI, Cartesia, ElevenLabs, etc.) without touching the telephony layer.


This integration uses Bandwidth Programmable Voice with Media Streaming. Your application receives μ-law audio frames directly over a bidirectional WebSocket and feeds them into a Pipecat pipeline — no SIP-to-SIP routing required.

What you'll need

A Bandwidth Programmable Voice account with:
- A purchased phone number assigned to a Voice Application
- API credentials (OAuth 2.0 client ID + secret)
- Don't have an account yet? Try Bandwidth Build for free — get a real phone number and 3000 credits to start building immediately.
- If you have a full Bandwidth App account but haven't set it up yet, check out our Account Setup guide.
Python 3.11+
uv (or pip if you prefer)
A publicly accessible URL for your application (e.g., using ngrok)
API keys for the AI services in your pipeline. The example uses:
- OpenAI for the LLM
- Deepgram for STT
- Cartesia for TTS
The reference example: Bandwidth/pipecat-bandwidth/examples/bandwidth-chatbot

Call Flow

Before we dive in, let's walk through what an inbound call flow looks like with this integration.

This flow demonstrates a basic AI agent answering an inbound call. Let's break it down:

A user calls your Bandwidth number.
Bandwidth POSTs a Basic-Auth-protected webhook to your application with the inbound call event, including callId and accountId.
Your application mints a one-time correlation token bound to those server-trusted IDs and responds with BXML whose <StartStream> verb points at wss://<your-host>/ws/{token}.
Bandwidth opens a WebSocket to your application. Your app validates the token, recovers the trusted callId/accountId, and constructs a BandwidthFrameSerializer.
The serializer decodes Bandwidth's μ-law audio into Pipecat audio frames. The pipeline transcribes, generates a response, synthesizes speech, and streams it back over the same WebSocket.
The caller hears the AI agent. If the caller starts speaking while the bot is talking, the serializer sends a clear event so the bot stops talking immediately.
When the pipeline ends, the serializer automatically hangs up the call via the Bandwidth Voice API, using the trusted callId.
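For intuition about the decoding step in this flow, here is a minimal pure-Python sketch of the standard G.711 μ-law to 16-bit linear PCM conversion applied to each inbound audio payload. It is not the pipecat-bandwidth implementation, and the function names are hypothetical. The 20 ms frame in the last lines is only an arithmetic illustration, not a claim about the chunk size Bandwidth sends.

```python
import struct

BIAS = 0x84  # 132: the bias G.711 μ-law adds before encoding


def ulaw_byte_to_pcm16(code: int) -> int:
    """Decode one G.711 μ-law byte into a signed 16-bit linear sample."""
    code = ~code & 0xFF              # μ-law bytes are transmitted bit-inverted
    sign = code & 0x80               # top bit set means a negative sample
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_to_pcm16le(payload: bytes) -> bytes:
    """Decode a μ-law payload into little-endian 16-bit PCM bytes."""
    samples = [ulaw_byte_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


# 20 ms at 8 kHz is 160 samples: 160 bytes of μ-law, 320 bytes of 16-bit PCM.
silence = bytes([0xFF] * 160)  # 0xFF encodes a zero sample
print(len(ulaw_to_pcm16le(silence)))  # 320
```

Each μ-law byte expands to two bytes of PCM, which is why higher-fidelity outbound PCM (covered later) costs noticeably more bandwidth per chunk.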

Why the token-in-URL?

The BandwidthFrameSerializer's auto-hang-up path uses your operator OAuth credentials to terminate the call by callId. If that ID came from an unauthenticated WebSocket frame, anyone who can reach your /ws endpoint could feed an arbitrary callId and trigger a hang-up against a live call in your account.

The fix is to trust only the authenticated webhook body for callId and accountId and bind them to the WebSocket connection via a server-issued correlation token — exactly what the example below does. Never derive these IDs from the WebSocket start event metadata.
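The correlation token can be pictured as a small server-side store. Below is a minimal in-memory sketch of what `_issue_token` and `_consume_token` might look like, assuming the behavior this guide describes (32-byte URL-safe token, 60-second TTL, single use). The names `issue_token`, `consume_token`, and `_tokens` are hypothetical, and like the example's store this only works within a single process.

```python
import asyncio
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 60.0  # the 60-second TTL the guide describes

# token -> (call_id, account_id, expires_at). In-memory: single process only.
_tokens: dict[str, tuple[str, str, float]] = {}


async def issue_token(call_id: str, account_id: str) -> str:
    """Mint a one-time token bound to IDs taken from the authenticated webhook."""
    now = time.monotonic()
    # Drop tokens that expired without ever being used.
    for stale in [t for t, (_, _, exp) in _tokens.items() if exp <= now]:
        del _tokens[stale]
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    _tokens[token] = (call_id, account_id, now + TOKEN_TTL_SECONDS)
    return token


async def consume_token(token: str) -> Optional[tuple[str, str]]:
    """Remove the token (single use) and return the trusted IDs if unexpired."""
    # No await between lookup and removal, so this is atomic on one event loop.
    entry = _tokens.pop(token, None)
    if entry is None:
        return None
    call_id, account_id, expires_at = entry
    if time.monotonic() > expires_at:
        return None
    return call_id, account_id


async def demo() -> tuple:
    token = await issue_token("c-123", "9900000")
    first = await consume_token(token)   # ("c-123", "9900000")
    second = await consume_token(token)  # None: already used
    return first, second


print(asyncio.run(demo()))
```

Because the WebSocket side only ever calls `consume_token`, the IDs it recovers can only have come from the Basic-Auth-protected webhook, never from the WebSocket frames themselves.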

Let's Build It!

For convenience, we provide a complete sample application: Bandwidth/pipecat-bandwidth/examples/bandwidth-chatbot.

It's a single-file FastAPI server that:

Returns a <StartStream> BXML response on Bandwidth's voice webhook (Basic Auth).
Accepts the bidirectional WebSocket and validates a one-time correlation token.
Runs a Deepgram (STT) → OpenAI (LLM) → Cartesia (TTS) Pipecat pipeline with Silero VAD for turn detection.

The following sections walk through the example to help you understand how it works.

Set Up Our Environment

Clone the repository and move into the example directory:

git clone https://github.com/Bandwidth/pipecat-bandwidth
cd pipecat-bandwidth/examples/bandwidth-chatbot
uv sync
cp env.example .env

Fill in your .env:

# Public hostname for the BXML StartStream destination (no scheme)
PROXY_HOST=

# Optional: change the local port the bot listens on
PORT=7860

# Bandwidth API credentials (OAuth 2.0 client_credentials)
# Required for auto hang-up on EndFrame / CancelFrame.
BANDWIDTH_CLIENT_ID=
BANDWIDTH_CLIENT_SECRET=

# Webhook Basic Auth. Set the same username/password in your Bandwidth
# voice application's webhook configuration.
BANDWIDTH_WEBHOOK_USERNAME=
BANDWIDTH_WEBHOOK_PASSWORD=

# AI service keys
OPENAI_API_KEY=
DEEPGRAM_API_KEY=
CARTESIA_API_KEY=
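A blank value usually only surfaces once a call arrives, so it can help to check the .env at startup. Here is a minimal sketch with a hypothetical `missing_env_vars` helper. It treats every variable above except PORT as required, which matches running the full example including auto hang-up.

```python
import os
from typing import Mapping

# Every variable from the .env above except the optional PORT.
REQUIRED_VARS = (
    "PROXY_HOST",
    "BANDWIDTH_CLIENT_ID",
    "BANDWIDTH_CLIENT_SECRET",
    "BANDWIDTH_WEBHOOK_USERNAME",
    "BANDWIDTH_WEBHOOK_PASSWORD",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

At startup you might raise `SystemExit` with the joined names whenever `missing_env_vars()` returns a non-empty list, so a misconfigured bot fails before Bandwidth ever calls it.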

Run the bot:

uv run python bot.py

A successful startup will log something like:

INFO:     Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
INFO:     Application startup complete.

Handle the Inbound Voice Webhook

When a user calls your Bandwidth number, Bandwidth POSTs to your inbound voice webhook. The handler authenticates the request, extracts the trusted callId / accountId from the body, mints a one-time correlation token, and returns a <StartStream> BXML.

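from fastapi import HTTPException, Request, status
from fastapi.responses import HTMLResponse

# app, PROXY_HOST, _verify_webhook_auth, and _issue_token are defined elsewhere in bot.py.
@app.post("/")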
async def inbound_call(request: Request) -> HTMLResponse:
    """Bandwidth voice webhook. Basic Auth required; trust the body's IDs."""
    _verify_webhook_auth(request)

    body = await request.json()
    call_id = body.get("callId")
    account_id = body.get("accountId")
    if not call_id or not account_id:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Webhook body missing callId or accountId",
        )

    token = await _issue_token(str(call_id), str(account_id))

    bxml = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <StartStream destination="wss://{PROXY_HOST}/ws/{token}" mode="bidirectional" tracks="inbound"/>
  <Pause duration="86400"/>
</Response>"""
    return HTMLResponse(content=bxml, media_type="application/xml")

A few things to note:

_verify_webhook_auth enforces HTTP Basic Auth using BANDWIDTH_WEBHOOK_USERNAME / BANDWIDTH_WEBHOOK_PASSWORD. If these are unset, the handler rejects webhooks with 503 instead of silently accepting them, which would defeat the point of the trust chain.
callId and accountId are read only from the authenticated webhook body.
_issue_token mints a 32-byte URL-safe token with a short TTL (60 seconds) and stores (call_id, account_id, expires_at) server-side keyed by the token.
The BXML uses <StartStream mode="bidirectional" tracks="inbound"/> to open a two-way audio WebSocket. The <Pause duration="86400"/> keeps the call leg alive while the WebSocket session runs.
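The Basic Auth check itself is small. Here is a framework-agnostic sketch of the logic `_verify_webhook_auth` is described as performing, including the fail-closed 503 when no credentials are configured. The `check_basic_auth` name is hypothetical and the example's actual implementation may differ.

```python
import base64
import binascii
import secrets
from typing import Optional


def check_basic_auth(
    authorization: Optional[str],
    expected_user: Optional[str],
    expected_password: Optional[str],
) -> int:
    """Return 200 if the Basic credentials match, 401 if missing or wrong,
    or 503 if no credentials are configured (fail closed)."""
    if not expected_user or not expected_password:
        return 503
    if not authorization:
        return 401
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return 401
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return 401
    user, sep, password = decoded.partition(":")
    if not sep:
        return 401
    # Constant-time comparisons so response timing leaks nothing about the secrets.
    user_ok = secrets.compare_digest(user.encode(), expected_user.encode())
    password_ok = secrets.compare_digest(password.encode(), expected_password.encode())
    return 200 if user_ok and password_ok else 401
```

In the FastAPI handler you could pass `request.headers.get("authorization")` plus the two BANDWIDTH_WEBHOOK_* values, and raise `HTTPException(status_code=...)` whenever the result is not 200.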

Accept the WebSocket and Wire Up Pipecat

Bandwidth opens a WebSocket to /ws/{token}. The handler validates the token, recovers the trusted IDs, constructs a BandwidthFrameSerializer, and runs the Pipecat pipeline.

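import json
import os

from fastapi import WebSocket

# app, _consume_token, run_bot, and the Pipecat / pipecat-bandwidth imports
# (BandwidthFrameSerializer, FastAPIWebsocketTransport, FastAPIWebsocketParams)
# live elsewhere in bot.py.
@app.websocket("/ws/{token}")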
async def websocket_endpoint(websocket: WebSocket, token: str) -> None:
    """Validate the correlation token, then run the bot pipeline."""
    trusted = await _consume_token(token)
    if trusted is None:
        await websocket.close(code=1008)  # 1008 = policy violation
        return

    call_id, account_id = trusted
    await websocket.accept()

    # The first frame from Bandwidth is the "start" event. We use streamId
    # from it (it's just a wire-protocol identifier). callId and accountId
    # come from the trusted token mapping — we deliberately ignore whatever
    # the WS metadata claims.
    first = await websocket.receive_text()
    start_event = json.loads(first)
    stream_id = start_event.get("metadata", {}).get("streamId", "")

    serializer = BandwidthFrameSerializer(
        stream_id=stream_id,
        call_id=call_id,
        account_id=account_id,
        client_id=os.getenv("BANDWIDTH_CLIENT_ID"),
        client_secret=os.getenv("BANDWIDTH_CLIENT_SECRET"),
    )

    transport = FastAPIWebsocketTransport(
        websocket=websocket,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            add_wav_header=False,
            serializer=serializer,
        ),
    )

    await run_bot(transport)

Key points:

_consume_token pops the token from server-side state, so each token is single-use and expires after 60 seconds. Unknown, expired, or already-used tokens are rejected with WebSocket close code 1008 (policy violation).
The Bandwidth start event's streamId is just a wire-protocol identifier; using it is safe. callId / accountId are not read from the WS frame.
BandwidthFrameSerializer handles the protocol details: μ-law decoding inbound, audio encoding outbound (μ-law by default, or PCM at 8/16/24 kHz), interruption signaling via clear, and auto hang-up via the Voice API on EndFrame / CancelFrame.
The serializer plugs into a standard Pipecat FastAPIWebsocketTransport — there's no Bandwidth-specific transport class to learn.
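If you want the first-frame handling to be stricter than the example's `.get(..., "")` defaults, a small parser can reject anything that is not a start event carrying a streamId. This is a sketch: `stream_id_from_start` is hypothetical, and the `eventType == "start"` check is an assumption about Bandwidth's frame layout (the `metadata.streamId` path mirrors the example code).

```python
import json
from typing import Any


def stream_id_from_start(raw: str) -> str:
    """Return metadata.streamId from Bandwidth's first WebSocket frame.

    Any callId/accountId in the frame is deliberately ignored; those values
    must come from the token mapping created by the authenticated webhook.
    """
    event: dict[str, Any] = json.loads(raw)
    # Assumption: the first frame carries eventType == "start".
    if event.get("eventType") != "start":
        raise ValueError(f"expected a start event, got {event.get('eventType')!r}")
    stream_id = event.get("metadata", {}).get("streamId", "")
    if not stream_id:
        raise ValueError("start event is missing metadata.streamId")
    return stream_id
```

In the WebSocket handler you could call it on `await websocket.receive_text()` and close with code 1008 on `ValueError`, the same policy-violation code used for bad tokens.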

Build the Pipecat Pipeline

With the transport ready, the bot pipeline is plain Pipecat. Swap any of the services without touching the telephony layer.

async def run_bot(transport: FastAPIWebsocketTransport) -> None:
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAILLMService.Settings(
            system_instruction=(
                "You are a helpful assistant on a phone call. Keep responses "
                "concise and conversational. Avoid special characters since "
                "your output is converted to audio."
            ),
        ),
    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )

    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        user_aggregator,
        llm,
        tts,
        transport.output(),
        assistant_aggregator,
    ])

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            audio_in_sample_rate=8000,
            audio_out_sample_rate=8000,
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        context.add_message({"role": "user", "content": "Please introduce yourself to the caller."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        await task.cancel()

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)

Notable details:

Silero VAD is wired into the user aggregator for turn detection.
8 kHz is the sample rate of Bandwidth's default μ-law audio. For higher TTS quality, set outbound_encoding="PCM" and outbound_pcm_sample_rate=24000 on BandwidthFrameSerializer.InputParams, and set audio_out_sample_rate=24000 in PipelineParams to match.
on_client_connected is a clean hook to seed the conversation with a greeting prompt as soon as the WebSocket is fully connected.
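Switching to 24 kHz PCM touches both the serializer and PipelineParams, so a small validation object can catch a mismatched combination early and show how the per-chunk payload grows. A sketch under assumptions: the `OutboundAudio` class and the "PCMU" label for μ-law are hypothetical, PCM is assumed to be 16-bit linear, and the supported rates are the ones this guide lists.

```python
from dataclasses import dataclass

# Outbound combinations the guide lists: μ-law at 8 kHz, or PCM at 8/16/24 kHz.
# "PCMU" is this sketch's own label for μ-law, not necessarily the library's value.
SUPPORTED_RATES = {"PCMU": frozenset({8000}), "PCM": frozenset({8000, 16000, 24000})}


@dataclass(frozen=True)
class OutboundAudio:
    encoding: str     # "PCMU" or "PCM"
    sample_rate: int  # should equal PipelineParams.audio_out_sample_rate

    def __post_init__(self) -> None:
        if self.sample_rate not in SUPPORTED_RATES.get(self.encoding, frozenset()):
            raise ValueError(
                f"unsupported outbound audio: {self.encoding} at {self.sample_rate} Hz"
            )

    def bytes_per_chunk(self, ms: int = 20) -> int:
        """Payload bytes for `ms` of audio: μ-law is 1 byte/sample, 16-bit PCM is 2."""
        bytes_per_sample = 1 if self.encoding == "PCMU" else 2
        return self.sample_rate * ms // 1000 * bytes_per_sample


print(OutboundAudio("PCM", 24000).bytes_per_chunk())  # 960
```

Moving from μ-law at 8 kHz (160 bytes per 20 ms) to 16-bit PCM at 24 kHz (960 bytes per 20 ms) is a sixfold increase in outbound payload, a trade worth knowing about before flipping the setting.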

Connect to a Public URL

Bandwidth needs a publicly reachable HTTPS host to deliver the inbound webhook and open the media-stream WebSocket. In a separate terminal:

ngrok http 7860

Copy the ngrok-free.app hostname (no scheme) into .env as PROXY_HOST and restart the bot so the BXML response references the correct destination.
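A common mistake is pasting the full https:// URL that ngrok prints, which produces an invalid wss://https://... destination. Here is a sketch of a normalizer you could apply to PROXY_HOST before building the <StartStream> destination. Both function names are hypothetical; the example itself interpolates PROXY_HOST directly.

```python
from urllib.parse import urlsplit


def normalize_proxy_host(value: str) -> str:
    """Reduce 'https://abc.ngrok-free.app/' or 'abc.ngrok-free.app' to a bare host."""
    value = value.strip()
    if "://" in value:
        value = urlsplit(value).netloc  # drop the scheme and any path
    return value.rstrip("/")


def stream_destination(proxy_host: str, token: str) -> str:
    """Build the wss:// URL used as the <StartStream> destination."""
    return f"wss://{normalize_proxy_host(proxy_host)}/ws/{token}"


print(stream_destination("https://abc.ngrok-free.app/", "t0k"))  # wss://abc.ngrok-free.app/ws/t0k
```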

Configure your Bandwidth Voice Application

Finally, point your Bandwidth Voice Application at your public webhook URL.

Log in to the Bandwidth App and open your Voice Application (or create a new one).
Under Call initiated, select POST as the callback method and set the Callback URL to https://<your-public-host>/.
Tick Use a callback username and password and enter the same BANDWIDTH_WEBHOOK_USERNAME / BANDWIDTH_WEBHOOK_PASSWORD values you put in .env. Bandwidth will send these in a Basic Auth header on every inbound call event.
- The example bot enforces Basic Auth on the webhook so it can trust the body's callId / accountId (the security note under Call Flow explains why). Bandwidth itself doesn't require this — if you'd rather verify Bandwidth's webhook signature or IP-allowlist their egress ranges instead, swap out _verify_webhook_auth and skip this checkbox.
Click Save.
Make sure the application is linked to a Voice Configuration Package and that the package is assigned to your Bandwidth phone number.

Test the Integration

Call your Bandwidth phone number. You should hear the bot introduce itself, and you can start a conversation. If the bot ends the call, the serializer terminates the call leg via the Bandwidth Voice API using the trusted callId. If you hang up first, Bandwidth closes the media stream and the on_client_disconnected handler cancels the pipeline.
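Before placing a real call, you can exercise the webhook locally. Here is a minimal sketch that builds a fake inbound-call webhook carrying the same Basic Auth credentials. The `build_fake_initiate` helper, the `eventType` value, and the sample IDs are illustrative assumptions; the handler shown earlier only reads callId and accountId.

```python
import base64
import json
import urllib.request


def build_fake_initiate(
    url: str,
    username: str,
    password: str,
    call_id: str = "c-local-test",
    account_id: str = "9900000",
) -> urllib.request.Request:
    """Build a POST that mimics Bandwidth's inbound-call webhook for local testing."""
    body = json.dumps(
        # The handler only reads callId and accountId; eventType is illustrative.
        {"eventType": "initiate", "callId": call_id, "accountId": account_id}
    ).encode("utf-8")
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )
```

With bot.py running, `urllib.request.urlopen(build_fake_initiate("http://localhost:7860/", user, password))` should return the <StartStream> BXML. The minted token then simply expires, since no WebSocket follows.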

Hand off to a Human Agent

Sometimes the AI agent isn't enough — the caller wants a person, the bot decides the conversation is out of scope, or you want a "press 0 for a human" escape hatch. Because the FastAPI app sits between Bandwidth and the AI services, handoff is straightforward: give the LLM a transfer_to_human tool, and when it fires, replace the active call's BXML with a <Transfer> verb via Bandwidth's Update Call BXML endpoint. Bandwidth tears down the media stream, dials the human, and bridges the legs — your Pipecat pipeline exits cleanly on its own.

Define the tool

Pipecat's OpenAILLMService accepts an OpenAI-style tools list in its Settings. Register a single function tool that the model can call when it decides the conversation should escalate:

TRANSFER_TOOL = {
    "type": "function",
    "function": {
        "name": "transfer_to_human",
        "description": (
            "Transfer the caller to a live human agent. Call this when the "
            "caller asks to speak to a person, says they want a human, or "
            "when you cannot help them and an agent should take over."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Short reason for the transfer, for logging.",
                },
            },
            "required": ["reason"],
        },
    },
}

llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAILLMService.Settings(
        system_instruction=(
            "You are a helpful assistant on a phone call. Keep responses "
            "concise and conversational. Avoid special characters since "
            "your output is converted to audio. If the caller asks for a "
            "human, call the transfer_to_human tool — don't promise a "
            "transfer in text, actually call the tool."
        ),
        tools=[TRANSFER_TOOL],
    ),
)

Handle the tool call

When the LLM calls transfer_to_human, your handler does the actual work: it PUTs fresh BXML to Bandwidth that redirects the call leg into a <Transfer>. The OAuth credentials and the trusted callId / accountId are already in scope in the WebSocket handler, so bind them in when you register the handler on the LLM service:

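import functools
import os

import aiohttp
from pipecat.services.llm_service import FunctionCallParams

HUMAN_AGENT_NUMBER = os.getenv("HUMAN_AGENT_NUMBER", "+19195554321")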
OAUTH_TOKEN_URL = "https://api.bandwidth.com/api/v1/oauth2/token"
VOICE_API_BASE_URL = "https://voice.bandwidth.com/api/v2"


async def transfer_to_human(
    params: FunctionCallParams,
    *,
    call_id: str,
    account_id: str,
) -> None:
    """Replace the active call's BXML with a <Transfer> to a human agent."""
    bxml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Bxml>"
        f"<Transfer><PhoneNumber>{HUMAN_AGENT_NUMBER}</PhoneNumber></Transfer>"
        "</Bxml>"
    )

    async with aiohttp.ClientSession() as session:
        token_auth = aiohttp.BasicAuth(
            os.environ["BANDWIDTH_CLIENT_ID"],
            os.environ["BANDWIDTH_CLIENT_SECRET"],
        )
        async with session.post(
            OAUTH_TOKEN_URL,
            auth=token_auth,
            data={"grant_type": "client_credentials"},
        ) as token_response:
            token_response.raise_for_status()
            access_token = (await token_response.json())["access_token"]

        endpoint = f"{VOICE_API_BASE_URL}/accounts/{account_id}/calls/{call_id}/bxml"
        headers = {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/xml",
        }
        async with session.put(endpoint, headers=headers, data=bxml) as response:
            response.raise_for_status()

    # Acknowledge the tool call so the LLM knows the transfer is in flight.
    # The pipeline will end naturally when Bandwidth closes the media stream.
    await params.result_callback({"status": "transferring"})


# In the WebSocket handler, after constructing `llm`:
llm.register_function(
    "transfer_to_human",
    functools.partial(transfer_to_human, call_id=call_id, account_id=account_id),
)

Notable details:

The endpoint is PUT https://voice.bandwidth.com/api/v2/accounts/{accountId}/calls/{callId}/bxml with an application/xml body — Bandwidth's Update Call BXML operation. The body root element is <Bxml> (not <Response>) when used with this endpoint.
call_id and account_id are bound at registration time via functools.partial so they come from the same trusted token mapping the rest of the bot uses. The LLM never gets to influence which call gets transferred.
OAuth uses the same BANDWIDTH_CLIENT_ID / BANDWIDTH_CLIENT_SECRET already configured for auto hang-up — no new env vars beyond HUMAN_AGENT_NUMBER.
The response is 204 No Content on success. There's no need to drain the body, but raise_for_status() will surface auth or call-state errors loudly.
params.result_callback hands the LLM a short status object ({"status": "transferring"}) so it doesn't hallucinate a confirmation message. In practice, the caller hears Bandwidth's transfer behavior take over almost immediately.
<Transfer> accepts the full set of attributes documented in the BXML reference — set transferCallerId, attach a transferCompleteUrl, or list multiple <PhoneNumber> entries to ring a hunt group.
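As the <Transfer> markup grows beyond a single number, building it with an XML library avoids hand-escaping values read from configuration. Here is a minimal sketch using `xml.etree.ElementTree`. The `transfer_bxml` name is hypothetical, and it only sets the attributes named above; check the BXML reference for the rest.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def transfer_bxml(
    numbers: Sequence[str],
    transfer_caller_id: Optional[str] = None,
    transfer_complete_url: Optional[str] = None,
) -> str:
    """Build <Bxml><Transfer> markup for the Update Call BXML endpoint."""
    if not numbers:
        raise ValueError("at least one phone number is required")
    root = ET.Element("Bxml")  # Update Call BXML uses <Bxml>, not <Response>
    transfer = ET.SubElement(root, "Transfer")
    if transfer_caller_id:
        transfer.set("transferCallerId", transfer_caller_id)
    if transfer_complete_url:
        transfer.set("transferCompleteUrl", transfer_complete_url)
    for number in numbers:  # several numbers ring as a hunt group
        ET.SubElement(transfer, "PhoneNumber").text = number
    body = ET.tostring(root, encoding="unicode")  # values are XML-escaped here
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

Called as `transfer_bxml([HUMAN_AGENT_NUMBER])`, it produces the same markup the handler above builds by hand, so it can replace the string concatenation without changing the PUT request.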

What happens to the pipeline?

When Bandwidth processes the new BXML, the existing <StartStream> is interrupted: the bidirectional WebSocket closes, the FastAPIWebsocketTransport fires on_client_disconnected, and that handler cancels the pipeline task, which sends a CancelFrame through the pipeline. The BandwidthFrameSerializer's auto-hang-up logic still runs, but the call is already redirecting, so the Voice API returns 404 for the hang-up POST and the serializer logs it at debug level and moves on. No special teardown is required; the same shutdown path that runs at the end of any normal call handles this cleanly.

Production Considerations

Verify the webhook signature. Bandwidth signs voice webhooks; verify the signature in addition to (or instead of) Basic Auth for defense in depth.
IP-allowlist Bandwidth's egress ranges at your ingress.
Back the token store with Redis or similar if you run more than one worker — the in-memory store in the example is single-process only.
DTMF. Bandwidth doesn't deliver DTMF over the media stream. Capture it with the BXML <Gather> verb on a separate webhook handler.
Higher-fidelity audio. μ-law at 8 kHz is the default. For noticeably better TTS quality, switch the serializer to PCM at 24 kHz.
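At the application layer, an IP allowlist check can be sketched with Python's `ipaddress` module. The CIDR ranges below are RFC 5737 documentation placeholders, not Bandwidth's real egress ranges, and `make_allowlist` is a hypothetical helper. Behind ngrok or another proxy the socket peer is the proxy, so apply this only to an address you trust, or enforce the allowlist at the ingress itself.

```python
import ipaddress
from typing import Callable, Iterable

# Placeholder documentation ranges (RFC 5737); substitute Bandwidth's published egress ranges.
ALLOWED_CIDRS = ("192.0.2.0/24", "198.51.100.0/24")


def make_allowlist(cidrs: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate that accepts only addresses inside the given networks."""
    networks = [ipaddress.ip_network(c) for c in cidrs]

    def is_allowed(client_ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(client_ip)
        except ValueError:  # malformed address: reject
            return False
        # Membership across IP versions (v6 address, v4 network) is simply False.
        return any(addr in net for net in networks)

    return is_allowed
```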

Next Steps

Swap pipeline components. Try a different LLM, STT, or TTS provider — the telephony layer doesn't change.
Add more tools. Beyond transfer_to_human, give the LLM tools to look up a customer, schedule a callback, end the call, or anything else your business logic needs.
Explore the library. Read the pipecat-bandwidth source to see the protocol details, or consult the Pipecat Community Integrations page.
