OpenAI Realtime SIP Integration

This guide will walk you through integrating Bandwidth's Voice Network with OpenAI's Realtime SIP Interface. This integration allows you to leverage OpenAI's advanced AI capabilities in your call flows.

info

This integration is only available on the Universal Platform.

Please reach out to your Bandwidth CSM to confirm that your account and trunk configuration are correctly enabled to interconnect with OpenAI's Realtime API via SIP Connector.

What you'll need

Call Flow

Before we dive in, let's walk through what an inbound call flow looks like with this integration.

This flow demonstrates a call that is answered by an AI agent and then transferred to a human agent. Let's break it down:

  1. A user calls your Bandwidth number.
  2. Bandwidth routes the call to OpenAI's SIP endpoint.
  3. OpenAI sends a realtime.call.incoming event to your webhook URL.
  4. Your application makes a POST /calls/{callId}/accept request to accept the call and give the agent its context.
  5. OpenAI responds with a 200 OK response.
  6. Your application asynchronously establishes a WebSocket connection with OpenAI to start the stream, enabling you to send commands to the agent during the call.
  7. Your application responds with a 200 OK response to the initial event sent in step 3.
    1. This is very important: OpenAI will not connect the call until your application responds with a 200 OK response to the initial realtime.call.incoming event.
  8. The user and the AI agent can now converse.
  9. When the user is ready to speak to a human agent, your application makes a POST /calls/{callId}/refer request to transfer the call.
  10. OpenAI sends a SIP REFER to Bandwidth to transfer the call.
  11. OpenAI responds with a 200 OK response to your application after Bandwidth accepts the refer request.
  12. Bandwidth refers the call, and the user and the human agent can now converse.

REFER to a tel_uri is not supported yet. You must REFER to a sip_uri that points to your Bandwidth trunk, which will then route the call to the desired Bandwidth number. The IP address needed for the REFER can be found in the Contact header of the initial POST request that OpenAI sends in step 3.

Let's Build It!

For convenience, we have provided a sample application to get you started. You can find it here: bandwidth-samples/openai-realtime-sip-python. The sample application is built using Python and FastAPI, but you can use any language or framework that you prefer, such as Node.js + Express or Java + Spring.

To run the sample application, simply clone the repository and follow the instructions in the README.

The following sections will walk you through the sample application code to help you understand how it works.

Set Up our Environment

Let's first clone the sample application:

git clone https://github.com/Bandwidth-Samples/openai-realtime-sip-python
cd openai-realtime-sip-python

The application provides a Docker Compose file to help you get started quickly, but you can also run the application in your local Python environment if you prefer.

First, ensure you have a .env file in the root of the project with the following variables:

export OPENAI_API_KEY="your_openai_api_key_here"
export OPENAI_SIGNING_SECRET="your_openai_signing_secret_here"
export REFER_TO="+19195554321"
export LOG_LEVEL="DEBUG"
export LOCAL_PORT=3000

Using Docker

docker compose up --build

Using Local Python Environment

python -m venv .venv
source .venv/bin/activate
cd app
pip install -r requirements.txt
python main.py

A successful startup should log the following:

INFO:     Will watch for changes in these directories: ['/app']
INFO:     Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
INFO:     Started reloader process [1] using WatchFiles
INFO:     Started server process [8]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

The application runs on port 3000 by default; you can override this by setting the LOCAL_PORT environment variable.

Creating our FastAPI Server

The sample application uses FastAPI to create a simple web server that can handle incoming HTTP requests from OpenAI.

The sample application also provides a models directory that contains Pydantic models for the various OpenAI webhook events. We won't define those models here, but you can find them in the models directory of the sample application.

# main.py
#!/usr/bin/env python3

# ...imports...

# Set our Environment Variables
try:
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
    OPENAI_SIGNING_SECRET = os.environ["OPENAI_SIGNING_SECRET"]
    REFER_TO = os.environ["REFER_TO"]
    LOG_LEVEL = os.environ["LOG_LEVEL"]
    LOCAL_PORT = int(os.environ.get("LOCAL_PORT", 3000))
except KeyError:
    print("environment variables not set")
    exit(1)

app = FastAPI()


# Health Check
@app.get("/health", status_code=http.HTTPStatus.NO_CONTENT)
def health():
    return


# Handle Inbound Call Event from OpenAI
@app.post("/webhooks/openai/realtime/call/inbound", status_code=http.HTTPStatus.OK)
def handle_inbound_call(payload: RealtimeCallIncoming) -> Response:
    return Response()


def start_server(port: int) -> None:
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=port,
        log_level="debug",
        reload=True,
    )


if __name__ == "__main__":
    start_server(LOCAL_PORT)

The above code creates a FastAPI application with two endpoints:

  1. A health check endpoint at /health that returns a 204 No Content response.
  2. A webhook endpoint at /webhooks/openai/realtime/call/inbound that handles incoming call events from OpenAI. Right now, it simply returns a 200 OK response.

The brunt of the logic will be added to the handle_inbound_call function.

Handle Inbound Call Event

When a user calls your Bandwidth number, Bandwidth will route the call to OpenAI's SIP endpoint. OpenAI will then send a realtime.call.incoming event to your webhook URL. Let's implement the logic to handle this event.

# main.py

# Now we need some of our constants
AUTH_HEADER = {"Authorization": f"Bearer {OPENAI_API_KEY}"}
OPENAI_REALTIME_CALLS_BASE_URL = "https://api.openai.com/v1/realtime/calls/"
GREETING = "Hello! How can I assist you today?"
AGENT_PROMPT = "You are a helpful customer support agent."
REFER_TOOL = Tool(
    type="function",
    name="refer",
    description="Transfer the call to another person whenever the caller requests to be transferred or to speak to a person.",
)
CALL_ACCEPTANCE_REQUEST = CallAcceptanceRequest(
    type="realtime",
    instructions=AGENT_PROMPT,
    model="gpt-4o-realtime-preview",
    tools=[REFER_TOOL],
)


@app.post("/webhooks/openai/realtime/call/inbound", status_code=http.HTTPStatus.OK)
def handle_inbound_call(payload: RealtimeCallIncoming) -> Response:
    if payload.is_incoming_call():
        # Grab the relevant info from the incoming event and stash it in a variable
        call_id = payload.get_call_id()

        acceptance_response = requests.post(
            OPENAI_REALTIME_CALLS_BASE_URL + call_id + "/accept",
            headers={**AUTH_HEADER, "Content-Type": "application/json"},
            json=CALL_ACCEPTANCE_REQUEST.model_dump(),
        )
        if acceptance_response.status_code != http.HTTPStatus.OK:
            return Response(status_code=http.HTTPStatus.INTERNAL_SERVER_ERROR)

    return Response()

Let's look at some of our constants in more detail:

  • AUTH_HEADER: This is the authorization header that we will use to authenticate our requests to OpenAI. It contains our API key.
  • OPENAI_REALTIME_CALLS_BASE_URL: This is the base URL for the OpenAI Realtime Calls API.
  • GREETING: This is a simple greeting message that the agent will say when the call is answered. We will use this later when we implement the websocket connection.
  • AGENT_PROMPT: This is the prompt that we will use to instruct the agent on how to behave during the call. The sample app includes a more detailed example prompt for demonstration purposes.
  • REFER_TOOL: This is a tool that we will provide to the agent to allow it to transfer the call to another person.
  • CALL_ACCEPTANCE_REQUEST: This is the request body that we will send to OpenAI when we accept the call.
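For reference, CALL_ACCEPTANCE_REQUEST serializes to a JSON body along these lines. The field layout follows the sample app's Pydantic models; defer to OpenAI's Realtime API reference for the authoritative schema:

```json
{
  "type": "realtime",
  "instructions": "You are a helpful customer support agent.",
  "model": "gpt-4o-realtime-preview",
  "tools": [
    {
      "type": "function",
      "name": "refer",
      "description": "Transfer the call to another person whenever the caller requests to be transferred or to speak to a person."
    }
  ]
}
```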

Now let's walk through what we just did:

  1. We defined some constants that we will use later in the function.
  2. We check that the incoming event is indeed an incoming call using the is_incoming_call method on the RealtimeCallIncoming model.
  3. We extract the call_id from the incoming event using the get_call_id method on the RealtimeCallIncoming model.
  4. We make a POST /calls/{callId}/accept request to OpenAI to accept the call and provide the agent with its context.
  5. We check whether the response from OpenAI is a 200 OK response. If not, we return a 500 Internal Server Error response.
  6. Finally, we return a 200 OK response to OpenAI to let them know that we have successfully handled the event.
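Note that we have not yet used OPENAI_SIGNING_SECRET. In production you should verify the webhook signature before acting on an event; at the time of writing, the openai Python SDK exposes a helper for this (client.webhooks.unwrap), so check the SDK docs first. The sketch below shows the underlying idea, assuming OpenAI follows the Standard Webhooks convention (HMAC-SHA256 over "id.timestamp.body" keyed by the base64 payload of a whsec_-prefixed secret); treat the exact scheme and header names as assumptions:

```python
import base64
import hashlib
import hmac


def verify_webhook(secret: str, webhook_id: str, timestamp: str,
                   body: bytes, signature_header: str) -> bool:
    """Verify a Standard-Webhooks-style signature (assumed scheme).

    The signed content is "{id}.{timestamp}.{body}"; the key is the base64
    payload after the "whsec_" prefix; the signature header may contain
    several space-separated "v1,<base64>" candidates.
    """
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{webhook_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode()
    return any(
        hmac.compare_digest(expected, candidate.partition(",")[2])
        for candidate in signature_header.split()
    )
```

You would call this with the raw request body and the id, timestamp, and signature headers before parsing the payload, and return a 4xx response when it fails.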

Establish WebSocket Connection

# main.py

# We need a couple more constants now
OPENAI_REALTIME_WEBSOCKET_URL = "wss://api.openai.com/v1/realtime"
CREATE_RESPONSE_REQUEST = CreateResponseRequest(
    response=OpenAIResponse(instructions=f"Say to the caller: '{GREETING}'")
)


# This function will handle our websocket connection asynchronously
# OpenAI uses websockets to stream information about the conversation
# We can interact with the call and instruct the agent in real-time using this connection
async def websocket_task(call_id: str, sip_host: str = None) -> None:
    try:
        async with websockets.connect(
            f"{OPENAI_REALTIME_WEBSOCKET_URL}?call_id={call_id}",
            additional_headers=AUTH_HEADER,
        ) as websocket:
            await websocket.send(CREATE_RESPONSE_REQUEST.model_dump_json())
            while True:
                response = await websocket.recv()
                realtime_message = RealtimeMessage(**json.loads(response))

                # Handle different message types from the OpenAI WebSocket Connection
                # There are many different types of messages that OpenAI can send us
                # For now we are only concerned with the "response.output_item.done" message type
                # This message type indicates that the agent has finished processing a response
                # and is ready for us to take action (e.g., transfer the call)
                match realtime_message.type:
                    case "response.output_item.done":
                        if realtime_message.item.type == "function_call":
                            if realtime_message.item.name == "refer":
                                requests.post(
                                    f"{OPENAI_REALTIME_CALLS_BASE_URL}{call_id}/refer",
                                    headers=AUTH_HEADER,
                                    json={"target_uri": f"sip:{REFER_TO}@{sip_host}"},
                                )
                    case _:
                        pass

    except Exception as e:
        print(f"WebSocket error: {e}")


@app.post("/webhooks/openai/realtime/call/inbound", status_code=http.HTTPStatus.OK)
def handle_inbound_call(payload: RealtimeCallIncoming) -> Response:
    if payload.is_incoming_call():
        # Grab the relevant info from the incoming event and stash it in a variable
        call_id = payload.get_call_id()

        # we need this for our REFER request
        sip_host = payload.get_sip_host()

        acceptance_response = requests.post(
            OPENAI_REALTIME_CALLS_BASE_URL + call_id + "/accept",
            headers={**AUTH_HEADER, "Content-Type": "application/json"},
            json=CALL_ACCEPTANCE_REQUEST.model_dump(),
        )
        if acceptance_response.status_code != http.HTTPStatus.OK:
            inspect(acceptance_response.json(), title="OpenAI API Error")
            return Response(status_code=http.HTTPStatus.INTERNAL_SERVER_ERROR)

        # New step - Start our websocket connection in a new thread
        threading.Thread(
            target=asyncio.run,
            args=(websocket_task(call_id, sip_host),),
            daemon=True,
        ).start()

    return Response()

In the above code, we added a new function called websocket_task that establishes a WebSocket connection with OpenAI. This function is called in a new thread after we accept the call. Let's break down what we did:

  1. We defined a new constant called OPENAI_REALTIME_WEBSOCKET_URL that contains the URL for the OpenAI Realtime WebSocket endpoint.
  2. We defined a new constant called CREATE_RESPONSE_REQUEST that contains the request body that we will send to OpenAI to create a response.
  3. We created a new asynchronous function called websocket_task that takes in the call_id and sip_host as parameters.
  4. Inside the websocket_task function, we establish a WebSocket connection with OpenAI using the websockets library.
  5. We send a CREATE_RESPONSE_REQUEST to OpenAI to instruct the agent to say our greeting message to the caller.
  6. We enter a loop where we listen for messages from OpenAI.
  7. We use a match statement to handle different types of messages from OpenAI.
    1. In this case, we are only concerned with the response.output_item.done message type, which indicates that the agent has finished processing a response and is ready for us to take action.
    2. If the message contains a function_call item with the name refer, we make a POST /calls/{callId}/refer request to OpenAI to transfer the call to the number specified in the REFER_TO environment variable.
  8. We handle any exceptions that may occur during the WebSocket connection and log them to the console.
  9. Finally, we start the websocket_task function in a new thread after we accept the call.
  10. We also extract the sip_host from the incoming event using the get_sip_host method on the RealtimeCallIncoming model. This is needed for our REFER request.
  11. We pass the sip_host to the websocket_task function so that it can be used in the REFER request.
  12. We return a 200 OK response to OpenAI to let them know that we have successfully handled the event.

Connect to your Public Webhook URL

Now that we have our application running locally, we need to expose it to the internet so that OpenAI can send webhook events to it. You can use a tool like ngrok to create a secure tunnel to your local server.

In a new terminal window, run the following command:

ngrok http 3000

This will give you a public URL that you can use to configure your OpenAI project.

Configure your OpenAI Project

Now that you have a public URL for your webhook, you need to configure your OpenAI project to use it.

  1. Log in to the OpenAI Dashboard.
  2. Navigate to the "Webhooks" section of your project settings.
  3. Add a new webhook with the following details:
    • Name: Realtime Inbound Call - SIP
    • URL: https://{your-ngrok-id}.ngrok-free.app/webhooks/openai/realtime/call/inbound
    • Events: Select realtime.call.incoming

Configure your Bandwidth Voice Configuration Package

Finally, you need to configure your Bandwidth Voice Configuration Package to route inbound calls to OpenAI's SIP endpoint.

  1. Log in to the Bandwidth Dashboard.
  2. Under Service Management, select Voice Configuration and create a new Voice Configuration Package or edit an existing one.
  3. Add a new Route and select Route to SIP URI.
  4. In the SIP URI field, enter sip:{my_project_id}@sip.api.openai.com, where {my_project_id} is your OpenAI Project ID, e.g. sip:proj_12345abc@sip.api.openai.com.
  5. Save your changes.
  6. Assign the Voice Configuration Package to your Bandwidth phone number.

Test the Integration

Now that everything is set up, you can test the integration by calling your Bandwidth phone number! You should hear the AI agent greet you and be able to have a conversation with it. When you're ready to speak to a human agent, simply ask to be transferred, and the call will be routed to the number specified in the REFER_TO environment variable.