Fireworks.ai offers a powerful Responses API that allows for more complex and stateful interactions with models. This guide will walk you through the key features and how to use them.
The Responses API has a different data retention policy than the chat completions endpoint. See Data Privacy & security.
To start a new conversation, use the llm.responses.create method. For a complete example, see the getting started notebook.
```python
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

response = llm.responses.create(
    input=(
        "What is reward-kit and what are its 2 main features? Keep it short. "
        "Please analyze the fw-ai-external/reward-kit repository."
    ),
    tools=[{"type": "sse", "server_url": "https://gitmcp.io/docs"}],
)

print(response.output[-1].content[0].text.split("</think>")[-1])
```
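The split("</think>")[-1] at the end deserves a note: reasoning models such as qwen3-235b-a22b emit their chain of thought inside a <think>...</think> block before the final answer, and the split discards everything up to the closing tag. A standalone sketch of that post-processing step (strip_reasoning is a hypothetical helper name, not part of the SDK):

```python
def strip_reasoning(text: str) -> str:
    # Reasoning models prepend a <think>...</think> block containing
    # the chain of thought. Splitting on the closing tag keeps only
    # the final answer; text without the tag is returned unchanged.
    return text.split("</think>")[-1].strip()

raw = "<think>The user wants a short summary...</think>\nreward-kit is a toolkit for evaluating rewards."
print(strip_reasoning(raw))
```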
Continuing a Conversation with previous_response_id
To continue a conversation, you can use the previous_response_id parameter. This tells the API to use the context from a previous response, so you don’t have to send the entire conversation history again. For a complete example, see the previous response ID notebook.
```python
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

# First, create an initial response
initial_response = llm.responses.create(
    input="What are the key features of reward-kit?",
    tools=[{"type": "sse", "server_url": "https://gitmcp.io/docs"}],
)
initial_response_id = initial_response.id

# Now, continue the conversation
continuation_response = llm.responses.create(
    input="How do I install it?",
    previous_response_id=initial_response_id,
    tools=[{"type": "sse", "server_url": "https://gitmcp.io/docs"}],
)

print(continuation_response.output[-1].content[0].text.split("</think>")[-1])
```
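Conceptually, each stored response records which response preceded it, which is how the server can rebuild the full conversation from a single ID. A minimal pure-Python sketch of that chaining idea (StoredResponse and reconstruct_history are illustrative only, not part of the Fireworks SDK):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredResponse:
    id: str
    input: str
    output: str
    previous_response_id: Optional[str] = None

# A toy in-memory store standing in for server-side storage.
store: dict = {}

def reconstruct_history(response_id: str):
    # Walk the previous_response_id chain back to the first turn,
    # then return the turns in chronological order.
    turns = []
    current = store.get(response_id)
    while current is not None:
        turns.append((current.input, current.output))
        current = store.get(current.previous_response_id)
    return list(reversed(turns))

store["r1"] = StoredResponse("r1", "What are the key features?", "Feature list...")
store["r2"] = StoredResponse("r2", "How do I install it?", "pip install ...", previous_response_id="r1")

print(reconstruct_history("r2"))
```

This also shows why store=False breaks continuation: if a response never enters the store, the chain walk cannot find it.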
Disabling Storage with store=False
By default, responses are stored and can be referenced by their ID. You can disable this by setting store=False. If you do this, you will not be able to use the previous_response_id to continue the conversation. For a complete example, see the store=False notebook.
```python
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

response = llm.responses.create(
    input="give me 5 interesting facts on modelcontextprotocol/python-sdk -- keep it short!",
    store=False,
    tools=[{"type": "mcp", "server_url": "https://mcp.deepwiki.com/mcp"}],
)

# This will fail because the previous response was not stored
try:
    continuation_response = llm.responses.create(
        input="Explain the second fact in more detail.",
        previous_response_id=response.id,
    )
except Exception as e:
    print(e)
```
Deleting Stored Responses
When responses are stored (the default behavior with store=True), you can immediately delete them from storage using the DELETE endpoint. This permanently removes the conversation data.
```python
import os

import requests
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

# Create a response
response = llm.responses.create(
    input="What is the capital of France?",
    store=True,  # This is the default
)
response_id = response.id
print(f"Created response with ID: {response_id}")

# Delete the response immediately
headers = {
    "Authorization": f"Bearer {os.getenv('FIREWORKS_API_KEY')}",
    "x-fireworks-account-id": "your-account-id",
}
delete_response = requests.delete(
    f"https://api.fireworks.ai/inference/v1/responses/{response_id}",
    headers=headers,
)

if delete_response.status_code == 200:
    print("Response deleted successfully")
else:
    print(f"Failed to delete response: {delete_response.status_code}")
```
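If you delete responses from more than one place, the URL and headers from the example above can be factored into a small helper. This is a sketch only (delete_request_args and API_BASE are illustrative names; the endpoint path and headers mirror the example, and the API key is read from the FIREWORKS_API_KEY environment variable):

```python
import os

API_BASE = "https://api.fireworks.ai/inference/v1"

def delete_request_args(response_id: str, account_id: str):
    # Build the (url, headers) pair that requests.delete expects
    # for removing a stored response by its ID.
    url = f"{API_BASE}/responses/{response_id}"
    headers = {
        "Authorization": f"Bearer {os.getenv('FIREWORKS_API_KEY', '')}",
        "x-fireworks-account-id": account_id,
    }
    return url, headers

url, headers = delete_request_args("resp_123", "your-account-id")
print(url)
```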
Once a response is deleted, it cannot be recovered.