Fireworks.ai offers a powerful Responses API that allows for more complex and stateful interactions with models. This guide will walk you through the key features and how to use them.
The Responses API has a different data retention policy than the chat completions endpoint. See Data Privacy & security.
To start a new conversation, use the llm.responses.create method. For a complete example, see the getting started notebook.
```python
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

response = llm.responses.create(
    input=(
        "What is reward-kit and what are its 2 main features? Keep it short. "
        "Please analyze the fw-ai-external/reward-kit repository."
    ),
    tools=[{"type": "sse", "server_url": "https://gitmcp.io/docs"}],
)

print(response.output[-1].content[0].text.split("</think>")[-1])
```
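The split("</think>")[-1] at the end deserves a note: reasoning models such as qwen3-235b-a22b emit their chain of thought inside a <think>...</think> block before the final answer, and the split discards everything up to the closing tag. A standalone sketch of that post-processing step (strip_reasoning is a hypothetical helper name, not part of the SDK):

```python
def strip_reasoning(text: str) -> str:
    # Reasoning models prepend a <think>...</think> block containing
    # the chain of thought. Splitting on the closing tag keeps only
    # the final answer; text without the tag is returned unchanged.
    return text.split("</think>")[-1].strip()

raw = "<think>The user wants a short summary...</think>\nreward-kit is a toolkit for evaluating rewards."
print(strip_reasoning(raw))
```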
Continuing a Conversation with previous_response_id
To continue a conversation, you can use the previous_response_id parameter. This tells the API to use the context from a previous response, so you don’t have to send the entire conversation history again. For a complete example, see the previous response ID notebook.
```python
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

# First, create an initial response
initial_response = llm.responses.create(
    input="What are the key features of reward-kit?",
    tools=[{"type": "sse", "server_url": "https://gitmcp.io/docs"}],
)
initial_response_id = initial_response.id

# Now, continue the conversation
continuation_response = llm.responses.create(
    input="How do I install it?",
    previous_response_id=initial_response_id,
    tools=[{"type": "sse", "server_url": "https://gitmcp.io/docs"}],
)

print(continuation_response.output[-1].content[0].text.split("</think>")[-1])
```
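Conceptually, each stored response records which response preceded it, which is how the server can rebuild the full conversation from a single ID. A minimal pure-Python sketch of that chaining idea (StoredResponse and reconstruct_history are illustrative only, not part of the Fireworks SDK):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredResponse:
    id: str
    input: str
    output: str
    previous_response_id: Optional[str] = None

# A toy in-memory store standing in for server-side storage.
store: dict = {}

def reconstruct_history(response_id: str):
    # Walk the previous_response_id chain back to the first turn,
    # then return the turns in chronological order.
    turns = []
    current = store.get(response_id)
    while current is not None:
        turns.append((current.input, current.output))
        current = store.get(current.previous_response_id)
    return list(reversed(turns))

store["r1"] = StoredResponse("r1", "What are the key features?", "Feature list...")
store["r2"] = StoredResponse("r2", "How do I install it?", "pip install ...", previous_response_id="r1")

print(reconstruct_history("r2"))
```

This also shows why store=False breaks continuation: if a response never enters the store, the chain walk cannot find it.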
Disabling Storage with store=False
By default, responses are stored and can be referenced by their ID. You can disable this by setting store=False. If you do this, you will not be able to use the previous_response_id to continue the conversation. For a complete example, see the store=False notebook.
```python
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

response = llm.responses.create(
    input="give me 5 interesting facts on modelcontextprotocol/python-sdk -- keep it short!",
    store=False,
    tools=[{"type": "mcp", "server_url": "https://mcp.deepwiki.com/mcp"}],
)

# This will fail because the previous response was not stored
try:
    continuation_response = llm.responses.create(
        input="Explain the second fact in more detail.",
        previous_response_id=response.id,
    )
except Exception as e:
    print(e)
```
Deleting Stored Responses
When responses are stored (the default behavior with store=True), you can immediately delete them from storage using the DELETE endpoint. This permanently removes the conversation data.
```python
import os

import requests
from fireworks import LLM

llm = LLM(model="qwen3-235b-a22b", deployment_type="serverless")

# Create a response
response = llm.responses.create(
    input="What is the capital of France?",
    store=True,  # This is the default
)
response_id = response.id
print(f"Created response with ID: {response_id}")

# Delete the response immediately
headers = {
    "Authorization": f"Bearer {os.getenv('FIREWORKS_API_KEY')}",
    "x-fireworks-account-id": "your-account-id",
}
delete_response = requests.delete(
    f"https://api.fireworks.ai/inference/v1/responses/{response_id}",
    headers=headers,
)

if delete_response.status_code == 200:
    print("Response deleted successfully")
else:
    print(f"Failed to delete response: {delete_response.status_code}")
```
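If you delete responses from more than one place, the URL and headers from the example above can be factored into a small helper. This is a sketch only (delete_request_args and API_BASE are illustrative names; the endpoint path and headers mirror the example, and the API key is read from the FIREWORKS_API_KEY environment variable):

```python
import os

API_BASE = "https://api.fireworks.ai/inference/v1"

def delete_request_args(response_id: str, account_id: str):
    # Build the (url, headers) pair that requests.delete expects
    # for removing a stored response by its ID.
    url = f"{API_BASE}/responses/{response_id}"
    headers = {
        "Authorization": f"Bearer {os.getenv('FIREWORKS_API_KEY', '')}",
        "x-fireworks-account-id": account_id,
    }
    return url, headers

url, headers = delete_request_args("resp_123", "your-account-id")
print(url)
```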
Once a response is deleted, it cannot be recovered.