Step 1: Create and export an API key
Before you begin, create an API key in the Fireworks dashboard: click Create API key and store it in a safe location. Then export the key as an environment variable in your terminal.

macOS / Linux:

export FIREWORKS_API_KEY="your_api_key_here"

Windows:

setx FIREWORKS_API_KEY "your_api_key_here"
Step 2: Make your first Serverless API call
Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK)

Install the Fireworks Python SDK:

pip install --pre fireworks-ai

The SDK is currently in alpha; use the --pre flag when installing to get the latest version.

Then make your first Serverless API call:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.choices[0].message.content)
Python (OpenAI SDK)

Fireworks provides an OpenAI-compatible endpoint. Install the OpenAI Python SDK:

pip install openai

Then make your first Serverless API call:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.choices[0].message.content)
Python (Anthropic SDK)

Fireworks provides an Anthropic-compatible endpoint. Install the Anthropic Python SDK:

pip install anthropic

Then make your first Serverless API call:
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.content[0].text)
JavaScript (OpenAI SDK)

Fireworks provides an OpenAI-compatible endpoint. Install the OpenAI JavaScript / TypeScript SDK:

npm install openai

Then make your first Serverless API call:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [
    {
      role: "user",
      content: "Say hello in Spanish",
    },
  ],
});

console.log(response.choices[0].message.content);
JavaScript (Anthropic SDK)

Fireworks provides an Anthropic-compatible endpoint. Install the Anthropic JavaScript / TypeScript SDK:

npm install @anthropic-ai/sdk

Then make your first Serverless API call:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Say hello in Spanish",
    },
  ],
});

console.log(response.content[0].text);
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Say hello in Spanish"
      }
    ]
  }'
The model replies with something like "¡Hola!".
Common use cases
Streaming responses
Stream responses token-by-token for a better user experience. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Python (Anthropic SDK):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

with client.messages.stream(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a short story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [{ role: "user", content: "Tell me a short story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const stream = client.messages.stream({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Tell me a short story" }],
});

stream.on("text", (text) => {
  process.stdout.write(text);
});

await stream.finalMessage();
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ],
    "stream": true
  }'
Function calling
Connect your models to external tools and APIs. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
Python (Anthropic SDK):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco",
                    }
                },
                "required": ["location"],
            },
        },
    ],
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. San Francisco",
          },
        },
        required: ["location"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/kimi-k2-instruct-0905",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: tools,
});

console.log(response.choices[0].message.tool_calls);
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/kimi-k2-instruct-0905",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a location",
      input_schema: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. San Francisco",
          },
        },
        required: ["location"],
      },
    },
  ],
});

for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Tool: ${block.name}, Input:`, block.input);
  }
}
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/kimi-k2-instruct-0905",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, e.g. San Francisco"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
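The snippets above stop at printing the model's tool call; in a full loop you execute the named tool locally and send its result back so the model can produce a final answer. A minimal sketch of the local dispatch step in Python (the get_weather body here is a hypothetical stand-in for a real weather lookup):

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in; call a real weather API here.
    return f"Sunny, 22C in {location}"

# Map tool names to local functions so each model-issued call is one lookup away.
TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a tool call; the model sends arguments as a JSON string."""
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

print(run_tool_call("get_weather", '{"location": "Paris"}'))  # Sunny, 22C in Paris
```

Append the result to the conversation (a tool-role message in the OpenAI-style APIs, a tool_result block in the Anthropic-style API) and call the endpoint again for the model's final answer.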
Structured outputs (JSON mode)
Get reliable JSON responses that match your schema. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        },
    },
)

print(response.choices[0].message.content)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
                "required": ["name", "age"],
            },
        },
    },
)

print(response.choices[0].message.content)
Python (Anthropic SDK):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
)

print(response.content[0].text)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [
    {
      role: "user",
      content: "Extract the name and age from: John is 30 years old",
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
    },
  },
});

console.log(response.choices[0].message.content);
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
    },
  },
  messages: [
    {
      role: "user",
      content: "Extract the name and age from: John is 30 years old",
    },
  ],
});

console.log(response.content[0].text);
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Extract the name and age from: John is 30 years old"
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "age": { "type": "number" }
          },
          "required": ["name", "age"]
        }
      }
    }
  }'
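Because the schema constrains the output, the returned content is a JSON string you can parse directly. A small sketch of consuming it, assuming the content matches the person schema above:

```python
import json

# e.g. content = response.choices[0].message.content
content = '{"name": "John", "age": 30}'

person = json.loads(content)
print(person["name"], person["age"])  # John 30
```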
Reasoning
Some models support reasoning, where the model shows its thought process before giving the final answer. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    reasoning_effort="medium",
)

msg = response.choices[0].message
if msg.reasoning_content:
    print("Reasoning:", msg.reasoning_content)
print("Answer:", msg.content)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    extra_body={"reasoning_effort": "medium"},
)

msg = response.choices[0].message

# Reasoning content is returned in a separate field
reasoning = getattr(msg, "reasoning_content", None)
if reasoning is None and hasattr(msg, "model_extra"):
    reasoning = msg.model_extra.get("reasoning_content")

if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", msg.content)
Python (Anthropic SDK): the Anthropic SDK uses the thinking parameter to enable reasoning.

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
)

for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p2",
  messages: [
    { role: "user", content: "What is 25 * 37? Show your work." },
  ],
  reasoning_effort: "medium",
});

const msg = response.choices[0].message;
if (msg.reasoning_content) {
  console.log("Reasoning:", msg.reasoning_content);
}
console.log("Answer:", msg.content);
JavaScript (Anthropic SDK): the Anthropic SDK uses the thinking parameter to enable reasoning.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p2",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 4096 },
  messages: [
    { role: "user", content: "What is 25 * 37? Show your work." },
  ],
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p2",
    "messages": [
      {
        "role": "user",
        "content": "What is 25 * 37? Show your work."
      }
    ],
    "reasoning_effort": "medium"
  }'
Vision models
Analyze images with vision-language models. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
Python (Anthropic SDK): the Anthropic SDK uses its native image format, with type: "image" and a source object.

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
                    },
                },
            ],
        }
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
          },
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "url",
            url: "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
          },
        },
      ],
    },
  ],
});

for (const block of response.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
            }
          }
        ]
      }
    ]
  }'
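The examples above pass a public image URL. OpenAI-compatible chat APIs commonly also accept a base64 data URL in the image_url field for local files; a minimal sketch of building one (the file path is illustrative, and whether a given model accepts data URLs is worth confirming in its docs):

```python
import base64

def to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a data URL for the image_url field."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage in a message content part:
# {"type": "image_url", "image_url": {"url": to_data_url("photo.png")}}
```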
Serverless model lifecycle
Serverless models are managed by the Fireworks team and may be updated or deprecated as new models are released. We provide at least 2 weeks advance notice before removing any model, with longer notice periods for popular models based on usage. For production workloads requiring long-term model stability, we recommend on-demand deployments, which give you full control over model versions and updates.

Make sure to add a payment method to access higher rate limits, up to 6,000 RPM. Without a payment method, you're limited to 10 RPM.
The 6,000 RPM figure is the maximum ceiling enforced by our spike arrest policy. Your actual limit scales dynamically with sustained usage, so short-lived spikes may be throttled below that cap. For predictable throughput needs, consider on-demand deployments or requesting a rate review.
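Requests above your current limit are rejected, so production callers should retry with exponential backoff rather than fail outright. A minimal sketch; the exception class and backoff constants here are illustrative, not official guidance (the OpenAI SDK, for example, raises its own openai.RateLimitError):

```python
import random
import time

class RateLimitError(Exception):
    """Illustrative stand-in for the SDK's rate-limit exception."""

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff plus jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt + random.random()))

# Usage: response = with_retries(lambda: client.chat.completions.create(...))
```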
Next steps
Ready to scale to production, explore other modalities, or customize your models?

Deploy and autoscale on Dedicated GPUs
Deploy with high performance on dedicated GPUs with fast autoscaling and minimal cold starts
Fine-tune Models
Improve model quality with supervised and reinforcement learning
Speech to Text
Real-time or batch audio transcription
Embeddings & Reranking
Use embeddings & reranking in search & context retrieval
Batch Inference
Run async inference jobs at scale, faster and cheaper
Browse 100+ Models
Explore all available models across modalities
API Reference
Complete API documentation