Step 1: Create and export an API key
Before you begin, create an API key in the Fireworks dashboard: click Create API key and store it in a safe location. Then export the key as an environment variable in your terminal.

macOS / Linux:

export FIREWORKS_API_KEY="your_api_key_here"

Windows:

setx FIREWORKS_API_KEY "your_api_key_here"
Step 2: Make your first Serverless API call
Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK)

Install the Fireworks Python SDK:

pip install --pre fireworks-ai

The SDK is currently in alpha; use the --pre flag when installing to get the latest version.

Then make your first Serverless API call:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.choices[0].message.content)
Python (OpenAI SDK)

Fireworks provides an OpenAI-compatible endpoint. Install the OpenAI Python SDK:

pip install openai

Then make your first Serverless API call:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.choices[0].message.content)
Python (Anthropic SDK)

Fireworks provides an Anthropic-compatible endpoint. Install the Anthropic Python SDK:

pip install anthropic

Then make your first Serverless API call:
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.content[0].text)
JavaScript (OpenAI SDK)

Fireworks provides an OpenAI-compatible endpoint. Install the OpenAI JavaScript / TypeScript SDK:

npm install openai

Then make your first Serverless API call:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [
    {
      role: "user",
      content: "Say hello in Spanish",
    },
  ],
});

console.log(response.choices[0].message.content);
JavaScript (Anthropic SDK)

Fireworks provides an Anthropic-compatible endpoint. Install the Anthropic JavaScript / TypeScript SDK:

npm install @anthropic-ai/sdk

Then make your first Serverless API call:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Say hello in Spanish",
    },
  ],
});

console.log(response.content[0].text);
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Say hello in Spanish"
      }
    ]
  }'
The model replies with something like "¡Hola!".
Common use cases
Streaming responses
Stream responses token-by-token for a better user experience. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Python (Anthropic SDK):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

with client.messages.stream(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a short story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [{ role: "user", content: "Tell me a short story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const stream = client.messages.stream({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Tell me a short story" }],
});

stream.on("text", (text) => {
  process.stdout.write(text);
});

await stream.finalMessage();
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ],
    "stream": true
  }'
Function calling
Connect your models to external tools and APIs. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
Python (Anthropic SDK):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco",
                    }
                },
                "required": ["location"],
            },
        },
    ],
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. San Francisco",
          },
        },
        required: ["location"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/kimi-k2-instruct-0905",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: tools,
});

console.log(response.choices[0].message.tool_calls);
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/kimi-k2-instruct-0905",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a location",
      input_schema: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. San Francisco",
          },
        },
        required: ["location"],
      },
    },
  ],
});

for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Tool: ${block.name}, Input:`, block.input);
  }
}
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/kimi-k2-instruct-0905",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, e.g. San Francisco"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
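The snippets above stop at printing the model's tool call; in a full loop you execute the named tool locally and send its result back so the model can produce a final answer. A minimal sketch of the local dispatch step in Python (the get_weather body here is a hypothetical stand-in for a real weather lookup):

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in; call a real weather API here.
    return f"Sunny, 22C in {location}"

# Map tool names to local functions so each model-issued call is one lookup away.
TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a tool call; the model sends arguments as a JSON string."""
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

print(run_tool_call("get_weather", '{"location": "Paris"}'))  # Sunny, 22C in Paris
```

Append the result to the conversation (a tool-role message in the OpenAI-style APIs, a tool_result block in the Anthropic-style API) and call the endpoint again for the model's final answer.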
Structured outputs (JSON mode)
Get reliable JSON responses that match your schema. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        },
    },
)

print(response.choices[0].message.content)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
                "required": ["name", "age"],
            },
        },
    },
)

print(response.choices[0].message.content)
Python (Anthropic SDK):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
)

print(response.content[0].text)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [
    {
      role: "user",
      content: "Extract the name and age from: John is 30 years old",
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
    },
  },
});

console.log(response.choices[0].message.content);
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
    },
  },
  messages: [
    {
      role: "user",
      content: "Extract the name and age from: John is 30 years old",
    },
  ],
});

console.log(response.content[0].text);
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Extract the name and age from: John is 30 years old"
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "age": { "type": "number" }
          },
          "required": ["name", "age"]
        }
      }
    }
  }'
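Because the schema constrains the output, the returned content is a JSON string you can parse directly. A small sketch of consuming it, assuming the content matches the person schema above:

```python
import json

# e.g. content = response.choices[0].message.content
content = '{"name": "John", "age": 30}'

person = json.loads(content)
print(person["name"], person["age"])  # John 30
```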
Reasoning
Some models support reasoning, where the model shows its thought process before giving the final answer. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    reasoning_effort="medium",
)

msg = response.choices[0].message
if msg.reasoning_content:
    print("Reasoning:", msg.reasoning_content)
print("Answer:", msg.content)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    extra_body={"reasoning_effort": "medium"},
)

msg = response.choices[0].message

# Reasoning content is returned in a separate field
reasoning = getattr(msg, "reasoning_content", None)
if reasoning is None and hasattr(msg, "model_extra"):
    reasoning = msg.model_extra.get("reasoning_content")

if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", msg.content)
Python (Anthropic SDK): the Anthropic SDK uses the thinking parameter to enable reasoning.

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
)

for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p2",
  messages: [
    { role: "user", content: "What is 25 * 37? Show your work." },
  ],
  reasoning_effort: "medium",
});

const msg = response.choices[0].message;
if (msg.reasoning_content) {
  console.log("Reasoning:", msg.reasoning_content);
}
console.log("Answer:", msg.content);
JavaScript (Anthropic SDK): the Anthropic SDK uses the thinking parameter to enable reasoning.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p2",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 4096 },
  messages: [
    { role: "user", content: "What is 25 * 37? Show your work." },
  ],
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p2",
    "messages": [
      {
        "role": "user",
        "content": "What is 25 * 37? Show your work."
      }
    ],
    "reasoning_effort": "medium"
  }'
Vision models
Analyze images with vision-language models. Examples follow for each supported client, in this order:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Python (Fireworks SDK):

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
Python (OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
Python (Anthropic SDK): the Anthropic SDK uses its native image format, with type: "image" and a source object.

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
                    },
                },
            ],
        }
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text)
JavaScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
          },
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
JavaScript (Anthropic SDK):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "url",
            url: "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
          },
        },
      ],
    },
  ],
});

for (const block of response.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}
curl:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
            }
          }
        ]
      }
    ]
  }'
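The examples above pass a public image URL. OpenAI-compatible chat APIs commonly also accept a base64 data URL in the image_url field for local files; a minimal sketch of building one (the file path is illustrative, and whether a given model accepts data URLs is worth confirming in its docs):

```python
import base64

def to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a data URL for the image_url field."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage in a message content part:
# {"type": "image_url", "image_url": {"url": to_data_url("photo.png")}}
```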
Serverless model lifecycle
Serverless models are managed by the Fireworks team and may be updated or deprecated as new models are released. We provide at least 2 weeks advance notice before removing any model, with longer notice periods for popular models based on usage. For production workloads requiring long-term model stability, we recommend on-demand deployments, which give you full control over model versions and updates.

Make sure to add a payment method to access higher rate limits, up to 6,000 RPM. Without a payment method, you're limited to 10 RPM.
The 6,000 RPM figure is the maximum ceiling enforced by our spike arrest policy. Your actual limit scales dynamically with sustained usage, so short-lived spikes may be throttled below that cap. For predictable throughput needs, consider on-demand deployments or requesting a rate review.
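Requests above your current limit are rejected, so production callers should retry with exponential backoff rather than fail outright. A minimal sketch; the exception class and backoff constants here are illustrative, not official guidance (the OpenAI SDK, for example, raises its own openai.RateLimitError):

```python
import random
import time

class RateLimitError(Exception):
    """Illustrative stand-in for the SDK's rate-limit exception."""

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff plus jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt + random.random()))

# Usage: response = with_retries(lambda: client.chat.completions.create(...))
```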
Next steps
Ready to scale to production, explore other modalities, or customize your models?

Deploy and autoscale on Dedicated GPUs
Deploy with high performance on dedicated GPUs with fast autoscaling and minimal cold starts
Fine-tune Models
Improve model quality with supervised and reinforcement learning
Speech to Text
Real-time or batch audio transcription
Embeddings & Reranking
Use embeddings & reranking in search & context retrieval
Batch Inference
Run async inference jobs at scale, faster and cheaper
Browse 100+ Models
Explore all available models across modalities
API Reference
Complete API documentation