Priority tier and Fast mode are in Preview. Features, pricing, and availability may change. We welcome your feedback!
Fireworks offers a Priority tier for workloads that require higher reliability, as well as a Fast mode for workloads that require higher speeds.

Priority tier

Priority tier is for workloads that require higher reliability during peak traffic periods, at a higher price point. Priority tier traffic is prioritized above Standard traffic and is less likely to be rate limited. To use Priority tier, set service_tier to "priority" in the request body (OpenAI-compatible chat completions only):
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/kimi-k2p5",
    "service_tier": "priority",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Priority tier is available on select models. Models and pricing are listed on the Pricing page.
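As a minimal sketch of how the request above could be parameterized, the helpers below build the same JSON body with or without the service_tier field. The names build_chat_body and send_chat are hypothetical and not part of any Fireworks tooling. The sketch assumes the prompt contains no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: build_chat_body and send_chat are hypothetical helper names.

# Print a chat completions request body.
# $1 = model id
# $2 = prompt (assumed to contain no characters that need JSON escaping)
# $3 = optional service tier; when empty, the service_tier field is omitted
build_chat_body() {
  local model="$1" prompt="$2" tier="${3:-}"
  local tier_field=""
  if [ -n "$tier" ]; then
    tier_field="\"service_tier\": \"$tier\", "
  fi
  printf '{"model": "%s", %s"messages": [{"role": "user", "content": "%s"}]}\n' \
    "$model" "$tier_field" "$prompt"
}

# POST a prepared body to the endpoint used in the example above.
# Defined here but not called, so running this file sends nothing.
send_chat() {
  curl -sS https://api.fireworks.ai/inference/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $FIREWORKS_API_KEY" \
    -d "$1"
}

# Print the Priority tier body without sending it.
build_chat_body "accounts/fireworks/models/kimi-k2p5" "Hello" "priority"
```

To send the request, pass the generated body to send_chat, for example: send_chat "$(build_chat_body "accounts/fireworks/models/kimi-k2p5" "Hello" "priority")". Omitting the third argument produces a body without service_tier.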

Fast mode

Fast mode is a high-speed configuration, useful for interactive applications that need fast responses, at a higher price point. It is not a different model, and the quality of the model remains the same. Fast mode is available for select models. To use Fast mode, set the model field to the model id listed below.
Model              Model id
Kimi K2.6 Turbo    accounts/fireworks/routers/kimi-k2p6-turbo
GLM 5.1 Fast       accounts/fireworks/routers/glm-5p1-fast
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/routers/kimi-k2p6-turbo",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Pricing is listed on the Pricing page.
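Because Fast mode is selected only through the model id, a client can switch modes by swapping one string. The sketch below shows one possible way to centralize that choice. The function fast_model_id and its short aliases (kimi-k2p6, glm-5p1) are hypothetical; only the returned ids come from the table above.

```shell
#!/usr/bin/env bash
# Sketch only: fast_model_id and its aliases are hypothetical,
# not part of any Fireworks tooling.

# Map a short alias to the Fast mode model id listed in the table above.
# Returns non-zero and prints to stderr for an unknown alias.
fast_model_id() {
  case "$1" in
    kimi-k2p6) echo "accounts/fireworks/routers/kimi-k2p6-turbo" ;;
    glm-5p1)   echo "accounts/fireworks/routers/glm-5p1-fast" ;;
    *)         echo "no Fast mode id known for: $1" >&2; return 1 ;;
  esac
}

# Print the Fast mode id for the Kimi alias.
fast_model_id kimi-k2p6
```

The printed id can be placed in the "model" field of the request body. Keeping the mapping in one place makes it easier to update if the listed ids change while the feature is in Preview.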