WeightSyncer and DeploymentManager expose.
Fireworks offers three paths for reinforcement learning, along a spectrum that trades convenience for control:
- BYOT rollouts — the most hands-on, for teams already running their own trainer who want Fireworks purely for high-scale rollout inference.
- Training API — the middle ground, for teams who want to own their training logic without managing infrastructure.
- Managed RFT — the turnkey path, for teams who just want a tuned model without running the loop themselves.
Where this guide fits
| Path | You own | Fireworks owns | Use this guide? |
|---|---|---|---|
| This guide (BYOT rollouts) | Trainer, rewards, environment, checkpoint upload cadence | Hot-load deployment, distributed weight swap, inference, KV cache across rollouts | Yes |
| Training API | Training logic (recipes or SDK) | GPUs, trainer lifecycle, often FW_HOSTED bucket | Only if you need rollout behavior beyond what the SDK exposes |
| Managed RFT | Dataset and evaluator | End-to-end hosted RL | No |
- Disaggregated: Your trainer and rollout cluster can run in different regions or clouds; deployments can span multiple regions to pool capacity.
- Full-parameter or LoRA: Hot-load full-parameter checkpoints for large models, or hot-load a LoRA adapter on top of a frozen base model—both run on Fireworks inference shapes. See LoRA rollouts.
- Fast checkpoint transfer: Lossless compressed incremental snapshots (arc_v2, typically 20×+ compression) over standard object storage, with no special RDMA networking between trainer and inference.
- Async / off-policy friendly: Background download during rollouts; configurable swap semantics similar in spirit to PipelineRL (see checkpoint-swap behavior).
Supported models
BYOT rollout hot-load is enabled for a curated set of base models from our full model library. The following are supported today:

| Model | Base model ID |
|---|---|
| Kimi K2.5 | kimi-k2p5 |
| Kimi K2.6 | kimi-k2p6 |
| Kimi K2.7 | kimi-k2p7-code |
| GLM 5.2 | glm-5p2 |
| Qwen3 30B-A3B | qwen3-30b-a3b-instruct-2507 |
Currently, reach out to the Fireworks RL team to get set up. We will help provision the deployment shape (GPU, precision, quantization) for your base model and confirm the snapshot format it expects.
Placeholders
Reuse these values in every command below:

| Placeholder | Example |
|---|---|
| <account_id> | my-team |
| <model_id> | qwen3-30b-a3b |
| <deployment_id> | rl-rollout-prod |
| <fireworks_api_key> | From API keys |
| <your_bucket> / <your_upload_path> | Parent prefix configured on the deployment (no trailing slash) |
| <checkpoint_id> | Snapshot directory name, e.g. version_001 (no slashes) |
Prerequisites
Complete this checklist before creating a deployment:
- Fireworks account and API key: Create a key and set export FIREWORKS_API_KEY="<key>".
- Account ID: In the dashboard, open your account settings or any resource URL; the account slug is the segment after /accounts/ (for example accounts/<account_id>/...).
- Feature enablement: Request external-bucket hot-load for RL rollouts on account <account_id>, including your bucket provider (S3, GCS/gs://, or NEBIUS).
- Object storage read access for Fireworks: Fireworks needs read-only access to the bucket prefix you will pass as --hot-load-bucket-url. At enablement, Fireworks shares the IAM principal to grant access. Typical setup:
  - Amazon S3: Grant the Fireworks principal s3:GetObject (and s3:ListBucket on the prefix) on s3://<your_bucket>/<your_upload_path>/*.
  - Google Cloud Storage: Grant roles/storage.objectViewer on the bucket or prefix to the Fireworks service account provided at onboarding.
  - Nebius / MinIO: Equivalent read-only credentials or access key scoped to the upload prefix.
- firectl installed: See firectl.
- Base model and deployment shape: An RL-capable shape for your model (GPU count, precision). If you omit --deployment-shape, firectl prompts you to pick one interactively.
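For the Amazon S3 case in the checklist above, the grant can be written as a bucket policy. The following is a minimal sketch, assuming a standard AWS bucket policy document; the function name, the Sid values, and the principal ARN are hypothetical placeholders, and the real principal is whatever Fireworks shares at enablement.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy granting read-only access to one prefix.

    principal_arn is a placeholder: use the IAM principal shared at enablement.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects under the upload prefix only.
                "Sid": "HotLoadGetObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # ListBucket is a bucket-level action, so it is scoped
                # to the prefix with a condition rather than a resource path.
                "Sid": "HotLoadListPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    policy = read_only_bucket_policy(
        "my-bucket", "rl/checkpoints", "arn:aws:iam::123456789012:role/example-reader"
    )
    print(json.dumps(policy, indent=2))
```

The policy contains no write or delete actions, which matches the read-only requirement stated in the checklist.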
Architecture
You own: trainer, reward shaping, checkpoint cadence, rollout orchestration. Fireworks owns: hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.
End-to-end loop
- Create a hot-load deployment.
- Upload and hot-load an initial full snapshot.
- Run rollouts against that snapshot.
- For each training step: upload and hot-load the next incremental snapshot (see Incremental snapshots).
- Run rollouts again.
- Every 20 to 30 steps, publish a full snapshot instead of an incremental one. If the incremental chain fails, fall back to a full snapshot.
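The cadence in the loop above can be sketched as a small planner. This is an illustration only: the function name, the version_NNN naming scheme, and the default interval of 25 are assumptions chosen to fall inside the 20 to 30 range, not values prescribed by Fireworks.

```python
from typing import List, Optional, Tuple

Plan = List[Tuple[str, str, Optional[str]]]


def plan_snapshots(num_steps: int, full_every: int = 25) -> Plan:
    """Return (identity, kind, parent_identity) for the initial snapshot and each step.

    Incremental snapshots name the previous snapshot as their parent;
    full snapshots have no parent and reset the chain.
    """
    if full_every < 1:
        raise ValueError("full_every must be >= 1")
    plan: Plan = [("version_000", "full", None)]  # initial full snapshot
    for step in range(1, num_steps + 1):
        identity = f"version_{step:03d}"
        if step % full_every == 0:
            plan.append((identity, "full", None))
        else:
            plan.append((identity, "incremental", plan[-1][0]))
    return plan


if __name__ == "__main__":
    for entry in plan_snapshots(4, full_every=2):
        print(entry)
```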
Quickstart checklist
Use this table for your first rollout end-to-end:

| Step | Action | Done when |
|---|---|---|
| 1 | Create hot-load deployment | firectl deployment get <deployment_id> shows a healthy deployment |
| 2 | Upload full HF snapshot | All files exist under .../<checkpoint_id>/ in object storage |
| 3 | POST signal snapshot | HTTP 200 |
| 4 | GET poll status | Every replica has readiness: true and current_snapshot_identity matches your identity |
| 5 | Run rollouts | Chat/completions returns tokens |
1. Create a hot-load deployment
Create the deployment that will serve rollouts. During preview, --enable-hot-load flags may be hidden from CLI help but can still be passed explicitly.
Flags
- --deployment-shape: Optional. If omitted, firectl prompts you to pick one.
- --hot-load-bucket-type: MINIO, S3, NEBIUS, or FW_HOSTED. This guide focuses on external buckets (S3, gs://, etc.). FW_HOSTED is for Fireworks-managed trainers.
- --hot-load-bucket-url: Required when --enable-hot-load is set. Examples: s3://mybucket/path, gs://mybucket/path. No trailing slash. This is the parent prefix; each snapshot is a subdirectory named by identity (see snapshot layout).
- --hot-load-transition-type: ASYNC (recommended for RL) or SYNC. Defaults to ASYNC when hot load is enabled. See checkpoint-swap behavior.
- --region: Where the deployment runs (for example US_OHIO_1, US_VIRGINIA_1). Keep the trainer upload path geographically close to the bucket and deployment.
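The constraints on these flags can be checked before anything is passed to the CLI. The sketch below only assembles and validates the flag list described above; the helper name is hypothetical, and the firectl subcommand and positional arguments are deliberately left out because they are not shown in this section.

```python
from typing import List, Optional

# Values as listed in the flag descriptions above.
BUCKET_TYPES = {"MINIO", "S3", "NEBIUS", "FW_HOSTED"}
TRANSITION_TYPES = {"ASYNC", "SYNC"}


def hot_load_flags(
    bucket_type: str,
    bucket_url: str,
    transition_type: str = "ASYNC",
    region: Optional[str] = None,
    deployment_shape: Optional[str] = None,
) -> List[str]:
    """Return hot-load CLI flags after validating the documented constraints."""
    if bucket_type not in BUCKET_TYPES:
        raise ValueError(f"unknown bucket type: {bucket_type}")
    if transition_type not in TRANSITION_TYPES:
        raise ValueError(f"unknown transition type: {transition_type}")
    if bucket_url.endswith("/"):
        raise ValueError("bucket URL must not have a trailing slash")
    flags = [
        "--enable-hot-load",
        "--hot-load-bucket-type", bucket_type,
        "--hot-load-bucket-url", bucket_url,
        "--hot-load-transition-type", transition_type,
    ]
    if region:
        flags += ["--region", region]
    if deployment_shape:
        flags += ["--deployment-shape", deployment_shape]
    return flags


if __name__ == "__main__":
    print(" ".join(hot_load_flags("S3", "s3://mybucket/path", region="US_OHIO_1")))
```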

2. Upload and hot-load an initial full snapshot
Upload a full HuggingFace-format checkpoint, then signal Fireworks to load it.
Snapshot layout
Place each snapshot under its own subdirectory. The identity you signal in the API must match the directory name (a single path segment, no slashes):
- identity / <checkpoint_id>: Any opaque string (for example version_001 or step_00100).
- Format: Same layout as the base model on HuggingFace, plus the two manifest files described below. No tensor-parallel sharding in uploaded files.
- File size: Split weights into multiple .safetensors files, each under about 5 GB. One layer per file is required (a single shard must not mix weights from more than one layer) and also minimizes load time.
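The naming rules above (parent prefix without a trailing slash, identity as a single path segment) can be enforced with a small helper before uploading. The function name is hypothetical; the rules it checks are the ones stated in this guide.

```python
def snapshot_prefix(bucket_url: str, identity: str) -> str:
    """Return the object-storage prefix that holds one snapshot's files."""
    if bucket_url.endswith("/"):
        raise ValueError("bucket URL must not have a trailing slash")
    if not identity or "/" in identity:
        raise ValueError("identity must be a non-empty single path segment")
    return f"{bucket_url}/{identity}/"


if __name__ == "__main__":
    print(snapshot_prefix("s3://mybucket/path", "version_001"))
```

The same identity string is then the value sent in the hot-load signal, so deriving both from one variable avoids a mismatch between the directory name and the signaled identity.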
Required files (full snapshot)
A full (non-LoRA) snapshot is validated at POST time; it must contain all of:

| File | Purpose | Validation |
|---|---|---|
| config.json | HuggingFace model config. | Must be loadable as an AutoConfig and equivalent to the base model’s config (hidden_size, num_hidden_layers, rope_parameters, etc.). A quantization_config key is allowed for quantized snapshots. |
| model.safetensors.index.json | Maps each tensor name to the shard file that stores it (weight_map). | Must be a JSON object with a weight_map; every shard file may contain weights from only one layer. |
| model.weight.spec.json | Describes each tensor’s shape and dtype (tensor_map). | Must be a JSON object with a tensor_map that covers every weight named in weight_map. |
| model-*.safetensors | The weights themselves. | Tensor coverage must match the index/spec; tensors must cover the base model. |
| tokenizer files | tokenizer.json, tokenizer_config.json, etc. | Carried over from the base model. |
model.safetensors.index.json (HuggingFace-standard) maps tensors to shards:
model.weight.spec.json gives each tensor’s metadata (used to validate transitions and dequantization):
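A local pre-flight check can catch the two manifest rules from the table above before upload: tensor_map must cover every name in weight_map, and no shard may mix layers. This is a rough sketch, not the server-side validator. It assumes layer numbers appear in tensor names as layers.<N>. (the common HuggingFace convention) and ignores tensors without a layer number; how the real validator treats those is not covered here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Assumption: tensor names look like "model.layers.12.mlp.up_proj.weight".
_LAYER_RE = re.compile(r"\blayers\.(\d+)\.")


def layer_of(tensor_name: str) -> Optional[int]:
    """Extract the layer number from a tensor name, or None if there is none."""
    match = _LAYER_RE.search(tensor_name)
    return int(match.group(1)) if match else None


def check_manifests(index: dict, spec: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    weight_map = index.get("weight_map")
    tensor_map = spec.get("tensor_map")
    if not isinstance(weight_map, dict):
        return ["index has no weight_map object"]
    if not isinstance(tensor_map, dict):
        return ["spec has no tensor_map object"]

    problems: List[str] = []
    missing = sorted(set(weight_map) - set(tensor_map))
    if missing:
        problems.append(
            f"tensor_map is missing {len(missing)} tensor(s), e.g. {missing[0]}"
        )

    layers_per_shard: Dict[str, Set[int]] = defaultdict(set)
    for name, shard in weight_map.items():
        layer = layer_of(name)
        if layer is not None:
            layers_per_shard[shard].add(layer)
    for shard, layers in sorted(layers_per_shard.items()):
        if len(layers) > 1:
            problems.append(f"{shard} mixes layers {sorted(layers)}")
    return problems
```

In practice the two arguments would come from json.load on model.safetensors.index.json and model.weight.spec.json in the snapshot directory.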
Incremental snapshots
An incremental snapshot is an ARC2-compressed delta of the safetensors against a previous_snapshot_identity already on the deployment. It keeps the same model.safetensors.index.json as its parent: the weight_map, the file count, and the per-file weight set must be identical (only the tensor contents change). Tensor dtypes must also match across the transition. Upload only the diff .safetensors (plus the unchanged manifests/config) under the new identity; signal it with incremental_snapshot_metadata. See Incremental snapshots for the full body and the delta-build utilities.
Optional: call the per-file hint API as each file lands to speed up loading on large models.
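The parent/child constraints for an incremental snapshot (identical weight_map, unchanged dtypes) can be checked locally before building the delta. The sketch below assumes each tensor_map entry is an object with a dtype key; the exact spec schema is not shown in this section, so treat that key name as an assumption.

```python
from typing import List


def check_incremental(
    parent_index: dict, child_index: dict, parent_spec: dict, child_spec: dict
) -> List[str]:
    """Return problems that would break the incremental chain; empty if none found."""
    problems: List[str] = []
    # Same tensor-to-shard mapping implies same file count and per-file weight set.
    if parent_index.get("weight_map") != child_index.get("weight_map"):
        problems.append("weight_map differs from parent")

    parent_tensors = parent_spec.get("tensor_map", {})
    child_tensors = child_spec.get("tensor_map", {})
    for name in sorted(set(parent_tensors) & set(child_tensors)):
        # Assumption: entries carry a "dtype" field.
        before = parent_tensors[name].get("dtype")
        after = child_tensors[name].get("dtype")
        if before != after:
            problems.append(f"dtype changed for {name}: {before} -> {after}")
    return problems
```

If this returns any problem, publishing a full snapshot instead keeps the run moving while the cause is investigated.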
Quantized snapshots
Snapshot precision must be in a format the shape’s loader accepts, so the exact target format is shape-dependent; confirm it with the Fireworks RL team during shape provisioning (see Supported models). You can upload weights in the base precision (BF16) and let the shape convert them at load time, or pre-quantize in the snapshot to cut upload size and weight-swap time. For large MoE models such as GLM and Kimi, the routed MoE expert weights can be pre-quantized to FP8 in the uploaded snapshot. When you pre-quantize:
- Add the matching quantization_config block to config.json.
- Make sure model.weight.spec.json (tensor_map) describes the quantized tensors (shape + dtype); the snapshot is rejected if a quantization_config is present but the spec has no dequantizable tensors or is missing the metadata needed to dequantize them.
- Keep the quantization recipe consistent across every snapshot in a run so the incremental chain stays valid (dtypes must not change between a snapshot and its parent).
Signal and poll
Use the Hot-load API below with { "identity": "<checkpoint_id>" } and poll until all replicas are ready.
Hot-load API
All hot-load requests use these headers:

| Header | Value |
|---|---|
| Authorization | Bearer <fireworks_api_key> |
| fireworks-model | accounts/<account_id>/models/<model_id> |
| fireworks-deployment | accounts/<account_id>/deployments/<deployment_id> |
| Content-Type | application/json |

| Operation | Method | URL |
|---|---|---|
| Signal snapshot ready | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load |
| Poll load status | GET | https://api.fireworks.ai/hot_load/v1/models/hot_load |
| Per-file hint (optional) | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load/hint |
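The headers and endpoint in the tables above can be assembled with the Python standard library. This is a minimal sketch: the helper names are hypothetical, the body shown is the full-snapshot form with only identity, and the request object is built but not sent.

```python
import json
import urllib.request
from typing import Dict

HOT_LOAD_URL = "https://api.fireworks.ai/hot_load/v1/models/hot_load"


def hot_load_headers(
    api_key: str, account_id: str, model_id: str, deployment_id: str
) -> Dict[str, str]:
    """Headers required on every hot-load request, per the table above."""
    return {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": f"accounts/{account_id}/models/{model_id}",
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        "Content-Type": "application/json",
    }


def build_signal_request(identity: str, headers: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) the POST that signals a full snapshot is ready."""
    body = json.dumps({"identity": identity}).encode("utf-8")
    return urllib.request.Request(HOT_LOAD_URL, data=body, headers=headers, method="POST")
```

Sending is then a call such as urllib.request.urlopen(request, timeout=30); the quickstart table treats HTTP 200 as success for this step.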
Signal snapshot ready
A full snapshot body needs only the identity field, for example { "identity": "<checkpoint_id>" }. The fields of the incremental body, including checksum_format, are documented in Incremental snapshots.
Request body fields:
- identity: Snapshot directory name under the configured bucket prefix. Must not contain /.
- incremental_snapshot_metadata: Required for incremental snapshots. Includes previous_snapshot_identity, compression_format (arc_v2), and checksum_format (alder32). See the incremental snapshots guide.
- Prompt-cache policy after the swap: all (default), none, or new_session. See KV cache behavior for RL rollouts for active stream, session ID, and reset-option semantics.
- Ignored config fields: Top-level config.json fields to ignore during snapshot validation. Only use for known-safe metadata fields.
Poll load status
Poll until every replica has readiness: true and current_snapshot_identity equals the identity you signaled.
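The polling rule above can be sketched as a loop around whatever function performs the GET. The response shape is an assumption: a JSON object with a replicas list whose entries carry readiness and current_snapshot_identity. The fetch function is injected so the loop itself has no network dependency.

```python
import time
from typing import Callable


def all_replicas_ready(status: dict, identity: str) -> bool:
    """True only if there is at least one replica and every one serves identity."""
    replicas = status.get("replicas") or []
    return bool(replicas) and all(
        r.get("readiness") is True and r.get("current_snapshot_identity") == identity
        for r in replicas
    )


def wait_until_ready(
    fetch_status: Callable[[], dict],
    identity: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status until all replicas are ready or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if all_replicas_ready(fetch_status(), identity):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The timeout and interval defaults are arbitrary starting points; large models may need a longer timeout for the first full snapshot.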
When to start rollouts
- Default (on-policy): Wait until all replicas report readiness on the new identity.
- Off-policy / higher utilization: You may start sending rollouts when a subset of replicas is ready; inspect each entry in replicas in the GET response. Stale-policy rollouts are expected; use async transition mode and monitor policy version in streaming responses (see Policy version in responses).
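The two start conditions above differ only in how many replicas must be ready. A sketch of that gate, assuming the same replicas response shape as the poll step (entries with readiness and current_snapshot_identity); the function names and the idea of a fractional threshold are illustrative choices, not part of the API.

```python
def ready_fraction(status: dict, identity: str) -> float:
    """Fraction of replicas that are ready and serving the given identity."""
    replicas = status.get("replicas") or []
    if not replicas:
        return 0.0
    ready = sum(
        1
        for r in replicas
        if r.get("readiness") is True and r.get("current_snapshot_identity") == identity
    )
    return ready / len(replicas)


def may_start_rollouts(status: dict, identity: str, min_fraction: float = 1.0) -> bool:
    """min_fraction=1.0 is the on-policy default; lower values allow off-policy starts."""
    return ready_fraction(status, identity) >= min_fraction
```

With a threshold below 1.0, some rollouts are served by replicas still on the previous snapshot, so the trainer needs the policy version from the responses to attribute samples correctly.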
3. Run rollouts
Call the OpenAI-compatible inference API. For multi-turn RL, set session headers so KV cache stays on one replica (see Inference for RL rollouts).
Steady-state training loop
After the first full snapshot:
- Intermediate steps: Build and upload an incremental snapshot (arc_v2), signal with incremental_snapshot_metadata, poll until ready, then run rollouts.
- Every 20 to 30 steps: Publish a new full snapshot for faster recovery and chain reset.
- On failure: Fall back to a full snapshot; see Ledger & debugging.
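The three rules above can be combined into one per-step decision. In this sketch the two publish callables stand in for your own upload, signal, and poll code; the naming scheme, the default interval of 25, and the choice to publish the fallback under a separate identity (rather than reusing the failed one) are all assumptions.

```python
from typing import Callable, Optional, Tuple


def publish_step(
    step: int,
    parent_identity: Optional[str],
    publish_incremental: Callable[[str, str], None],
    publish_full: Callable[[str], None],
    full_every: int = 25,
) -> Tuple[str, str]:
    """Publish one training step's snapshot and return (identity, kind)."""
    identity = f"version_{step:03d}"
    if parent_identity is None or step % full_every == 0:
        publish_full(identity)
        return identity, "full"
    try:
        publish_incremental(identity, parent_identity)
        return identity, "incremental"
    except Exception:
        # Broad catch is deliberate in this sketch: any failure in the
        # incremental path triggers the documented fallback to a full snapshot.
        fallback = f"{identity}_full"
        publish_full(fallback)
        return fallback, "full"
```

The returned identity becomes the parent for the next step, so a fallback automatically restarts the incremental chain from the new full snapshot.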
LoRA rollouts
Rollouts work with both full-parameter and LoRA checkpoints. With LoRA you hot-load only the adapter on top of a frozen base model: snapshots are tiny (tens of MB instead of tens of GB), weight swaps are near-instant, and the deployment applies the adapter at request time. This is a good fit for rapid RL iteration and for serving several adapter variants from one base deployment. The flow is the same as the end-to-end loop above (create a hot-load deployment, upload a snapshot, signal it, poll, and run rollouts), with the differences below.
LoRA rollouts run on a LoRA-capable RL/hot-load shape (adapter serving enabled on the base-model deployment). Confirm the shape for your base model with Fireworks during feature enablement.
Auto-detection
You do not set a flag to choose LoRA vs full-parameter. Fireworks classifies each snapshot from its contents: a directory containing adapter_config.json is loaded as a LoRA adapter; anything else is treated as a full-parameter snapshot. The Hot-load API is identical for both.
LoRA snapshot layout
Upload a HuggingFace / PEFT-format adapter under the snapshot identity directory (same bucket parent as the full snapshot layout):
- adapter_config.json: PEFT adapter config. Its presence is what marks the snapshot as LoRA; it must reference the same base model the deployment serves.
- adapter_model.safetensors: Adapter weights. Sharded adapter_model-*.safetensors and the legacy adapter_model.bin are also accepted.
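The detection rule and the accepted weight file names can be mirrored in a local check before upload, so a directory is not classified differently from what you intended. This is a sketch of the rule as described in this guide, not the server's implementation; the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# File names accepted for adapter weights, per the layout above.
ADAPTER_WEIGHT_PATTERNS = (
    "adapter_model.safetensors",
    "adapter_model-*.safetensors",
    "adapter_model.bin",
)


def classify_snapshot(filenames: Iterable[str]) -> str:
    """Return "lora" if adapter_config.json is present, else "full"."""
    return "lora" if "adapter_config.json" in set(filenames) else "full"


def has_adapter_weights(filenames: Iterable[str]) -> bool:
    """True if at least one file matches an accepted adapter weight name."""
    names = list(filenames)
    return any(
        fnmatchcase(name, pattern)
        for name in names
        for pattern in ADAPTER_WEIGHT_PATTERNS
    )
```

A stray adapter_config.json inside a full-parameter snapshot directory would flip the classification, so it is worth asserting the expected kind for each upload.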
Signal and poll
Signal exactly like a full snapshot, with just the identity: { "identity": "<checkpoint_id>" }.
When you poll, each replica reports a loaded_adapters array (each entry has an identity and a status) in addition to the single current_snapshot_identity used for full-parameter snapshots. Treat the snapshot as ready when your identity appears with status: "loaded" on every replica.
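The LoRA readiness rule can be expressed as a predicate over the poll response. As with the full-parameter case, the outer shape (a replicas list) is an assumption; the loaded_adapters entries with identity and status follow the description above.

```python
def lora_ready(status: dict, identity: str) -> bool:
    """True if every replica lists identity in loaded_adapters with status "loaded"."""
    replicas = status.get("replicas") or []
    return bool(replicas) and all(
        any(
            adapter.get("identity") == identity and adapter.get("status") == "loaded"
            for adapter in (replica.get("loaded_adapters") or [])
        )
        for replica in replicas
    )
```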
No incremental chain
LoRA adapters are small, so there is no ARC2 incremental/delta chain for LoRA. Upload the full adapter every step; each LoRA snapshot is complete and self-contained. The incremental snapshot workflow (and the “every 20th–30th step, publish a full snapshot” cadence) applies only to full-parameter checkpoints.
Numerics alignment
For best training–inference alignment:
- Match quantization / precision between trainer checkpoints and the deployment shape (work with Fireworks if you need a custom shape).
- Measure logprob divergence between trainer forward passes and rollout inference on the same tokens.
- For MoE models, use Router Replay (R3) during rollouts—see MoE Router Replay.
Next steps
Incremental snapshots
Build ARC2 deltas, per-file hints, and incremental signal bodies.
Ledger & debugging
Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.
Inference for RL rollouts
Session affinity headers, policy version in streams, weight-swap behavior, and MoE Router Replay (R3).
Fireworks-hosted trainer
The alternative path where Fireworks runs the trainer through the Training API.