This guide covers how to launch your RFT job using either the Eval Protocol CLI or the Fireworks web UI.

Prerequisites

Before launching an RFT job, ensure you have:
Dataset: Your dataset must be in JSONL format with prompts (system and user messages). Each line represents one training example. Upload via CLI:
eval-protocol create dataset my-dataset --file dataset.jsonl
Or via the Fireworks dashboard.
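For reference, a single line of dataset.jsonl might look like this (the content is just an illustration):
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 25 * 4?"}]}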
Evaluator: Your reward function must be tested and uploaded. For local evaluators, upload via pytest:
cd evaluator_directory
pytest test_evaluator.py -vs
The test automatically registers your evaluator with Fireworks. For remote evaluators, deploy your HTTP service first.
API key: Set your Fireworks API key as an environment variable:
export FIREWORKS_API_KEY="fw_your_api_key_here"
Or store it in a .env file in your project directory.
Base model: Choose a base model that supports fine-tuning. Popular options:
  • accounts/fireworks/models/llama-v3p1-8b-instruct - Good balance of quality and speed
  • accounts/fireworks/models/qwen3-0p6b - Fast training for experimentation
  • accounts/fireworks/models/llama-v3p1-70b-instruct - Best quality, slower training
Check available models at fireworks.ai/models.

Option A: Eval Protocol CLI

The Eval Protocol CLI provides the fastest, most reproducible way to launch RFT jobs.

Quick start

From your evaluator directory, run:
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct
That’s it! This command automatically:
  1. Uploads your evaluator code (if not already uploaded)
  2. Uploads your dataset (if dataset.jsonl exists)
  3. Creates and launches the RFT job
The CLI automatically detects your evaluator and dataset from the current directory. No need to specify IDs manually.
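As an illustration, a project directory the CLI can pick up might look like this (the exact layout and file names beyond test_evaluator.py and dataset.jsonl will vary):
evaluator_directory/
├── test_evaluator.py    # pytest file that tests and registers your evaluator
├── requirements.txt     # optional dependencies for the evaluator
└── dataset.jsonl        # training prompts, one JSON object per line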

Step-by-step walkthrough

Step 1: Install Eval Protocol CLI

pip install eval-protocol
Verify installation:
eval-protocol --version
Step 2: Set up authentication

Configure your Fireworks API key:
export FIREWORKS_API_KEY="fw_your_api_key_here"
Or create a .env file:
FIREWORKS_API_KEY=fw_your_api_key_here
Step 3: Test your evaluator locally

Before training, verify your evaluator works:
cd evaluator_directory
pytest test_evaluator.py -vs
This runs your evaluator on test data and automatically registers it with Fireworks.
Step 4: Create the RFT job

From your evaluator directory:
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-model my-math-solver
The CLI will:
  • Upload evaluator code (if changed)
  • Upload dataset (if changed)
  • Create the RFT job
  • Display dashboard links for monitoring
Expected output:
Created Reinforcement Fine-tuning Job
   name: accounts/your-account/reinforcementFineTuningJobs/abc123

Dashboard Links:
   Evaluator: https://app.fireworks.ai/dashboard/evaluators/your-evaluator
   Dataset:   https://app.fireworks.ai/dashboard/datasets/your-dataset
   RFT Job:   https://app.fireworks.ai/dashboard/fine-tuning/reinforcement/abc123
Step 5: Monitor training

Click the RFT Job link to watch training progress in real-time. See Monitor Training for details.

Common CLI options

Customize your RFT job with these flags.
Model and output:
--base-model accounts/fireworks/models/llama-v3p1-8b-instruct  # Base model to fine-tune
--output-model my-custom-name                                   # Name for fine-tuned model
Training parameters:
--epochs 2                    # Number of training epochs (default: 1)
--learning-rate 5e-5          # Learning rate (default: 1e-4)
--lora-rank 16                # LoRA rank (default: 8)
--batch-size 65536            # Batch size in tokens (default: 32768)
Rollout (sampling) parameters:
--inference-temperature 0.8   # Sampling temperature (default: 0.7)
--inference-n 8               # Number of rollouts per prompt (default: 4)
--inference-max-tokens 4096   # Max tokens per response (default: 2048)
--inference-top-p 0.95        # Top-p sampling (default: 1.0)
--inference-top-k 50          # Top-k sampling (default: 40)
Remote environments:
--remote-server-url https://your-evaluator.example.com  # For remote rollout processing
Force re-upload:
--force                       # Re-upload evaluator even if unchanged
See all options:
eval-protocol create rft --help

Examples

Fast experimentation (small model, 1 epoch):
eval-protocol create rft \
  --base-model accounts/fireworks/models/qwen3-0p6b \
  --output-model quick-test
High-quality training (more rollouts, higher temperature):
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-model high-quality-model \
  --inference-n 8 \
  --inference-temperature 1.0
Remote environment (for multi-turn agents):
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --remote-server-url https://your-agent.example.com \
  --output-model remote-agent
Multiple epochs with custom learning rate:
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --epochs 3 \
  --learning-rate 5e-5 \
  --output-model multi-epoch-model
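Because each job is a single command, parameter sweeps are easy to script. For example, a small bash loop over learning rates (values and output names are illustrative) might look like:
for lr in 1e-4 5e-5 2e-5; do
  eval-protocol create rft \
    --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
    --learning-rate "$lr" \
    --output-model "math-solver-lr-${lr}"
done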

Option B: Web UI

The Fireworks dashboard provides a visual interface for creating RFT jobs with guided parameter selection.
Step 1: Navigate to Fine-Tuning

  1. Go to Fireworks Dashboard
  2. Click Fine-Tuning in the left sidebar
  3. Click Fine-tune a Model
[Screenshot: Fine-tuning dashboard showing the list of jobs]
Step 2: Select Reinforcement Fine-Tuning

  1. Choose Reinforcement as the tuning method
  2. Select your base model from the dropdown
The UI shows only models that support fine-tuning. Popular choices appear at the top.
Not sure which model to choose? Start with llama-v3p1-8b-instruct for a good balance of quality and speed.
Step 3: Configure Dataset

  1. Upload new dataset or select existing from your account
  2. Preview dataset entries to verify format
  3. The UI validates your JSONL format automatically
[Screenshot: Dataset selection interface]
Each dataset row should have a messages array:
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 25 * 4?"}
  ]
}
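If you want to sanity-check the file yourself before uploading, a quick shell check like the one below can catch most issues (it assumes jq is installed; the UI still performs its own validation):
# Print any row whose "messages" field is missing or not an array.
# jq also reports a parse error with the line number if a line is not valid JSON.
jq -c 'select((.messages | type) != "array")' dataset.jsonl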
Step 4: Select Evaluator

  1. Choose from your uploaded evaluators
  2. Preview evaluator code and test results
  3. View recent evaluation metrics
If you haven’t uploaded an evaluator yet, you’ll need to do that first via CLI:
pytest test_evaluator.py -vs
For remote evaluators, you’ll enter your server URL in the environment configuration section.
Step 5: Set Training Parameters

Configure how the model learns.
Core parameters:
  • Output model name: Custom name for your fine-tuned model
  • Epochs: Number of passes through the dataset (start with 1)
  • Learning rate: How fast the model updates (use default 1e-4)
  • LoRA rank: Model capacity (8-16 for most tasks)
  • Batch size: Training throughput (use default 32k tokens)
The UI shows helpful tooltips for each parameter. See Parameter Tuning for detailed guidance.
Step 6: Configure Rollout Parameters

Control how the model generates responses during training:
  • Temperature: Sampling randomness (0.7 for balanced exploration)
  • Top-p: Probability mass cutoff (0.9-1.0)
  • Top-k: Token candidate limit (40 is standard)
  • Number of rollouts (n): Responses per prompt (4-8 recommended)
  • Max tokens: Maximum response length (2048 default)
Higher temperature and more rollouts increase exploration, but also increase cost.
Step 7: Review and Launch

  1. Review all settings in the summary panel
  2. See estimated training time and cost
  3. Click Start Fine-Tuning to launch
The dashboard will redirect you to the job monitoring page where you can track progress in real-time.

UI vs CLI comparison

Feature             | CLI (eval-protocol)            | Web UI
Speed               | Fast - single command          | Slower - multiple steps
Automation          | Easy to script and reproduce   | Manual process
Parameter discovery | Need to know flag names        | Guided with tooltips
Batch operations    | Easy to launch multiple jobs   | One at a time
Reproducibility     | Excellent - save commands      | Manual tracking needed
Best for            | Experienced users, automation  | First-time users, exploration
Start with the UI to learn the options, then switch to CLI for faster iteration and automation.

Using firectl CLI (Alternative)

For users already familiar with Fireworks firectl, you can create RFT jobs directly:
firectl create rftj \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset accounts/your-account/datasets/my-dataset \
  --evaluator accounts/your-account/evaluators/my-evaluator \
  --output-model my-finetuned-model
Differences from eval-protocol:
  • Requires fully qualified resource names (accounts/…)
  • Must manually upload evaluators and datasets first
  • More verbose but offers finer control
  • Same underlying API as eval-protocol
See firectl documentation for all options.

Job validation

Before starting training, Fireworks validates your configuration:
Dataset checks:
  • ✅ Valid JSONL format
  • ✅ Each line has a messages array
  • ✅ Messages have role and content fields
  • ✅ File size within limits
  • ❌ Missing fields → error with specific line numbers
  • ❌ Invalid JSON → syntax error details
Evaluator checks:
  • ✅ Evaluator code syntax is valid
  • ✅ Required dependencies are available
  • ✅ Entry point function exists
  • ✅ Test runs completed successfully
  • ❌ Import errors → missing dependencies
  • ❌ Syntax errors → code issues
Account and resource checks:
  • ✅ Sufficient GPU quota
  • ✅ Base model supports fine-tuning
  • ✅ Account has RFT permissions
  • ❌ Insufficient quota → request increase
  • ❌ Invalid model → choose different base model
Parameter checks:
  • ✅ Parameters within valid ranges
  • ✅ Compatible parameter combinations
  • ❌ Invalid ranges → error with allowed values
  • ❌ Conflicting options → resolution guidance
If validation fails, you’ll receive specific error messages with instructions to fix the issues.

Common errors and fixes

Error: Dataset validation failed: invalid JSON on line 42
Fix:
  1. Open your JSONL file
  2. Check line 42 for JSON syntax errors
  3. Common issues: missing quotes, trailing commas, unescaped characters
  4. Validate JSON at jsonlint.com
Error: Missing required field 'messages'
Fix: Each dataset row must have a messages array:
{"messages": [{"role": "user", "content": "..."}]}
Error: Evaluator 'my-evaluator' not found in account
Fix:
  1. Upload your evaluator first:
    cd evaluator_directory
    pytest test_evaluator.py -vs
    
  2. Or specify the evaluator ID if using the UI.
Error: Insufficient GPU quota for this job
Fix:
  1. Check your current quota at Account Settings
  2. Request a quota increase through the dashboard
  3. Or choose a smaller base model to reduce GPU requirements
Error: Learning rate 1e-2 outside valid range [1e-5, 5e-4]
Fix: Adjust the parameter to be within the allowed range:
--learning-rate 1e-4  # Use default value
See Parameter Reference for all valid ranges.
Error: Evaluator build timed out after 10 minutes
Fix:
  1. Check build logs in Evaluators dashboard
  2. Common issues:
    • Large dependencies taking too long to install
    • Network issues downloading packages
    • Syntax errors in requirements.txt
  3. Wait for build to complete, then run create rft again
  4. Consider splitting large dependencies or using lighter alternatives

What happens after launching

Once your job is created, here’s what happens:
1. Job queued

Your job enters the queue and waits for available GPU resources. Queue time depends on current demand.
Status: PENDING
2. Dataset validation

Fireworks validates your dataset to ensure it meets format requirements and quality standards. This typically takes 1-2 minutes.
Status: VALIDATING
3. Training starts

The system begins generating rollouts, evaluating them, and updating model weights. You’ll see:
  • Rollout generation and evaluation
  • Reward curves updating in real-time
  • Training loss decreasing
Status: RUNNING
4. Monitor progress

Track training via the dashboard. See Monitor Training for details on interpreting metrics and debugging issues.
Status: RUNNING → COMPLETED
5. Job completes

When training finishes, your fine-tuned model is ready for deployment.
Status: COMPLETED
Next: Deploy your model for inference.

Advanced configuration

Track training metrics in W&B for deeper analysis:
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --wandb-project my-rft-experiments \
  --wandb-entity my-org
Set WANDB_API_KEY in your environment first.
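For example (replace the placeholder with your own key):
export WANDB_API_KEY="your_wandb_api_key_here"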
Save intermediate checkpoints during training:
firectl create rftj \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --checkpoint-frequency 500  # Save every 500 steps
  ...
Available in firectl only.
Speed up training with multiple GPUs:
firectl create rftj \
  --base-model accounts/fireworks/models/llama-v3p1-70b-instruct \
  --accelerator-count 4  # Use 4 GPUs
  ...
Recommended for large models (70B+).
For evaluators that need more time:
firectl create rftj \
  --rollout-timeout 300  # 5 minutes per rollout
  ...
Default is 60 seconds. Increase for complex evaluations.

Next steps