When to use completions
Use the completions API for:
- Custom prompt templates with specific formatting requirements
- Base models (non-instruct/non-chat variants)
- Fine-grained control over token-level formatting
- Legacy applications that depend on raw completion format
Basic usage
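As a rough illustration of a basic request, the sketch below uses only the Python standard library. The endpoint URL, the model name, the `FIREWORKS_API_KEY` environment variable, and the `choices[0].text` response shape are assumptions based on the OpenAI-compatible completions format, not details taken from this page; check them against your account before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint and example model name; verify both for your account.
API_URL = "https://api.fireworks.ai/inference/v1/completions"
MODEL = "accounts/fireworks/models/llama-v3p1-8b-instruct"


def build_completion_request(prompt, api_key, model=MODEL, max_tokens=32):
    """Build (but do not send) a POST request for a raw text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def complete(prompt, api_key, **kwargs):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(prompt, api_key, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("FIREWORKS_API_KEY")
    if key:  # only touch the network when a key is configured
        print(complete("The capital of France is", key))
```

Unlike a chat request, the payload carries a single `prompt` string rather than a list of messages, so the model continues the text as given.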
Most models automatically prepend the beginning-of-sequence (BOS) token (e.g.,
<s>) to your prompt. Verify this with the raw_output parameter if needed.
Custom prompt templates
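One way to picture a custom format is a small function that renders conversation turns into a single prompt string. The `### Role:` layout, the `render_prompt` helper, the model name, and the `stop` sequence below are hypothetical and chosen only for illustration; a real base model expects whatever format it was trained or fine-tuned on.

```python
def render_prompt(system, turns):
    """Render (role, text) turns into a hypothetical plain-text template."""
    sections = [f"### System:\n{system}"]
    for role, text in turns:
        sections.append(f"### {role.capitalize()}:\n{text}")
    # Leave the final section open so the model continues as the assistant.
    sections.append("### Assistant:\n")
    return "\n\n".join(sections)


prompt = render_prompt(
    "Answer in one sentence.",
    [("user", "What is a base model?")],
)

# Hypothetical request body: the rendered string is sent as the raw prompt,
# and generation stops if the model starts writing the next user turn.
payload = {
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # assumed name
    "prompt": prompt,
    "max_tokens": 128,
    "stop": ["### User:"],
}
```

The template does not add a BOS token itself, since most models prepend it automatically.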
The completions API is useful when you need to implement custom prompt formats.
Common parameters
All chat completions parameters work with completions:
- temperature: Control randomness (0-2)
- max_tokens: Limit output length
- top_p, top_k, min_p: Sampling parameters
- stream: Stream responses token-by-token
- frequency_penalty, presence_penalty: Reduce repetition
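The parameters listed above can be collected into a request body along these lines. This is a sketch: the `completion_params` helper and the model name are hypothetical, and only the 0-2 temperature range comes from the list above. Parameters left as `None` are omitted so the server's own defaults apply.

```python
def completion_params(
    *,
    temperature=None,
    max_tokens=None,
    top_p=None,
    top_k=None,
    min_p=None,
    stream=None,
    frequency_penalty=None,
    presence_penalty=None,
):
    """Return only the parameters that were explicitly set."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    params = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
        "stream": stream,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
    return {name: value for name, value in params.items() if value is not None}


payload = {
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # assumed name
    "prompt": "Once upon a time",
    **completion_params(temperature=0.2, max_tokens=64, top_p=0.9),
}
```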
Querying deployments
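For an on-demand deployment, the request body generally looks the same as before and only the `model` value changes. The sketch below assumes the identifier takes the form `accounts/<account_id>/deployments/<deployment_id>`; that format, the helper names, and the example IDs are assumptions for illustration, so confirm the exact identifier shown for your deployment.

```python
def deployment_model_id(account_id, deployment_id):
    """Build an assumed deployment identifier for the `model` field."""
    for name, value in (("account_id", account_id), ("deployment_id", deployment_id)):
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty ID without slashes")
    return f"accounts/{account_id}/deployments/{deployment_id}"


def deployment_payload(account_id, deployment_id, prompt, max_tokens=64):
    """Request body that targets a deployment instead of a shared model."""
    return {
        "model": deployment_model_id(account_id, deployment_id),
        "prompt": prompt,
        "max_tokens": max_tokens,
    }


# Hypothetical IDs, shown only to illustrate the resulting string.
payload = deployment_payload("my-account", "abc12345", "Hello,")
```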
Use completions with on-demand deployments by specifying the deployment identifier.
Next steps
- Chat Completions: Use chat completions for most use cases
- Streaming: Stream responses for real-time UX
- API Reference: Complete API documentation