Model Availability
Fireworks hosts several purpose-built embedding models, optimized specifically for tasks like semantic search and document similarity comparison. We host the SOTA Qwen3 Embedding family of models:
fireworks/qwen3-embedding-8b (*available on serverless)
fireworks/qwen3-embedding-4b
fireworks/qwen3-embedding-0p6b
nomic-ai/nomic-embed-text-v1.5
nomic-ai/nomic-embed-text-v1
WhereIsAI/UAE-Large-V1
thenlper/gte-large
thenlper/gte-base
BAAI/bge-base-en-v1.5
BAAI/bge-small-en-v1.5
mixedbread-ai/mxbai-embed-large-v1
sentence-transformers/all-MiniLM-L6-v2
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
We also support LLM-based embeddings with the following models:
fireworks/glm-4p5
fireworks/gpt-oss-20b
fireworks/kimi-k2-instruct-0905
fireworks/deepseek-r1-0528
Generating embeddings
Embedding models take text as input and output a vector of floating-point numbers to use for tasks like similarity comparison and search. Our embedding service is OpenAI-compatible; refer to OpenAI's embeddings guide and embeddings API documentation for more information on using these models.
Some models support variable output dimensions. To use a smaller embedding size, add a dimensions parameter to the request, for example, dimensions: 128.
The API usage for embedding models is identical for BERT-based and LLM-based embeddings. Simply use the /v1/embeddings endpoint with your chosen model.
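Because the service is OpenAI-compatible, any OpenAI embeddings client will work. The sketch below uses only the Python standard library; the base URL and `FIREWORKS_API_KEY` environment-variable name follow the usual Fireworks conventions, and the helper names are illustrative:

```python
import json
import os
import urllib.request

# Standard Fireworks OpenAI-compatible embeddings endpoint (verify
# against your account/docs).
API_URL = "https://api.fireworks.ai/inference/v1/embeddings"


def build_embedding_request(texts, model="fireworks/qwen3-embedding-8b", dimensions=None):
    """Build the JSON payload for the /v1/embeddings endpoint."""
    payload = {"model": model, "input": texts}
    if dimensions is not None:
        # Optional reduced embedding size, e.g. dimensions=128,
        # for models that support variable output dimensions.
        payload["dimensions"] = dimensions
    return payload


def embed(texts, api_key, **kwargs):
    """POST the request and return one embedding vector per input text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_embedding_request(texts, **kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]


api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key:
    [vec] = embed(["What is semantic search?"], api_key)
    print(len(vec))
```

The same call works for every model in the list above; only the `model` string changes.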
Reranking documents
Reranking models are used to rerank a list of documents based on a query. We only support reranking with the Qwen3 Reranker family of models:
fireworks/qwen3-reranker-8b (*available on serverless)
fireworks/qwen3-reranker-4b
fireworks/qwen3-reranker-0p6b
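A minimal reranking sketch, again using only the standard library. The endpoint path and the query/documents/top_n payload shape follow common rerank-API conventions and are assumptions here; verify them against the Fireworks API reference:

```python
import json
import os
import urllib.request

# Assumed rerank endpoint path; confirm against the Fireworks API reference.
RERANK_URL = "https://api.fireworks.ai/inference/v1/rerank"


def build_rerank_request(query, documents, model="fireworks/qwen3-reranker-8b", top_n=None):
    """Build the JSON payload for a rerank request."""
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n  # return only the N most relevant documents
    return payload


def rerank(query, documents, api_key, **kwargs):
    """POST the request and return the scored results, highest relevance first."""
    req = urllib.request.Request(
        RERANK_URL,
        data=json.dumps(build_rerank_request(query, documents, **kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]


api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key:
    results = rerank(
        "What accelerates matrix math?",
        ["GPUs accelerate matrix math.", "Cats sleep most of the day."],
        api_key,
        top_n=1,
    )
    print(results)
```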
Deploying embeddings and reranking models
While Qwen3 Embedding 8B and Qwen3 Reranker 8B are available on serverless, you also have the option to deploy them via on-demand deployments. We recommend passing --load-targets default=0.4 to ensure proper autoscaling responsiveness for these deployments.
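As a sketch, an on-demand deployment with the recommended flag might look like the following; the model path format and flag placement are assumptions, so check firectl create deployment --help for the exact syntax:

```shell
# Deploy the embedding model on dedicated, autoscaling hardware.
# --load-targets default=0.4 keeps autoscaling responsive, per the
# recommendation above; the model path below is illustrative.
firectl create deployment accounts/fireworks/models/qwen3-embedding-8b \
  --load-targets default=0.4
```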