Speech to Text

Fireworks AI provides three ASR (Automatic Speech Recognition) features: Streaming Transcription, Pre-recorded Transcription, and Pre-recorded Translation. This guide shows you how to get started with each feature.

Streaming Transcription

Convert audio to text in real time over WebSocket connections, ideal for voice agents and other live applications.

Quick Start

Available Models:

fireworks-asr-large: Cost-efficient model for real-time transcription over WebSockets
fireworks-asr-v2: Next-generation, ultra-low-latency model for real-time transcription over WebSockets

For a working example of streaming transcription, see the following resources:

For more detailed information, see the full streaming API documentation and the source code.
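Streaming clients generally send audio as small, evenly sized chunks paced at real-time speed rather than as a single upload. The sketch below illustrates that pattern under stated assumptions: 16 kHz, 16-bit, mono PCM and 50 ms chunks are placeholder choices, and `send` stands in for the send coroutine of an open WebSocket connection (for example, one opened with the third-party `websockets` library). The endpoint URL, required audio format, and message protocol of the Fireworks streaming API are not covered here; take those from the streaming API documentation.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16_000, sample_width: int = 2,
               channels: int = 1, chunk_ms: int = 50) -> Iterator[bytes]:
    """Split raw PCM audio into fixed-duration chunks (the last may be shorter).

    Assumed format: little details like sample rate and width are placeholders,
    not values confirmed for the Fireworks streaming API.
    """
    bytes_per_chunk = sample_rate * sample_width * channels * chunk_ms // 1000
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this audio format")
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]


async def stream_pcm(pcm: bytes, send: Callable[[bytes], Awaitable[None]],
                     chunk_ms: int = 50, realtime: bool = True, **fmt) -> int:
    """Send PCM chunks through `send`, optionally paced to real time.

    `send` is any coroutine function accepting bytes, e.g. a WebSocket
    connection's send method (assumption). Returns the number of chunks sent.
    """
    count = 0
    for chunk in pcm_chunks(pcm, chunk_ms=chunk_ms, **fmt):
        await send(chunk)
        count += 1
        if realtime:
            # Approximate a live microphone feed by waiting one chunk duration.
            await asyncio.sleep(chunk_ms / 1000)
    return count
```

With an open `websockets` connection `ws`, `await stream_pcm(pcm, ws.send)` would push the audio; pass `realtime=False` to send as fast as possible, for example when replaying a recorded file during testing.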

Pre-recorded Transcription

Convert audio files to text. Supports files up to 1GB in formats like MP3, FLAC, and WAV. Transcribe multiple hours of audio in minutes.
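If you load audio from local disk rather than from a URL, checking the file size before uploading avoids sending a file the service will reject. `load_audio` is a hypothetical helper, not part of the Fireworks SDK, and it assumes "1GB" means 1024**3 bytes; the exact limit the service enforces may be defined differently.

```python
from pathlib import Path
from typing import Union

# Assumption: the documented 1GB limit is taken as 1024**3 bytes here.
MAX_AUDIO_BYTES = 1024 ** 3


def load_audio(path: Union[str, Path], limit: int = MAX_AUDIO_BYTES) -> bytes:
    """Read a local audio file as bytes, refusing files larger than `limit`.

    Hypothetical helper for illustration; not part of the fireworks-ai SDK.
    """
    p = Path(path)
    size = p.stat().st_size  # check the on-disk size before reading into memory
    if size > limit:
        raise ValueError(f"{p.name} is {size} bytes, over the {limit}-byte limit")
    return p.read_bytes()
```

The returned bytes can be passed as the `audio` argument in place of a downloaded sample, e.g. `client.translate_async(audio=load_audio("call.mp3"))`, where `call.mp3` is a placeholder file name.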

Quick Start

For a working example of pre-recorded transcription, see the Python notebook.

Available Models:

whisper-v3: Highest accuracy
  - model=whisper-v3
  - base_url=https://audio-prod.api.fireworks.ai
whisper-v3-turbo: Faster processing
  - model=whisper-v3-turbo
  - base_url=https://audio-turbo.api.fireworks.ai
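Each Whisper variant above is served from its own base URL, so `model` and `base_url` have to change together. A small lookup table is one way to keep them paired; `WHISPER_ENDPOINTS` and `whisper_client_kwargs` are hypothetical names for illustration, and only the model names and URLs come from the list above.

```python
from typing import Dict

# Model -> base URL pairs as listed in this guide.
WHISPER_ENDPOINTS: Dict[str, str] = {
    "whisper-v3": "https://audio-prod.api.fireworks.ai",
    "whisper-v3-turbo": "https://audio-turbo.api.fireworks.ai",
}


def whisper_client_kwargs(model: str, api_key: str) -> Dict[str, str]:
    """Return matching model/base_url/api_key keyword arguments (hypothetical helper)."""
    try:
        base_url = WHISPER_ENDPOINTS[model]
    except KeyError:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(WHISPER_ENDPOINTS)}"
        ) from None
    return {"model": model, "base_url": base_url, "api_key": api_key}
```

The result can be unpacked into the client constructor, e.g. `AudioInference(**whisper_client_kwargs("whisper-v3-turbo", os.getenv("FIREWORKS_API_KEY")))`, matching the `AudioInference` arguments used in the translation example.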

For more detailed information, see the full transcription API documentation.

Pre-recorded Translation

Translate audio from any of our supported languages to English. Supports files up to 1GB in formats like MP3, FLAC, and WAV.

Quick Start

!pip install fireworks-ai requests python-dotenv

from fireworks.client.audio import AudioInference
import requests
import time
from dotenv import load_dotenv
import os

# Read FIREWORKS_API_KEY from a local .env file, if present
load_dotenv()

# Download a sample audio file
response = requests.get("https://tinyurl.com/3cy7x44v")
response.raise_for_status()
audio = response.content

# Prepare client
client = AudioInference(
    model="whisper-v3",
    base_url="https://audio-prod.api.fireworks.ai",
    #
    # Or for the turbo version
    # model="whisper-v3-turbo",
    # base_url="https://audio-turbo.api.fireworks.ai",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Make request. Top-level `await` works in Jupyter/IPython; in a plain
# script, call it from a coroutine started with asyncio.run() instead.
start = time.time()
r = await client.translate_async(audio=audio)
print(f"Took: {(time.time() - start):.3f}s. Text: '{r.text}'")

For more detailed information, see the full translation API documentation.
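To translate several recordings, you can issue requests concurrently while capping how many are in flight at once. The sketch below shows one such pattern using `asyncio.Semaphore` and `asyncio.gather`; it is not an SDK feature. `translate_all` and the injected `translate` coroutine are hypothetical, and the default cap of 4 is an arbitrary placeholder rather than a documented rate limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def translate_all(
    clips: Sequence[bytes],
    translate: Callable[[bytes], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Translate clips concurrently with at most `max_concurrency` requests in flight.

    `translate` is any coroutine function that takes raw audio bytes and returns
    text, e.g. a thin wrapper around `client.translate_async` (assumption).
    Results come back in the same order as `clips`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(clip: bytes) -> str:
        async with semaphore:  # limit the number of simultaneous requests
            return await translate(clip)

    # gather returns results in input order, regardless of completion order
    return await asyncio.gather(*(run_one(c) for c in clips))
```

In Jupyter, a wrapper such as `async def tr(clip): return (await client.translate_async(audio=clip)).text` can be used with `await translate_all(clips, tr)`; in a plain script, use `asyncio.run(translate_all(clips, tr))`.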

Supported Languages

We support 95+ languages, including English, Spanish, French, German, Chinese, Japanese, Russian, Portuguese, and many more. See the complete language list.

Common Use Cases

Call Center / Customer Service: Transcribe or translate customer calls
Note Taking: Transcribe audio for automated note taking

Next Steps

Explore advanced features like speaker diarization and custom prompts
Contact us at [email protected] for dedicated endpoints and enterprise features
