Fireworks AI provides three ASR (Automatic Speech Recognition) features: Streaming Transcription, Pre-recorded Transcription, and Pre-recorded Translation. This guide shows you how to get started with each feature.

Streaming Transcription

Convert audio to text in real-time using WebSocket connections. Perfect for voice agents and live applications.

Quick Start

Available Models:
  • fireworks-asr-large: Cost-efficient model for real-time transcription over WebSockets
  • fireworks-asr-v2: Next-generation, ultra-low-latency model for real-time transcription over WebSockets
For a working example of streaming transcription, see the following resources:
  1. Python notebook
  2. Python cookbook
For more detailed information, see the full streaming API documentation and the source code.
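Streaming clients generally send audio over the WebSocket in small, fixed-duration chunks. The sketch below covers only that chunking step. The 16 kHz, 16-bit mono PCM format, the 100 ms chunk length, and the helper name `iter_pcm_chunks` are illustrative assumptions, not documented requirements; consult the streaming API documentation for the actual audio format and endpoint.

```python
from typing import Iterator


def iter_pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16_000,  # assumption: 16 kHz audio
    sample_width: int = 2,      # assumption: 16-bit (2-byte) mono samples
    chunk_ms: int = 100,        # assumption: 100 ms per chunk
) -> Iterator[bytes]:
    """Yield consecutive slices of raw mono PCM, each covering chunk_ms of audio."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be at least one byte")
    for start in range(0, len(pcm), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


# One second of silence split into 100 ms chunks.
one_second = bytes(16_000 * 2)
chunks = list(iter_pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk would then be sent as a binary WebSocket message by whatever client library you use.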

Pre-recorded Transcription

Convert audio files to text. Supports files up to 1GB in formats like MP3, FLAC, and WAV. Transcribe multiple hours of audio in minutes.
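
A quick local check before uploading can catch files that exceed the size limit. This is a minimal sketch: `check_audio_file` is a hypothetical helper, "1GB" is assumed to mean 10**9 bytes, and the extension set contains only the three formats named above (the service may accept others).

```python
import os
import tempfile
from typing import List

MAX_BYTES = 1_000_000_000  # assumption: "1GB" interpreted as 10**9 bytes
KNOWN_EXTENSIONS = {".mp3", ".flac", ".wav"}  # only the formats named in this guide


def check_audio_file(path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file looks fine."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        problems.append(f"unrecognized extension {ext!r}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems


# Demo with a small temporary file standing in for real audio.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"\x00" * 1024)
    demo_path = f.name
print(check_audio_file(demo_path))
os.remove(demo_path)
```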

Quick Start

For a working example of pre-recorded transcription, see the Python notebook.
Available Models:
  • whisper-v3: Highest accuracy
    • model=whisper-v3
    • base_url=https://audio-prod.us-virginia-1.direct.fireworks.ai
  • whisper-v3-turbo: Faster processing
    • model=whisper-v3-turbo
    • base_url=https://audio-turbo.us-virginia-1.direct.fireworks.ai
For more detailed information, see the full transcription API documentation.
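
Because each model is served from its own base URL, it helps to keep the two together. The sketch below pairs them in a lookup table and builds the client arguments from it. `ENDPOINTS`, `client_kwargs`, and `transcribe` are hypothetical names, and `transcribe_async` is assumed to mirror the `translate_async` call used for translation in this guide; verify it against the transcription API documentation.

```python
import os

# Model name -> base URL, as listed under Available Models.
ENDPOINTS = {
    "whisper-v3": "https://audio-prod.us-virginia-1.direct.fireworks.ai",
    "whisper-v3-turbo": "https://audio-turbo.us-virginia-1.direct.fireworks.ai",
}


def client_kwargs(model: str) -> dict:
    """Build keyword arguments for the audio client, keeping model and URL in sync."""
    if model not in ENDPOINTS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(ENDPOINTS)}")
    return {
        "model": model,
        "base_url": ENDPOINTS[model],
        "api_key": os.getenv("FIREWORKS_API_KEY"),
    }


async def transcribe(audio: bytes, model: str = "whisper-v3") -> str:
    """Transcribe raw audio bytes and return the text."""
    # Imported lazily so client_kwargs can be used without the SDK installed.
    from fireworks.client.audio import AudioInference

    client = AudioInference(**client_kwargs(model))
    # Assumption: transcribe_async takes the same audio= argument as translate_async.
    result = await client.transcribe_async(audio=audio)
    return result.text


print(client_kwargs("whisper-v3-turbo")["base_url"])
```

In a notebook you could run `text = await transcribe(audio)`; in a script, use `asyncio.run(transcribe(audio))`.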

Pre-recorded Translation

Translate audio from any of our supported languages to English. Supports files up to 1GB in formats like MP3, FLAC, and WAV.

Quick Start

!pip install fireworks-ai requests python-dotenv

from fireworks.client.audio import AudioInference
import requests
import time
from dotenv import load_dotenv
import os

load_dotenv()

# Download a sample audio file
audio = requests.get("https://tinyurl.com/3cy7x44v").content

# Prepare client
client = AudioInference(
    model="whisper-v3",
    base_url="https://audio-prod.us-virginia-1.direct.fireworks.ai",
    #
    # Or for the turbo version
    # model="whisper-v3-turbo",
    # base_url="https://audio-turbo.us-virginia-1.direct.fireworks.ai",
    api_key=os.getenv("FIREWORKS_API_KEY")
)

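# Make request (top-level await works in a notebook; in a script,
# wrap these lines in an async function and call asyncio.run on it)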
start = time.time()
r = await client.translate_async(audio=audio)
print(f"Took: {(time.time() - start):.3f}s. Text: '{r.text}'")
For more detailed information, see the full translation API documentation.

Supported Languages

We support 95+ languages including English, Spanish, French, German, Chinese, Japanese, Russian, Portuguese, and many more. See the complete language list.

Common Use Cases

  • Call Center / Customer Service: Transcribe or translate customer calls
  • Note Taking: Transcribe audio for automated note taking

Next Steps

  1. Explore advanced features like speaker diarization and custom prompts
  2. Contact us at inquiries@fireworks.ai for dedicated endpoints and enterprise features