Fireworks AI on Amazon ECS

What is Amazon ECS?

Amazon Elastic Container Service (ECS) is AWS’s fully managed container orchestration service that enables you to deploy, manage, and scale containerized applications. ECS integrates deeply with the AWS ecosystem, providing native support for GPU-accelerated workloads, Auto Scaling, Application Load Balancers, and CloudWatch monitoring.

Why Fireworks AI on ECS?

Enterprise Security & Compliance

Deploy Fireworks inference entirely within your own Virtual Private Cloud with VPC-native architecture, internal-only API endpoints, and data that never leaves your AWS environment. Meet strict regulatory requirements for healthcare, financial services, and government workloads while leveraging existing AWS Enterprise Discount Programs and reserved instances.

Performance at Scale

Access Fireworks’ full optimization stack, including FireOptimizer, adaptive caching, and speculative decoding. Deploy on the latest GPU instance types (H100, H200, B200, etc.) with automatic scaling to handle high-throughput, low-latency workloads within the same VPC.

Main Deployment Steps

Deploying Fireworks AI on ECS is a three-phase process that balances automation with control:

Phase 1: Foundation Infrastructure (Automated)
  • Run shell script to create S3 bucket, security groups, and IAM roles via Terraform
  • Establish secure foundation for your deployment
Phase 2: Resource Preparation (Manual)
  • Upload your model to the S3 bucket
  • Push Fireworks container image to Amazon ECR
  • Store metering key in AWS Secrets Manager
Phase 3: Cluster Deployment (Automated)
  • Run shell script to deploy ECS cluster, Application Load Balancer, and Auto Scaling Group
  • Results in production-ready internal inference endpoint
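
The manual Phase 2 steps can be sketched with the AWS CLI and Docker. The snippet below is a minimal illustration, not the official Fireworks tooling: the bucket name, `model/` prefix, repository name, image tag, secret name, and key file path are all hypothetical placeholders, and it assumes Docker is already authenticated against your ECR registry (for example via `aws ecr get-login-password` piped into `docker login`). It only builds and prints the commands unless `dry_run` is set to `False`.

```python
import shlex
import subprocess


def phase2_commands(bucket, model_dir, local_image, account_id, region,
                    repo, tag, secret_name, key_file):
    """Build the CLI invocations for the three manual Phase 2 steps.

    All argument values are supplied by the caller; nothing here is
    prescribed by Fireworks AI.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    remote_image = f"{registry}/{repo}:{tag}"
    return [
        # 1. Upload the model directory to the S3 bucket created in Phase 1.
        #    The "model/" prefix is an assumption for illustration.
        ["aws", "s3", "cp", model_dir, f"s3://{bucket}/model/", "--recursive"],
        # 2. Tag the local image for ECR, then push it.
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
        # 3. Store the metering key in Secrets Manager. Reading it via
        #    file:// keeps the key out of shell history.
        ["aws", "secretsmanager", "create-secret", "--region", region,
         "--name", secret_name, "--secret-string", f"file://{key_file}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; replace with the outputs of your Phase 1 run.
    run(phase2_commands(
        bucket="my-fireworks-models",
        model_dir="./my-model",
        local_image="fireworks-inference:latest",
        account_id="123456789012",
        region="us-east-1",
        repo="fireworks-inference",
        tag="latest",
        secret_name="fireworks/metering-key",
        key_file="metering-key.txt",
    ))
```

Running the script as written performs a dry run, which makes it easy to review the exact commands before anything is uploaded or pushed.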

Detailed Deployment Guide

For complete step-by-step instructions, configuration options, troubleshooting guidance, and architecture details, contact Fireworks AI to request access to the full deployment documentation.