# How to Stream Media Files from S3 Directly to AWS Lambda Using FFmpeg in Python
Processing media files inside AWS Lambda can be challenging due to its resource limits, small default ephemeral storage, and execution timeouts. However, it’s entirely possible to stream files from S3 straight into FFmpeg, process them on the fly, and avoid writing anything to disk.
In this guide, we’ll cover:

- Streaming files from S3 into Lambda
- Using `ffmpeg` with stdin and stdout in memory
- Avoiding `/tmp` bottlenecks
- Examples and production-ready patterns
## Why This Approach?
Traditional file processing in AWS Lambda often looks like:

1. Download the file from S3 to `/tmp`
2. Run FFmpeg on the file
3. Upload the result to S3

That works, but `/tmp` defaults to 512 MB (it can be configured larger at extra cost), and disk I/O is relatively slow. Streaming avoids both limits.
Benefits:

- Zero disk usage (works in low-memory environments)
- Fully stateless function
- Faster, more scalable media processing
## Prerequisites

- Python 3.9+ (AWS Lambda compatible)
- FFmpeg binary packaged in a Lambda layer
- `boto3` for the AWS SDK
- `subprocess` or `asyncio.subprocess`
- IAM role with S3 read/write permissions
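For the IAM role, object-level read on the source bucket and write on the destination bucket are enough for this walkthrough; a minimal policy sketch (bucket names are placeholders matching the handler below):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-source-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-destination-bucket/*"
    }
  ]
}
```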
## 🛠️ Step-by-Step Example
We’ll show how to:

- Stream an MP3 file from S3
- Keep only the first 10 seconds using FFmpeg
- Output the result as MP3
- Upload it back to S3
### 1. Install FFmpeg to Lambda

You need a Lambda Layer containing a statically compiled FFmpeg binary; several prebuilt public layers and static builds are available. Make sure `ffmpeg` ends up at `/opt/bin/ffmpeg`.
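To fail fast when the layer is missing or mis-packaged, you can probe the binary once at cold start. A small sketch (the helper name is ours; the `/opt/bin/ffmpeg` path matches the layer layout above):

```python
import os
import subprocess

FFMPEG = "/opt/bin/ffmpeg"

def assert_ffmpeg_available(path=FFMPEG):
    """Raise early if the FFmpeg binary is absent or not executable."""
    if not os.path.isfile(path) or not os.access(path, os.X_OK):
        raise RuntimeError(f"FFmpeg binary not found or not executable at {path}")
    # 'ffmpeg -version' exits 0 and prints version info when the binary works
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"FFmpeg probe failed: {result.stderr}")
```

Calling this at module import time surfaces packaging mistakes in the first invocation’s logs instead of mid-stream.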
### 2. Lambda Handler Example (sync)
```python
import boto3
import subprocess
import io

s3 = boto3.client('s3')

def lambda_handler(event, context):
    input_bucket = 'your-source-bucket'
    input_key = 'input.mp3'
    output_bucket = 'your-destination-bucket'
    output_key = 'trimmed.mp3'

    # Get the input object as a streaming body
    input_stream = s3.get_object(Bucket=input_bucket, Key=input_key)['Body']

    # Use FFmpeg to keep the first 10 seconds
    ffmpeg_cmd = [
        "/opt/bin/ffmpeg",
        "-i", "pipe:0",     # read input from stdin
        "-ss", "00:00:00",  # start at the beginning
        "-t", "00:00:10",   # keep 10 seconds
        "-f", "mp3",        # stdout has no filename, so force the format
        "pipe:1",           # write output to stdout
    ]

    result = subprocess.run(
        ffmpeg_cmd,
        input=input_stream.read(),  # note: buffers the whole object in memory
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    if result.returncode != 0:
        print(result.stderr.decode())
        raise RuntimeError("FFmpeg failed")

    # Upload the result to S3
    s3.upload_fileobj(io.BytesIO(result.stdout), output_bucket, output_key)

    return {
        'statusCode': 200,
        'body': f"Processed and uploaded to {output_bucket}/{output_key}"
    }
```
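In production the input location usually comes from the S3 notification event that triggered the function rather than hard-coded names. A hedged sketch of extracting bucket and key from a standard S3 event payload (the helper name is ours):

```python
import urllib.parse

def parse_s3_event(event):
    """Extract (bucket, key) from the first record of an S3 notification event.

    S3 URL-encodes object keys in event payloads (spaces arrive as '+'),
    so the key must be decoded before passing it to boto3.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return bucket, key
```

Inside `lambda_handler`, `input_bucket, input_key = parse_s3_event(event)` then replaces the hard-coded values.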
## Notes and Best Practices
- Memory limits: allocate more Lambda memory (512 MB–1 GB); CPU allocation scales with memory.
- Avoid loading the entire file: for large files, consider chunked streaming (advanced).
- FFmpeg options: customize the codec (`-c:a`), bitrate (`-b:a`), and format (`-f`) as needed.
- Logs: always log FFmpeg’s stderr to catch conversion issues.
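For the "avoid loading the entire file" point above, one pattern is to feed the subprocess in chunks from a writer thread while reading its stdout incrementally, which avoids both the full in-memory buffer and pipe deadlocks. A minimal sketch with a generic command (`stream_through` is a hypothetical helper; with FFmpeg you would pass the same `/opt/bin/ffmpeg` argument list as in the handler, and iterate over `input_stream` from `get_object`):

```python
import subprocess
import threading

def stream_through(cmd, chunks):
    """Pipe an iterable of byte chunks through a subprocess, yielding output chunks.

    A separate thread feeds stdin so that reading stdout here cannot deadlock
    when the child's pipe buffers fill up.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        for chunk in chunks:
            proc.stdin.write(chunk)
        proc.stdin.close()  # signal EOF so the child can finish

    writer = threading.Thread(target=feed)
    writer.start()
    while True:
        out = proc.stdout.read(64 * 1024)
        if not out:
            break
        yield out
    writer.join()
    proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}")
```

The output chunks can then be fed to a multipart upload instead of being buffered whole.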
## Advanced: Asyncio Streaming with `asyncio.subprocess`

With `asyncio.subprocess` (available in every Lambda-supported Python runtime), you can:

- Stream bytes into FFmpeg as they arrive
- Consume stdout in chunks
- Run the whole pipeline as a coroutine (note that the Python Lambda runtime does not await handlers natively, so wrap your async entry point with `asyncio.run()` inside the handler)

See aiobotocore or async-ffmpeg for more ideas.
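A hedged asyncio sketch of the same stdin/stdout pipe, shown with a generic command (`pipe_through` is a hypothetical helper; with FFmpeg you would pass the `/opt/bin/ffmpeg` argument list):

```python
import asyncio

async def pipe_through(cmd, data: bytes) -> bytes:
    """Run cmd, feed `data` on stdin, and return everything written to stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() writes stdin and drains stdout/stderr concurrently,
    # so neither pipe can deadlock
    stdout, stderr = await proc.communicate(input=data)
    if proc.returncode != 0:
        raise RuntimeError(stderr.decode(errors="replace"))
    return stdout
```

Inside a sync `lambda_handler`, you would call it with `asyncio.run(pipe_through(ffmpeg_cmd, payload))`.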
## Security Notes

- Validate S3 bucket names and keys before use (do not trust user input directly).
- Never interpolate untrusted input into a shell string; pass FFmpeg arguments as a list (as in the handler above) so no shell is involved and command injection is off the table.
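For the validation point above, a simple allowlist check on the key before it reaches FFmpeg. The prefix and pattern here are illustrative assumptions; adjust them to your bucket layout:

```python
import re

# Illustrative policy: keys must live under audio/, end in .mp3,
# and use a conservative character set
SAFE_KEY = re.compile(r"^audio/[A-Za-z0-9._/-]+\.mp3$")

def validate_key(key: str) -> str:
    """Reject keys that escape the expected prefix or contain odd characters."""
    if ".." in key or not SAFE_KEY.fullmatch(key):
        raise ValueError(f"Refusing to process suspicious key: {key!r}")
    return key
```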
## Summary

| Feature | Benefit |
|---|---|
| Streaming S3 → FFmpeg → S3 | Zero disk I/O |
| FFmpeg stdin/stdout | Fast and memory-safe |
| No `/tmp` writes | Works in constrained environments |
| Fully serverless | Perfect for AWS Lambda microservices |
## 📚 Further Reading
- FFmpeg Documentation
- AWS Lambda Layers
- Lambda with External Binaries
- ffmpeg-python wrapper
- AWS S3 streaming via boto3
## Final Thoughts
Streaming media files from S3 into FFmpeg inside Lambda is one of the cleanest ways to process content in modern cloud environments. Whether you’re trimming audio, extracting thumbnails from video, or re-encoding formats, this approach scales beautifully.