AI Avatar API
Generate lifelike AI avatars and talking-head videos from text or image inputs. One API for face synthesis, lip-sync, and custom digital human creation — unified key, unified billing.
Overview
What the AI Avatar API does
AI avatar generation turns a face image and a text script into a video of that person speaking — with accurate lip-sync, natural head movement, and optional emotion control. It's the infrastructure behind spokesperson videos, personalized video messages, and AI presenter tools.
Building this directly against individual providers means managing separate auth flows, output formats, watermark handling, and compliance rules for real-person generation. ImaRouter consolidates the integration: you pass a face reference and a script, we route to the right model and return the video.
- Face image + text script → talking-head video in one API call
- Lip-sync accuracy across English, Chinese, Spanish, and more
- Emotion and expression control on supported models
- Built-in compliance review layer for real-person video generation
- Output: MP4 via CDN URL, consistent across all models
Supported models and generation types
ImaRouter integrates the leading digital human and avatar generation models. Each has a distinct profile for realism, speed, supported languages, and customization depth.
- HeyGen Studio — High-realism avatar video, deep customization, enterprise-grade quality
- Vidu Portrait — Fast portrait animation, good for social and UGC use cases
- D-ID Creative Reality — Scalable talking-head generation, strong multilingual support
- SadTalker — Open-weight face animation, image-driven with fine head pose control
- MuseTalk — Real-time lip-sync for live streaming and interactive applications
Capabilities
What you can build
AI avatar generation unlocks product patterns that previously required expensive video production or on-camera talent. These are the integration patterns teams are using in production.
- Personalized video outreach: generate unique spokesperson videos at scale for sales or marketing
- E-learning content: produce instructor-led explainer videos from text scripts without studio time
- Multilingual video localization: re-lip-sync existing video to a new language without re-recording
- AI presenter tools: build a SaaS product where users create their own avatar video with one click
- Customer support video: generate personalized video responses at support ticket scale
- Brand ambassador content: create on-brand spokesperson clips from a single brand face asset
Real-person compliance layer
Generating video of real people requires content moderation to prevent misuse. ImaRouter routes real-person avatar requests through a human review layer before delivery — the same system used for Seedance real-person video generation.
Pass review_required: true in your request to enable the compliance flow. Standard review adds 1–4 hours to delivery time and checks that the output meets commercial use standards. All reviewed outputs are logged with audit metadata.
- Set review_required: true to route through the human review layer
- Review SLA: 1–4 hours for standard requests, 30 minutes for priority review
- Audit log: every reviewed job is stored with reviewer decision and timestamp
- Non-compliant outputs are rejected with a structured reason code
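Because review can add hours to delivery, a client that polls for results needs a longer wait budget for reviewed jobs. The sketch below shows one way to enable the flag and size that budget from the SLA figures above. The helper names and the client-side priority flag are illustrative assumptions; this page does not say how priority review is requested.

```python
from datetime import timedelta

# Upper bounds of the review SLA quoted above.
STANDARD_REVIEW_SLA = timedelta(hours=4)
PRIORITY_REVIEW_SLA = timedelta(minutes=30)


def with_review(payload: dict) -> dict:
    """Return a copy of a request payload with the compliance flow enabled."""
    return {**payload, "review_required": True}


def wait_budget(payload: dict, generation_budget: timedelta,
                priority: bool = False) -> timedelta:
    """How long a client might wait before treating a job as stuck.

    `priority` is a hypothetical client-side flag: it only selects which
    SLA bound to add and is not sent to the API.
    """
    if not payload.get("review_required"):
        return generation_budget
    review = PRIORITY_REVIEW_SLA if priority else STANDARD_REVIEW_SLA
    return generation_budget + review
```

with_review returns a new dictionary, so the same base payload can be reused for requests that do not need review.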
Getting started
Submit a POST request to /v1/avatar/generate with your face image URL, script text, model preference, and output language, authenticated with your ImaRouter API key. Avatar generation is asynchronous: the call returns a job ID, and you poll for the completed video.
If you need real-person compliance review, add review_required: true to your request. The completed job response includes the video URL, model used, generation time, and cost in USD.
- POST /v1/avatar/generate — async, returns {jobId, status: 'queued'}
- Required params: face_image_url, script, model, language
- GET /v1/jobs/{jobId} — poll for status: queued → processing → completed
- Webhook: set callback_url to receive the completed video without polling
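The submit-and-poll flow above can be sketched in Python with only the standard library. The endpoint paths, parameter names, and status values come from this page. The base URL and the Bearer authorization header are placeholder assumptions, so check them against your account settings before use.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: placeholder host, not the real API base URL.
BASE_URL = "https://api.imarouter.example"


def build_generate_request(api_key: str, face_image_url: str, script: str,
                           model: str, language: str,
                           callback_url: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /v1/avatar/generate request without sending it."""
    payload = {
        "face_image_url": face_image_url,
        "script": script,
        "model": model,
        "language": language,
    }
    if callback_url is not None:
        payload["callback_url"] = callback_url  # webhook instead of polling
    return urllib.request.Request(
        f"{BASE_URL}/v1/avatar/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer scheme; the page only says "use your API key".
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def http_job_fetcher(api_key: str) -> Callable[[str], dict]:
    """Return a function that performs GET /v1/jobs/{jobId} and parses the JSON."""
    def fetch(job_id: str) -> dict:
        req = urllib.request.Request(
            f"{BASE_URL}/v1/jobs/{job_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def poll_job(fetch_job: Callable[[str], dict], job_id: str,
             interval_s: float = 5.0, timeout_s: float = 600.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reports 'completed', or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        status = job["status"]
        if status == "completed":
            return job
        if status not in ("queued", "processing"):
            # Any status not listed on this page is treated as an error here.
            raise RuntimeError(f"unexpected job status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout_s}s")
        sleep(interval_s)
```

A typical call sends the request with urllib.request.urlopen, reads jobId from the JSON response, and passes it to poll_job(http_job_fetcher(api_key), job_id). Taking the fetch function as a parameter keeps the polling loop testable without network access.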
FAQ
Can I use my own face image for avatar generation?
Yes. Pass any face image URL in the face_image_url field. The image should be a clear front-facing photo with good lighting. For best results, use a 512×512 or larger image with the face occupying at least 50% of the frame.
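As a rough pre-flight check for the size guideline, the sketch below reads the pixel dimensions from a PNG header using only the standard library. It assumes "512×512 or larger" means both sides are at least 512 pixels, it handles PNG files only, and it does not measure how much of the frame the face occupies, which would need a face detector.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 512  # assumption: both width and height should be at least 512


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then big-endian 4-byte width and 4-byte height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def meets_size_guideline(data: bytes) -> bool:
    """True if both sides of the PNG meet the minimum size assumed above."""
    width, height = png_size(data)
    return width >= MIN_SIDE and height >= MIN_SIDE
```

Only the first 24 bytes of the file are needed, so the check can run before the image is uploaded anywhere.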
What languages are supported for lip-sync?
Language support varies by model. HeyGen Studio and D-ID support 40+ languages including English, Chinese (Mandarin and Cantonese), Spanish, French, German, Japanese, Korean, and Arabic. MuseTalk focuses on English and Chinese with real-time performance.
How does the compliance review work for real-person video?
When review_required: true is set, ImaRouter routes your job to a human review queue before delivering the output. A reviewer checks the output against commercial content standards and either approves delivery or rejects with a reason code. This process typically takes 1–4 hours.
What is the output resolution and format?
Avatar videos are delivered as MP4 files via CDN URL in the job completion response. Standard resolution is 720p. 1080p is available on HeyGen Studio and D-ID models. Output files are retained for 72 hours after generation — download and store before the window closes.
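One way to stay inside the 72-hour retention window is to compute the deadline when a job completes and copy the file to your own storage right away. In the sketch below, the generation timestamp is something the caller records; this page does not say whether the job response carries one.

```python
import shutil
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(hours=72)  # retention window stated above


def retention_deadline(generated_at: datetime) -> datetime:
    """Moment after which the CDN file may no longer be available."""
    return generated_at + RETENTION


def is_still_downloadable(generated_at: datetime,
                          now: Optional[datetime] = None) -> bool:
    """True while the current time is before the retention deadline."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < retention_deadline(generated_at)


def download_video(video_url: str, dest_path: str) -> None:
    """Stream the MP4 from its CDN URL to a local file."""
    with urllib.request.urlopen(video_url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Pass timezone-aware datetimes for generated_at so the comparison with the default UTC clock is valid.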
Can I create a reusable avatar that I don't need to re-upload each time?
Yes. After your first generation, you can save the face asset to your ImaRouter account and reference it by asset ID in future requests. Saved assets remain available for 90 days, and you can re-upload the face image at any time.
Launch paths