Reference image + MiniMax narration · FAL Kling Avatar · YouTube 16:9
Upload a portrait or cartoon reference with speech audio to generate a lip-synced talking avatar video.
Powered by FAL Kling AI Avatar v2 — realistic, cartoon, and commentary styles with 16:9 landscape MP4 output.
Upload audio directly or pick from TTS history; results save automatically to My Creations.
FAL API key stays on the server. Credits are estimated before submit based on audio length.
Use a cartoon or virtual host instead of filming yourself.
MiniMax TTS audio plus Kling lip sync for narration videos.
16:9 reference images produce horizontal videos for YouTube explainers.
Latest FAL talking avatar model with natural lip sync.
Reuse MiniMax narration without re-uploading audio.
Realistic, cartoon, Laorou commentary, or custom prompt.
Auto-saved on completion with preview and download.
Use a 16:9 portrait or cartoon, or the Laorou preset.
Select from TTS history or upload an MP3, WAV, or M4A file.
Choose realistic/cartoon/Laorou and submit.
Preview on page or in My Creations, download MP4.
Upload any audio directly, or optionally pick from Text to Speech history.
Works best under 30 seconds per clip; the maximum is 120 seconds. Split longer scripts into separate TTS clips first.
Clear front-facing portrait or cartoon, 16:9 landscape, neutral expression.
Built-in commentary-style cartoon uncle image at public/presets/laorou-avatar.png — replace with your own.
Starts at 25 credits plus a duration-based add-on; the Pro tier costs more. An estimate is shown before you submit.
Usually 1–5 minutes depending on audio length and FAL queue.
Automatically in My Creations under Talking Avatar.
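The duration and credit answers above can be sketched as a pre-submit check. This is a minimal sketch, not the tool's actual pricing code: the 30-second and 120-second limits and the 25-credit base come from the text above, while `credits_per_second`, the function names, and the round-up rule are hypothetical assumptions.

```python
import math

RECOMMENDED_MAX_SECONDS = 30   # "best under 30 seconds per clip"
HARD_MAX_SECONDS = 120         # "max 120 seconds"
BASE_CREDITS = 25              # "from 25 credits"


def check_duration(seconds: float) -> str:
    """Classify an audio clip against the limits quoted above."""
    if seconds <= 0:
        raise ValueError("audio duration must be positive")
    if seconds > HARD_MAX_SECONDS:
        return "too_long"       # split the script into shorter clips first
    if seconds < RECOMMENDED_MAX_SECONDS:
        return "recommended"
    return "allowed"            # accepted, but above the recommended length


def estimate_credits(seconds: float, credits_per_second: float) -> int:
    """Base cost plus a duration add-on.

    credits_per_second is a hypothetical rate; the real add-on formula
    and the Pro tier pricing are not stated in the text.
    """
    if check_duration(seconds) == "too_long":
        raise ValueError("clip exceeds the 120-second maximum")
    # Assumption: the add-on is rounded up to a whole credit.
    return BASE_CREDITS + math.ceil(seconds * credits_per_second)
```

For example, `check_duration(45)` returns `"allowed"`, and `estimate_credits(10, 0.5)` returns 30 under the assumed rate of 0.5 credits per second.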
Upload a reference image and speech audio to generate a lip-synced video. Use any audio file, or pick one from TTS history.
Use 16:9 landscape images for YouTube-ready output
MP3, WAV, and M4A are supported. Audio does not have to come from TTS; any narration file works.
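The upload hints above (16:9 landscape image, MP3/WAV/M4A audio) could be checked before upload roughly as follows. This is only an illustrative sketch: the function names and the 1% aspect-ratio tolerance are assumptions, and the tool's own validation may differ.

```python
from pathlib import Path

# Audio formats listed in the upload hint.
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a"}


def is_supported_audio(filename: str) -> bool:
    """True if the file extension is one of the listed audio formats."""
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO


def is_16_9(width: int, height: int, tolerance: float = 0.01) -> bool:
    """True if width:height is 16:9 landscape within a small tolerance.

    The tolerance is a hypothetical allowance for slightly off sizes
    such as 1366x768.
    """
    if width <= 0 or height <= 0:
        return False
    return abs(width / height - 16 / 9) <= tolerance
```

For instance, `is_16_9(1920, 1080)` is true, while a vertical 1080x1920 image fails the check.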
Your talking video will appear here
After you submit, you'll go to My Creations to track progress and download when ready