
Alibaba's most advanced open-architecture video model. Upload a 5-second reference video to clone a character's appearance and voice, auto-plan multi-shot narratives from a single prompt, and generate up to 15 seconds of 1080p video with native audio synchronization. It is powered by a Mixture-of-Experts architecture with 14 billion active parameters (27 billion total), trained on 1.5 billion videos.
Wan 2.6 introduces Reference-to-Video and Multi-Shot capabilities built on a Mixture-of-Experts architecture, delivering character-consistent storytelling with native audio synchronization.
Upload a 5-second reference video with a character's appearance and voice. Wan 2.6 extracts identity features and generates new scenes starring that character — maintaining facial features, clothing, mannerisms, and vocal tone across entirely different scenarios. China's first production-ready R2V implementation.
Provide a simple narrative prompt and Wan 2.6 automatically plans multiple shots — camera angles, scene transitions, and pacing — to tell a coherent story. Characters and environments stay consistent across all shots without manual intervention.
Generate videos with synchronized dialogue, sound effects, and ambient audio. Wan 2.6's lip-sync engine matches mouth movements to speech in real time, supporting multi-character dialogue scenes with distinct voices for each speaker.
Built on a Mixture-of-Experts design with 14 billion active parameters (27 billion total), trained on 1.5 billion videos and 10 billion images. This architecture delivers leading benchmark results on VBench while maintaining the fastest time-to-first-frame among comparable models.
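To make the "14 billion active, 27 billion total" figures concrete, the short sketch below computes what share of the model's parameters is active at once. It uses only the two numbers quoted above; the function name is illustrative, and reading "active" as "used in a single forward step" is the usual Mixture-of-Experts convention rather than something this page spells out.

```python
# Figures quoted in the text above, in billions of parameters.
ACTIVE_PARAMS_B = 14
TOTAL_PARAMS_B = 27


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters that are active at once.

    Assumption: "active" means the parameters used in one forward step,
    which is the common Mixture-of-Experts reading.
    """
    if not 0 < active_b <= total_b:
        raise ValueError("active parameters must be positive and not exceed the total")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active at once: {fraction:.1%} of all parameters")
# prints "Active at once: 51.9% of all parameters"
```

Roughly half of the weights take part in any one step, which is why a 27B-parameter model can have a compute cost closer to that of a 14B dense model.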
Specifications
Wan 2.6 supports multiple generation modes — Text-to-Video, Image-to-Video, Reference-to-Video, and Multi-Shot — with flexible resolution and duration options.
| Specification | Value |
|---|---|
| Architecture | MoE (14B / 27B) |
| Max Resolution | 1080p |
| Frame Rate | 24 fps |
| Aspect Ratios | 16:9 · 9:16 · 1:1 · 4:3 · 3:4 |
| Duration Range | 5s / 10s / 15s |
| Generation Modes | T2V / I2V / R2V / Multi-Shot |
| Native Audio | Lip-Sync + SFX + Ambient |
| Training Data | 1.5B Videos + 10B Images |
| Reference Input | 5s Video + Voice |
Standard resolution for rapid prototyping, social content, and cost-efficient batch generation.
Full HD for production-grade output, cinematic quality, and professional delivery.
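The listed specifications can be treated as a set of constraints on a generation request. The sketch below is a minimal illustration of that idea: the `GenerationRequest` class and its field names are hypothetical and do not come from any official Wan 2.6 or Buble API. Only the allowed values (modes, aspect ratios, durations, and 24 fps) are taken from the specifications above.

```python
from dataclasses import dataclass

# Values copied from the specifications above.
FPS = 24
DURATIONS_S = (5, 10, 15)
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
MODES = ("T2V", "I2V", "R2V", "Multi-Shot")


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape; not an official schema."""
    mode: str
    aspect_ratio: str
    duration_s: int


def validate(req: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the specs."""
    problems = []
    if req.mode not in MODES:
        problems.append(f"unsupported mode: {req.mode}")
    if req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.duration_s not in DURATIONS_S:
        problems.append(f"unsupported duration: {req.duration_s}s")
    return problems


def frame_count(req: GenerationRequest) -> int:
    """Number of frames in the clip at the listed 24 fps."""
    return FPS * req.duration_s


request = GenerationRequest(mode="R2V", aspect_ratio="9:16", duration_s=15)
print(validate(request))     # prints []
print(frame_count(request))  # prints 360
```

At 24 fps the three supported durations correspond to 120, 240, and 360 frames.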
From character-driven narratives to social content, Wan 2.6's R2V and Multi-Shot capabilities unlock new creative workflows.
Use R2V to establish a character from a 5-second reference, then Multi-Shot to auto-plan a complete narrative arc. Consistent character identity across all scenes without manual editing.
Create a digital spokesperson from a reference video. Generate unlimited product presentations, announcements, and tutorials starring the same virtual character with matching voice and mannerisms.
Produce episodic vertical content (9:16) for TikTok, Reels, and Shorts. Multi-Shot auto-plans engaging scene sequences from simple prompts, with 5 aspect ratios for every platform.
Generate cinematic product videos with synchronized voiceover and sound effects. Native audio sync eliminates post-production audio editing for quick commercial turnarounds.
Build educational content up to 15 seconds per clip with clear narration. Character consistency via R2V makes recurring instructor characters possible across entire course series.
Upload audio references to guide the visual rhythm. Wan 2.6's audio sync engine matches visual beats to music, while I2V and R2V maintain artistic style throughout the video.
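Because each clip is capped at 15 seconds and durations come in 5-second steps, longer pieces such as a lesson or a multi-part story have to be assembled from several clips. The sketch below shows one simple way to plan that split. The greedy strategy and the `plan_clips` name are assumptions made for illustration, not a feature of Wan 2.6.

```python
# Supported clip durations in seconds, taken from the specifications.
CLIP_LENGTHS_S = (5, 10, 15)


def plan_clips(total_seconds: int) -> list[int]:
    """Cover total_seconds with supported clip lengths, longest clips first.

    The last clip is rounded up to the next supported length, so the plan
    can run a few seconds longer than requested.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    clips = []
    remaining = total_seconds
    while remaining > 0:
        # Use the shortest clip that covers what is left; otherwise the longest.
        fitting = [length for length in CLIP_LENGTHS_S if length >= remaining]
        clip = min(fitting) if fitting else max(CLIP_LENGTHS_S)
        clips.append(clip)
        remaining -= clip
    return clips


print(plan_clips(40))  # prints [15, 15, 10]
print(plan_clips(12))  # prints [15]
```

A 40-second lesson therefore needs at least three generations, and keeping the same R2V reference across them is what would hold the instructor character consistent.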
Comparison
See how Wan 2.6 compares to its predecessor and leading commercial models across key capabilities.
| Feature | Wan 2.6 | Wan 2.5 | Seedance 2.0 | Veo 3.1 |
|---|---|---|---|---|
| Max Resolution | 1080p | 1080p | 2K | 4K |
| Max Duration | 15s | 10s | 15s | 8s (extendable) |
| Aspect Ratios | 5 (16:9, 9:16, 1:1, 4:3, 3:4) | 3 (16:9, 9:16, 1:1) | 5 | 2 (16:9, 9:16) |
| Reference-to-Video | ✓ (5s video) | Up to 3 videos | | |
| Multi-Shot | ✓ (Auto-plan) | | | |
| Native Audio Sync | ✓ (Lip-sync) | | | |
| Multi-Character Dialogue | | | | |
| Image-to-Video | Up to 9 images | Up to 3 images | | |
| Architecture | MoE 14B/27B | Dense 14B | Proprietary | Proprietary |
| Open Source Base | ✓ (Wan 2.2) | ✓ (Wan 2.1) | | |
| Frame Rate | 24 fps | 24 fps | 24 fps | 24 fps |
Why Wan 2.6
Buble provides the most complete interface for Wan 2.6's advanced capabilities.
No API keys, no setup. Start generating with Wan 2.6 immediately through Buble's visual interface.
Upload reference videos directly in the browser. Preview character extraction and configure new scenes visually.
Write a single prompt and let Multi-Shot auto-plan your story. Review and adjust individual shots before generation.
Combine images, videos, and audio references in a single generation. The upload interface adapts to each input type automatically.
Switch between 16:9, 9:16, 1:1, 4:3, and 3:4 with one click. Preview the output frame before generating.
All generated videos are stored in your gallery. Download in MP4, MOV, or WebM. Re-generate or extend any previous result.
Create
Reference-to-Video, Multi-Shot storytelling, and native audio sync — all available now. Clone characters, auto-plan narratives, and generate up to 15 seconds of cinematic video.