2025-12-15
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
ABSTRACT
Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10×. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine.
AUTHORS
Seed Vision Team
Featured Publications
View AllSeed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
Seed Speech Team
2025-07-24
GR-3 Technical Report
Seed Robotics Team
2025-07-21
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Seed Vision Team
2025-06-11