StableVideo - Text-driven Consistent Perception Diffusion Video Editing

Introduction#

This article briefly introduces StableVideo.

StableVideo is a text-driven consistency-aware diffusion video editing tool.

31-stablevideo-overview

Main Content#

1. What is StableVideo#

It can modify objects in a video while maintaining their appearance consistency over time.

2. StableVideo Features#

Introduce temporal dependency into the existing text-driven diffusion model to solve this problem, allowing the model to generate temporally consistent appearance for edited objects.

3. Using StableVideo#

Download and install project dependencies

git clone https://github.com/rese1f/StableVideo.git
conda create -n stablevideo python=3.11
pip install -r requirements.txt
(optional) pip install xformers

The project also provides a CPU-only version.

Download the pre-trained model
Download the sample data
Start testing: python app.py

4. Conclusion#

Modifying videos directly using SD results in some differences in each frame and inconsistency in time. This project mitigates the side effects of temporal inconsistency by adding a temporal dependency module, making it worth a try.

Finally#

References:

Official Project

Disclaimer#

This article is for personal learning purposes only.
This article is synchronized with HBlog.