Introduction#
This article provides a brief introduction to VITS-fast-fine-tuning.
VITS-fast-fine-tuning is a fine-tuning toolkit for VITS that makes it possible to clone a desired character's voice quickly.
Content#
1. What is VITS-fast-fine-tuning?#
It is a tool for quickly cloning a character's voice from audio of that character, by fine-tuning a pre-trained VITS model.
2. Features of VITS-fast-fine-tuning#
- Voice conversion between any two characters in the model.
- Text-to-speech (TTS) for custom character voices in Chinese, Japanese, and English.
- Supports various fine-tuning methods:
  - Cloning character voices from more than 10 short audio clips
  - Cloning character voices from audio clips longer than 3 minutes (each audio clip may contain only a single speaker)
  - Cloning character voices from videos longer than 3 minutes (each video may contain only a single speaker)
  - Cloning character voices by inputting a Bilibili video link
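As a rough illustration of the audio-based input options above, the following sketch checks which of them a folder of WAV files would satisfy. It uses only the Python standard library. The thresholds are taken from the list above, while the function names and mode labels are hypothetical and are not part of VITS-fast-fine-tuning itself.

```python
import wave
from pathlib import Path

# Thresholds taken from the feature list; treat them as assumptions.
MIN_SHORT_CLIPS = 10        # "more than 10 short audio clips"
MIN_LONG_SECONDS = 3 * 60   # "longer than 3 minutes"


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_dataset(durations):
    """Return the (hypothetical) input-mode labels a list of clip durations satisfies."""
    modes = []
    if len(durations) > MIN_SHORT_CLIPS:
        modes.append("short_clips")
    if any(d > MIN_LONG_SECONDS for d in durations):
        modes.append("long_audio")
    return modes


def scan_folder(folder):
    """Measure every *.wav file in a folder and classify the result."""
    durations = [wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav"))]
    return classify_dataset(durations)
```

Calling `scan_folder("my_clips")` on a folder of recordings returns a list of the labels that apply. This check looks only at clip count and length; it cannot verify that each clip contains a single speaker.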
3. Usage and Training of VITS-fast-fine-tuning#
Fine-tuning custom character voices#
- Prepare the data
- Train online using Google Colab
- Alternatively, train locally by following the tutorial. Local training is more involved: you need to install the CUDA dependencies and download both the project code and the pre-trained models. Training with Colab is simpler.
Usage and Inference#
- Download the fine-tuned model and config files.
- Download the latest release package (on the right side of the GitHub page).
- Place the downloaded model and config files in the inference folder, with the filenames G_latest.pth and finetune_speaker.json, respectively.
- Once everything is ready, the file structure should look as follows:
inference
├───inference.exe
├───...
├───finetune_speaker.json
└───G_latest.pth
- Run inference.exe. A browser window will automatically pop up. Note that the path to the file should not contain any Chinese characters or spaces.
- Please note that the voice conversion feature requires the installation of ffmpeg to function properly.
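The checks described in the steps above can be sketched as a small pre-flight script to run before launching inference.exe. This is a minimal sketch, not part of the release package. The function names are hypothetical, and the path check is stricter than the text requires, since it rejects any non-ASCII character rather than only Chinese ones.

```python
import shutil
from pathlib import Path

# File names taken from the folder layout shown above.
REQUIRED_FILES = ("inference.exe", "finetune_speaker.json", "G_latest.pth")


def path_is_safe(path):
    """True if the path has no spaces and only ASCII characters (stricter than 'no Chinese')."""
    text = str(path)
    return text.isascii() and " " not in text


def preflight(folder, need_voice_conversion=False):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    folder = Path(folder).resolve()
    problems = []
    if not path_is_safe(folder):
        problems.append(f"path contains spaces or non-ASCII characters: {folder}")
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing file: {name}")
    # ffmpeg is only needed for the voice conversion feature.
    if need_voice_conversion and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight("inference", need_voice_conversion=True):
        print("WARNING:", problem)
```

Returning a list of problems instead of raising on the first failure lets you fix everything in one pass.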
4. Conclusion#
This project simplifies the process of fine-tuning custom character voices and provides a packaged program that makes the fine-tuned models easy to use.
Finally#
Disclaimer#
This article is for personal learning purposes only.
This article is synchronized with HBlog.