hughie

A tech-loving newbie, recording what I learn and think along the way

VITS-fast-fine-tuning - Quickly Clone Custom Role Voices

Introduction#

This article provides a brief introduction to VITS-fast-fine-tuning.

VITS-fast-fine-tuning is a fine-tuning toolkit for VITS that makes it possible to quickly clone a desired character's voice.


Content#

1. What is VITS-fast-fine-tuning?#

It is a project that fine-tunes a VITS model on a small amount of a character's audio, so that the character's voice can be cloned quickly.

2. Features of VITS-fast-fine-tuning#

  • Voice conversion between any two characters in the model.
  • Text-to-speech (TTS) for custom character voices in Chinese, Japanese, and English.
  • Several kinds of fine-tuning input are supported:
    • Cloning a character voice from more than 10 short audio clips
    • Cloning a character voice from audio longer than 3 minutes (each audio file may contain only a single speaker)
    • Cloning a character voice from video longer than 3 minutes (each video may contain only a single speaker)
    • Cloning a character voice from a Bilibili video link
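
As a rough illustration of the data thresholds listed above, the sketch below checks a set of clip durations against them. The helper names (`wav_seconds`, `usable_modes`) and the two constants are hypothetical and are not part of VITS-fast-fine-tuning. It also assumes uncompressed PCM WAV input, because it relies on Python's standard `wave` module.

```python
import wave

# Thresholds taken from the feature list above.
MIN_SHORT_CLIPS = 10       # "more than 10 short audio clips"
MIN_LONG_SECONDS = 3 * 60  # "longer than 3 minutes"


def wav_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def usable_modes(durations):
    """List which kinds of fine-tuning input a set of clips could serve as.

    `durations` is a list of clip lengths in seconds. This is only a
    sanity check on quantity and length; it cannot tell whether a clip
    really contains a single speaker.
    """
    modes = []
    if len(durations) > MIN_SHORT_CLIPS:
        modes.append("short clips")
    if any(d > MIN_LONG_SECONDS for d in durations):
        modes.append("long audio")
    return modes
```

One possible use is to compute `wav_seconds` for every file in a data folder and pass the resulting list to `usable_modes` before uploading anything to Colab.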

3. Usage and Training of VITS-fast-fine-tuning#

Fine-tuning custom character voices#

  • Prepare the data
  • Train online using Google Colab
  • Alternatively, train locally by following the tutorial. This route requires installing the CUDA dependencies and downloading the project code and pre-trained models, so it is more involved than training on Colab.

Usage and Inference#

  1. Download the fine-tuned model and config files.
  2. Download the latest release package (on the right side of the GitHub page).
  3. Place the downloaded model and config files in the inference folder, with the filenames G_latest.pth and finetune_speaker.json, respectively.
  4. Once everything is ready, the file structure should look as follows:
inference
├───inference.exe
├───...
├───finetune_speaker.json
└───G_latest.pth
  5. Run inference.exe. A browser window will open automatically. Note that the path to the file must not contain any Chinese characters or spaces.
  6. The voice conversion feature requires ffmpeg to be installed in order to work.
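
The checks in the steps above can be sketched as a small pre-flight script. This is a minimal sketch, not something shipped with the project: the function names are hypothetical, and it flags any non-ASCII character in the path, which is stricter than the "no Chinese characters" rule stated above. It also assumes ffmpeg is expected to be reachable through `PATH`.

```python
import shutil
from pathlib import Path

# File names the inference folder is expected to contain (from step 3).
REQUIRED_FILES = ("G_latest.pth", "finetune_speaker.json")


def missing_files(folder):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def path_problems(path_text):
    """Return problems with the path string itself (spaces, non-ASCII)."""
    problems = []
    if " " in path_text:
        problems.append("path contains spaces")
    if not path_text.isascii():
        # Assumption: treat every non-ASCII character as a risk,
        # not only Chinese characters.
        problems.append("path contains non-ASCII characters")
    return problems


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def preflight(folder):
    """Collect every problem found for the given inference folder."""
    problems = [f"missing file: {name}" for name in missing_files(folder)]
    problems += path_problems(str(Path(folder).resolve()))
    if not ffmpeg_available():
        problems.append("ffmpeg not found on PATH (voice conversion needs it)")
    return problems
```

Calling `preflight("inference")` from the directory that contains the unpacked folder returns a list of problems. An empty list means the layout matches the tree shown above and ffmpeg was found.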

4. Conclusion#

This project simplifies the process of fine-tuning custom character voices and provides a packaged program that makes it easy to run inference with the fine-tuned model.


Finally#

References:

Official Project


Disclaimer#

This article is for personal learning purposes only.

This article is synchronized with HBlog.
