
Building and Testing MiniGPT-4 Locally

Introduction#

This article briefly documents the process of setting up MiniGPT-4 locally and the related model parameter download and conversion, followed by some testing.


Main Content#

1. What is MiniGPT-4#

MiniGPT-4 is a vision-language model that combines images and text to generate natural-language descriptions of image content. Its training consists of two stages: the first is conventional pre-training on a large number of image-text pairs; the second fine-tunes the model on a small set of high-quality image-text pairs curated with the model itself, which significantly improves its generation reliability and overall usability. Architecturally, MiniGPT-4 builds on BLIP-2, aligning a frozen visual encoder with the Vicuna language model through a projection layer.

2. Local Environment Configuration#

1. Installing Conda#

MiniGPT-4 runs in a Python environment and can be used on Windows, Linux, and macOS. This article uses Conda to manage the Python environment, so Conda needs to be installed first. The installation steps for the different platforms are described below:

  • Installation Steps for Windows:

    1. Download the Anaconda or Miniconda installer that matches your operating system from the official website.

    2. Double-click the downloaded installer and follow the instructions to install. During the installation process, you can choose to add Anaconda or Miniconda to the system PATH, which will allow you to use the Conda command conveniently in the command prompt.

    3. After the installation is complete, open Anaconda Prompt or Windows PowerShell and enter "conda" to confirm if the installation was successful. If successful, the basic usage of Conda will be displayed.

    4. If you need to create and manage Conda environments, you can use the "conda create" command to create a new environment and the "conda activate" command to activate the environment. Detailed environment management methods can be found in the Conda official documentation.

  • For Linux, macOS, and other supported operating systems, you can install in the same way as above, but installation from the terminal is recommended:

    1. Download the Anaconda or Miniconda installer that matches your Linux version and architecture.

      Use the curl command to download:

      curl -O https://repo.anaconda.com/archive/Anaconda-latest-Linux-x86_64.sh
      

      Use the wget command to download:

      wget https://repo.anaconda.com/archive/Anaconda-latest-Linux-x86_64.sh
      
    2. After the download is complete, you can install Anaconda by running the following command:

      bash Anaconda-latest-Linux-x86_64.sh
      

      Follow the instructions of the installation program to complete the installation.

2. Configuring Mirror Sources (Optional for Mainland China)#

To change the mirror source in Conda, follow these steps:

  1. Open the terminal or command prompt. Enter the following commands to set the channels in the Conda configuration file to Tsinghua mirror source:

    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    conda config --set show_channel_urls yes
    

    Note: If you want to use other mirror sources, please replace the URL in the above commands with the URL of the selected mirror source.

  2. Enter the following command to update Conda:

    conda update conda
    
  3. Enter the following command to verify if the configuration was successful:

    conda info
    

    If you see the following information, it means that the mirror source has been successfully changed:

    channels:
      https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
      https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    

    Now, you can use the Conda command to install packages, and they will be downloaded from the selected mirror source.

3. Installing MiniGPT-4#

After Conda is installed, you can proceed to install MiniGPT-4:

git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4

For users in mainland China, note that the environment.yml file not only creates the Conda environment but also installs pip dependencies into it. Since pip inside the environment has no mirror source configured, installing the required libraries is likely to fail or be extremely slow.

To solve this problem, you need to comment out the pip dependencies in the environment.yml file, copy them to a new file named requirements.txt, and install them using the following command:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
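The extraction step above can be sketched as follows. The sample environment.yml written here is hypothetical and only mirrors the typical layout of a Conda environment file; the real file in the MiniGPT-4 repo is much longer, with the pip dependencies listed last under a "- pip:" key.

```shell
# Hypothetical sketch: extract the pip dependencies from environment.yml
# into requirements.txt. The sample file below only illustrates the layout;
# in the real repo you would run the sed pipeline on the existing file.
cat > environment.yml <<'EOF'
name: minigpt4
dependencies:
  - python=3.9
  - pip
  - pip:
    - torch==2.0.0
    - transformers==4.28.0
EOF

# Copy every line after "- pip:" and strip the leading "- " list markers.
sed -n '/- pip:/,$p' environment.yml | tail -n +2 \
  | sed 's/^[[:space:]]*-[[:space:]]*//' > requirements.txt

cat requirements.txt
```

After this, run the pip install command above to fetch the dependencies from the mirror.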

3. Model Parameter Download and Conversion#

MiniGPT-4 is trained on the V0 versions of Vicuna-13B and Vicuna-7B. The model parameters therefore come in two parts, the Vicuna weights and the pretrained MiniGPT-4 weights, which need to be downloaded and configured separately.

1. Vicuna weights#

Vicuna is an open-source LLM based on LLaMA whose performance approaches ChatGPT. MiniGPT-4 uses the V0 version of Vicuna-13B and has recently added a variant based on Vicuna-7B, which is suitable for GPU users with limited VRAM.

Note that all LLaMA-based models may only be distributed as delta weights. These delta weights must be merged with the original LLaMA weights using FastChat to obtain the final release weights. The conversion requires at least 60GB of RAM, so the steps are not described here; if you are interested in the process, see the articles listed in the References section.

Converted Vicuna-13B and Vicuna-7B weights are also available for direct download. After downloading a complete model, modify the model loading path in minigpt4/configs/models/minigpt4.yaml#L16, and the weights can be used directly.
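As a sketch, the relevant part of minigpt4/configs/models/minigpt4.yaml looks roughly like this (key names as in the MiniGPT-4 repo; the path is a placeholder for wherever you saved the converted weights):

```yaml
# minigpt4/configs/models/minigpt4.yaml (excerpt; path is a placeholder)
model:
  arch: mini_gpt4
  llama_model: "/path/to/vicuna/weights/"
```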

2. Pretrained minigpt4 weights#

These are the fine-tuned minigpt4 weights, which can be downloaded directly:

  • Checkpoint Aligned with Vicuna 13B: Download
  • Checkpoint Aligned with Vicuna 7B: Download

After downloading, modify eval_configs/minigpt4_eval.yaml#L10 to point to the path of the downloaded checkpoint file.
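A sketch of the corresponding line (key name as in the MiniGPT-4 repo; the path is a placeholder):

```yaml
# eval_configs/minigpt4_eval.yaml (excerpt; path is a placeholder)
model:
  arch: mini_gpt4
  ckpt: '/path/to/pretrained_minigpt4.pth'
```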

3. Local Execution#

Execute the following command in the MiniGPT-4 directory:

python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0

For Vicuna-13B 16-bit weights, at least 23GB of VRAM is required; Vicuna-7B needs at least 12GB. If you set low_resource to True in eval_configs/minigpt4_eval.yaml, the model is loaded with 8-bit weights, which further reduces VRAM usage. Adjust this according to your actual GPU.
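The low-VRAM toggle is a one-line change in the same file, roughly:

```yaml
# eval_configs/minigpt4_eval.yaml (excerpt)
model:
  low_resource: True   # load the language model in 8-bit to save VRAM
```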

4. Actual Testing#

The testing results using Vicuna-13B 16-bit are as follows:
(Figure: minigpt4-testline.png — sample test conversation)
This is the result of the test conducted on April 20, 2023. The MiniGPT-4 project is constantly being updated, so the performance should continue to improve.

Conclusion#

Compared to GPT-4, MiniGPT-4 has a smaller model size and lower computational resource requirements, making it suitable for resource-constrained scenarios. At the same time, MiniGPT-4 performs well on visual language understanding tasks, especially generation tasks. However, it still has limitations compared to GPT-4: its outputs may be less fluent, and it does not cover all of GPT-4's capabilities. Nevertheless, MiniGPT-4 is a promising model with great potential. It is also a small step toward the miniaturization of LLMs; much like the early days of the Internet, the future looks promising.


References#

How to Get Vicuna-13B Model Weights

Pitfalls in Running the Little Llama Model (FastChat-vicuna)

MiniGPT-4 Official Documentation


Disclaimer#

This article is for personal learning purposes only.

This article is synchronized with HBlog.
