Tool Sharing#
MLC-LLM allows everyone to develop, optimize, and deploy AI models on their own devices.
Features#
MLC-LLM supports inference across a wide range of devices, from server-class hardware to users' browsers, laptops, and mobile applications. It provides a reproducible, systematic, and customizable workflow that lets developers implement models and optimizations in a productivity-focused, Python-first approach.
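To illustrate the Python-first workflow, here is a minimal sketch using the `MLCEngine` API from the `mlc_llm` package. The API names follow recent MLC-LLM documentation, and the model ID is assumed to be one of the pre-quantized checkpoints the project hosts on Hugging Face:

```python
# Minimal sketch: chat with a pre-quantized model via MLC-LLM's Python API.
# Assumes the mlc_llm package is installed and that the model ID below
# exists among the project's pre-converted Hugging Face checkpoints.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # assumed model ID
engine = MLCEngine(model)

# OpenAI-style streaming chat completion.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is MLC-LLM?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```

On first use, the engine fetches the referenced weights, so an initial run may take a while and consume significant bandwidth.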
Users can build applications based on different model parameters sourced from Hugging Face's open-source models; quantization is performed automatically during the build process (a conversion sketch follows below).
Alternatively, users can run the pre-compiled applications provided by the official project entirely on their local devices.
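For the build-it-yourself path, the project exposes a weight-conversion step that quantizes the downloaded parameters. The sketch below drives it from Python; the `mlc_llm convert_weight` subcommand and the `q4f16_1` quantization scheme follow the project's documented CLI, while the paths are placeholders:

```python
# Sketch of the build step: convert and quantize Hugging Face weights
# with the mlc_llm CLI, driven from Python. The subcommand and flags
# follow the project's documented interface; paths are placeholders.
import subprocess

subprocess.run(
    [
        "mlc_llm", "convert_weight",
        "./dist/models/Llama-2-7b-chat-hf",  # local Hugging Face checkout (placeholder)
        "--quantization", "q4f16_1",         # 4-bit weights, fp16 compute
        "-o", "./dist/Llama-2-7b-chat-hf-q4f16_1-MLC",
    ],
    check=True,
)
```

In the full flow, a chat-config generation step (`mlc_llm gen_config`) and, for some targets, a model compilation step typically follow; see the official documentation for the exact sequence.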
Platforms#
The official project provides a prebuilt MLC Chat app for download.
The supported platforms include:
- iPhone, iPad;
- Android phones;
- Apple Silicon and x86 MacBooks;
- AMD, Intel, and NVIDIA GPUs via Vulkan on Windows and Linux;
- NVIDIA GPUs via CUDA on Windows and Linux;
- WebGPU on browsers (through the companion project WebLLM).
Downloading model parameters may consume a significant amount of data.
On mobile devices, running a model may exceed the available memory.
Disclaimer#
This article is intended solely for tool sharing.
This article is related to HBlog.