

A tech-loving newbie, recording what I learn and think.

MLC-LLM is committed to enabling everyone to run large language models on mobile devices.

Tool Sharing

MLC-LLM allows everyone to develop, optimize, and deploy AI models on their own devices.



It supports inference on a wide range of hardware, from server-class machines to users' browsers, laptops, and mobile apps. It provides a reproducible, systematic, and customizable workflow that lets developers implement models and optimizations in a productivity-focused, Python-first way.

Users can build their own applications from different model weights, taken from open-source models on Hugging Face; the weights are quantized automatically during the build process.
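To make "quantization" concrete, here is a minimal sketch of symmetric, group-wise 4-bit weight quantization in plain Python. It is a generic textbook scheme, not MLC-LLM's actual implementation (MLC-LLM ships its own quantization modes and GPU kernels); the function names and the default group size of 32 are hypothetical choices for illustration.

```python
def quantize_group(weights, bits=4):
    """Symmetrically quantize one group of floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    # One floating-point scale per group; avoid dividing by zero for all-zero groups.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


def quantize(weights, group_size=32, bits=4):
    """Hypothetical helper: split a flat weight list into groups and quantize each."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reassemble the full (approximate) weight list from all groups."""
    out = []
    for q, scale in groups:
        out.extend(dequantize_group(q, scale))
    return out


weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.41, -0.88]
groups = quantize(weights, group_size=4)  # tiny group size just for the demo
restored = dequantize(groups)
print(groups[0][0])  # [1, -4, 3, 7]
```

Each group stores small integers plus a single scale, so 4-bit weights take roughly a quarter of the space of 16-bit ones, at the cost of a rounding error of at most half a scale step per weight.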

Alternatively, users can use the prebuilt applications provided by the official project and run everything locally on their own device.


The MLC Chat app provided by the official project can be downloaded from:




The supported platforms include:

  • iPhone, iPad;
  • Android phones;
  • Apple Silicon and x86 MacBooks;
  • AMD, Intel, and NVIDIA GPUs via Vulkan on Windows and Linux;
  • NVIDIA GPUs via CUDA on Windows and Linux;
  • WebGPU on browsers (through the companion project WebLLM).

Downloading the model weights may consume a significant amount of data.

On mobile devices, running a model may exceed the available memory.
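Both warnings come down to simple arithmetic: the weights alone take roughly parameters × bits per weight ÷ 8 bytes. The sketch below estimates that figure, optionally adding the overhead of one 16-bit scale per weight group. The 7-billion-parameter model and the group size of 32 are hypothetical examples, and the estimate deliberately ignores the KV cache, activations, and runtime overhead, so real memory use will be higher.

```python
def weight_memory_gb(num_params, bits_per_weight, group_size=None, scale_bits=16):
    """Rough size of the model weights in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    bits = bits_per_weight
    if group_size:
        # Hypothetical group-wise scheme: one scale of `scale_bits` bits per group.
        bits += scale_bits / group_size
    return num_params * bits / 8 / 1e9


# Hypothetical 7-billion-parameter model.
params = 7e9
print(f"fp16:           {weight_memory_gb(params, 16):.2f} GB")  # 14.00 GB
print(f"4-bit:          {weight_memory_gb(params, 4):.2f} GB")   # 3.50 GB
print(f"4-bit + scales: {weight_memory_gb(params, 4, group_size=32):.2f} GB")  # 3.94 GB
```

Even after 4-bit quantization, a model of this size needs several gigabytes for its weights alone, which is a large download and a large share of a typical phone's RAM.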


This article is intended solely to share a useful tool.

This article is related to HBlog.
