Tool Sharing#
MLC-LLM allows everyone to develop, optimize, and deploy AI models on their own devices.
Features#
MLC-LLM supports inference across a wide range of devices, from server-class hardware to users' browsers, laptops, and mobile applications. It provides a reproducible, systematic, and customizable workflow that lets developers implement models and optimizations in a productivity-focused, Python-first approach.
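To illustrate the Python-first workflow, here is a minimal sketch using the `MLCEngine` API from the `mlc_llm` package. The API names follow recent MLC-LLM documentation, and the model ID is assumed to be one of the pre-quantized checkpoints the project hosts on Hugging Face:

```python
# Minimal sketch: chat with a pre-quantized model via MLC-LLM's Python API.
# Assumes the mlc_llm package is installed and that the model ID below
# exists among the project's pre-converted Hugging Face checkpoints.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # assumed model ID
engine = MLCEngine(model)

# OpenAI-style streaming chat completion.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is MLC-LLM?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```

On first use, the engine fetches the referenced weights, so an initial run may take a while and consume significant bandwidth.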
Users can build applications based on different model parameters sourced from Hugging Face's open-source models; quantization is performed automatically during the build process (a conversion sketch follows below).
Alternatively, users can run the pre-compiled applications provided by the official project entirely on their local devices.
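For the build-it-yourself path, the project exposes a weight-conversion step that quantizes the downloaded parameters. The sketch below drives it from Python; the `mlc_llm convert_weight` subcommand and the `q4f16_1` quantization scheme follow the project's documented CLI, while the paths are placeholders:

```python
# Sketch of the build step: convert and quantize Hugging Face weights
# with the mlc_llm CLI, driven from Python. The subcommand and flags
# follow the project's documented interface; paths are placeholders.
import subprocess

subprocess.run(
    [
        "mlc_llm", "convert_weight",
        "./dist/models/Llama-2-7b-chat-hf",  # local Hugging Face checkout (placeholder)
        "--quantization", "q4f16_1",         # 4-bit weights, fp16 compute
        "-o", "./dist/Llama-2-7b-chat-hf-q4f16_1-MLC",
    ],
    check=True,
)
```

In the full flow, a chat-config generation step (`mlc_llm gen_config`) and, for some targets, a model compilation step typically follow; see the official documentation for the exact sequence.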
Platforms#
The official project provides a prebuilt MLC Chat app for download.
The supported platforms include:
- iPhone, iPad;
- Android phones;
- Apple Silicon and x86 MacBooks;
- AMD, Intel, and NVIDIA GPUs via Vulkan on Windows and Linux;
- NVIDIA GPUs via CUDA on Windows and Linux;
- WebGPU on browsers (through the companion project WebLLM).
Downloading model parameters may consume a significant amount of data.
On mobile devices, running a model may exceed the available memory.
Disclaimer#
This article is intended solely for tool sharing.
This article is related to HBlog.