[Notes] Compiling Qwen-1.8B-Chat with MLC-LLM on Windows 10

Last updated on March 19, 2024

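MLC-LLM is a universal, high-performance deployment solution for large language models. Using ahead-of-time compilation, it can deploy any large language model natively through native APIs. The project's mission is to use ML compilation techniques so that everyone can develop, optimize, and deploy AI models locally on their own devices.
Qwen-1.8B is the 1.8-billion-parameter model in Alibaba Cloud's Tongyi Qianwen (Qwen) family of large models. Qwen-1.8B-Chat is an AI assistant built on top of Qwen-1.8B using alignment techniques.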

Setting up the environment

conda create -n mlc_llm python numpy pytorch transformers scipy timm git -c pytorch -c conda-forge
conda activate mlc_llm
python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
python -c "import tvm; print(tvm.vulkan().exist)"
cd D:\mlc-llm
git clone --depth=1 -b main --single-branch https://github.com/mlc-ai/mlc-llm.git
cd .\mlc-llm\
git submodule sync
git submodule update --init --recursive --depth=1
pip install .
python -m mlc_llm.build --help
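
The two `python -c` checks above can also be kept as a small script. The sketch below is a hypothetical `check_env.py`, not part of MLC-LLM: it prints TVM's build flags when the `mlc-ai-nightly` wheel is importable and reports whether a Vulkan device is visible. The helper `vulkan_flag_enabled` is an assumption about how to read the `USE_VULKAN` entry returned by `tvm.support.libinfo()`; it treats an empty value, `OFF`, `0`, `FALSE`, or `NOTFOUND` as disabled and anything else as enabled.

```python
"""check_env.py: hypothetical helper bundling the two TVM sanity checks."""
from typing import Mapping, Optional

# Values ASSUMED to mean "feature compiled out" in TVM's build-flag dictionary.
_DISABLED = {"", "OFF", "0", "FALSE", "NOTFOUND"}


def vulkan_flag_enabled(info: Mapping[str, str]) -> bool:
    """Return True if the USE_VULKAN build flag looks switched on."""
    value = str(info.get("USE_VULKAN", "")).strip().upper()
    return value not in _DISABLED


def check_tvm() -> Optional[bool]:
    """Print TVM build info and return whether a Vulkan device exists.

    Returns None when TVM (shipped in the mlc-ai-nightly wheel) cannot be imported.
    """
    try:
        import tvm  # provided by the mlc-ai-nightly wheel
    except ImportError:
        print("tvm is not importable; install mlc-ai-nightly first")
        return None
    info = tvm.support.libinfo()
    # Same output as the first python -c check.
    print("\n".join(f"{k}: {v}" for k, v in info.items()))
    print("USE_VULKAN enabled:", vulkan_flag_enabled(info))
    # Same check as the second python -c command.
    return bool(tvm.vulkan().exist)


if __name__ == "__main__":
    print("Vulkan device found:", check_tvm())
```

Run it with `python check_env.py` inside the `mlc_llm` conda environment; a result of `True` on the last line means the Vulkan target used in the compile step should have a device to run on.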

Preparing the model

python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='Qwen/Qwen-1_8B-Chat', local_dir=r'D:\mlc-llm\qwen', ignore_patterns=['*.h5', '*.ot', '*.msgpack', '*.safetensors'])"
cd D:\mlc-llm\qwen
D:\aria2\aria2c.exe --all-proxy='http://127.0.0.1:7890' -o 'model-00001-of-00002.safetensors' "https://huggingface.co/Qwen/Qwen-1_8B-Chat/resolve/main/model-00001-of-00002.safetensors?download=true"
D:\aria2\aria2c.exe --all-proxy='http://127.0.0.1:7890' -o 'model-00002-of-00002.safetensors' "https://huggingface.co/Qwen/Qwen-1_8B-Chat/resolve/main/model-00002-of-00002.safetensors?download=true"
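
Instead of typing each shard name by hand, the shard list can be read from the checkpoint index. The sketch below assumes that `snapshot_download` (which skipped only `*.safetensors`) left a `model.safetensors.index.json` in `D:\mlc-llm\qwen`, following the usual Hugging Face sharded-checkpoint layout in which `weight_map` maps tensor names to shard files. It prints one `aria2c` command per shard that is still missing, reusing the aria2 path, proxy, and URL pattern from the commands above. The function and constant names are hypothetical.

```python
"""Hypothetical helper: print aria2c commands for shards that are still missing."""
import json
from pathlib import Path
from typing import Dict, List

REPO_ID = "Qwen/Qwen-1_8B-Chat"
ARIA2C = r"D:\aria2\aria2c.exe"
PROXY = "http://127.0.0.1:7890"
INDEX_FILE = "model.safetensors.index.json"  # ASSUMED to exist after snapshot_download


def shard_names(index: Dict) -> List[str]:
    """Collect the distinct shard files referenced by an index's weight_map."""
    # weight_map maps each tensor name to the shard file that stores it.
    return sorted(set(index["weight_map"].values()))


def aria2c_command(name: str, repo_id: str = REPO_ID) -> str:
    """Build one aria2c command in the same form as the commands above."""
    url = f"https://huggingface.co/{repo_id}/resolve/main/{name}?download=true"
    return f"{ARIA2C} --all-proxy='{PROXY}' -o '{name}' \"{url}\""


def missing_shard_commands(model_dir: Path) -> List[str]:
    """Return aria2c commands for every indexed shard not yet in model_dir."""
    index = json.loads((model_dir / INDEX_FILE).read_text(encoding="utf-8"))
    return [aria2c_command(n) for n in shard_names(index) if not (model_dir / n).exists()]


if __name__ == "__main__":
    model_dir = Path(r"D:\mlc-llm\qwen")
    if (model_dir / INDEX_FILE).exists():
        print("\n".join(missing_shard_commands(model_dir)))
    else:
        print(f"{INDEX_FILE} not found in {model_dir}")
```

Because only missing files are listed, the script can be rerun after an interrupted download to get the remaining commands.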

Compiling the model

cd D:\mlc-llm\dist
python -m mlc_llm.build --model "D:\mlc-llm\qwen" --target vulkan --quantization q0f16 --use-safetensors
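
As a hedged sketch, the compile step can also be driven from Python so that its arguments live in one place. `build_args`, `fp16_weight_gib`, `MODEL_DIR`, and `WORK_DIR` are illustrative names; the flags are exactly those of the command above. In MLC-LLM's quantization naming, `q0f16` means the weights are not quantized and are kept as float16, so a rough weight-size estimate is 2 bytes per parameter: 1.8 × 10⁹ × 2 B = 3.6 × 10⁹ B, about 3.35 GiB for the nominal 1.8B parameters. This estimate ignores the KV cache and other runtime buffers.

```python
"""Hypothetical wrapper around the compile command, plus a q0f16 size estimate."""
import subprocess
import sys
from typing import List

MODEL_DIR = r"D:\mlc-llm\qwen"
WORK_DIR = r"D:\mlc-llm\dist"


def build_args(model_dir: str = MODEL_DIR, target: str = "vulkan",
               quantization: str = "q0f16") -> List[str]:
    """Assemble the same mlc_llm.build invocation as the command above."""
    return [sys.executable, "-m", "mlc_llm.build",
            "--model", model_dir,
            "--target", target,
            "--quantization", quantization,
            "--use-safetensors"]


def fp16_weight_gib(num_params: float) -> float:
    """Approximate weight size in GiB when every parameter is stored as float16."""
    return num_params * 2 / 2**30  # 2 bytes per float16 value


if __name__ == "__main__":
    args = build_args()
    print("would run:", " ".join(args), "in", WORK_DIR)
    print(f"~{fp16_weight_gib(1.8e9):.2f} GiB of fp16 weights for 1.8B parameters")
    # Uncomment to actually compile (requires the environment set up earlier):
    # subprocess.run(args, cwd=WORK_DIR, check=True)
```

Using `sys.executable` instead of a bare `python` keeps the compile step in the same conda environment that runs the script.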

https://hexo.limour.top/Compile-Qwen-1.8B-Chat-using-MLC-LLM-on-Win
Author: Limour
Posted on: December 9, 2023
Updated on: March 19, 2024
Licensed under