diff --git a/content/post/20230506_vicuna/index.md b/content/post/20230506_vicuna/index.md
new file mode 100644
index 0000000..7e425f7
--- /dev/null
+++ b/content/post/20230506_vicuna/index.md
@@ -0,0 +1,141 @@
++++
+title = "Running Vicuna-13B in the Cloud"
+date = 2023-05-06T00:00:00
+lastmod = 2023-05-06T00:00:00
+draft = false
+
+# Authors. Comma separated list, e.g. `["Bob Smith", "David Jones"]`.
+authors = ["Carl Pearson"]
+
+tags = []
+
+summary = ""
+
+# Projects (optional).
+# Associate this post with one or more of your projects.
+# Simply enter your project's folder or file name without extension.
+# E.g. `projects = ["deep-learning"]` references
+# `content/project/deep-learning/index.md`.
+# Otherwise, set `projects = []`.
+projects = []
+
+# Featured image
+# To use, add an image named `featured.jpg/png` to your project's folder.
+[image]
+  # Caption (optional)
+  caption = ""
+
+  # Focal point (optional)
+  # Options: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight
+  focal_point = "Center"
+
+  # Show image only in page previews?
+  preview_only = false
+
+categories = []
+
+# Set captions for image gallery.
+
++++
+
+Vicuna-13B is an LLM built on top of Meta's LLaMa model. This post walks through downloading the LLaMa weights, converting them to HuggingFace format, applying the Vicuna deltas, and chatting with the result on a CPU-only cloud VM.
+
+## Set up your Cloud Instance
+
+Create a cloud VM with:
+* 150 GB of disk space
+* 64 GB of CPU memory
+
+When everything was done, my VM had 132 GB of disk space in use, so you'll probably want at least 150 GB.
+Since we'll run the 13B-parameter model on the CPU, the entire model has to fit in system memory, hence the large RAM requirement.
+
+Ordinarily I wouldn't recommend setting up Python like this (system-wide, with no virtual environment), but since we're just experimenting:
+
+```
+apt-get update
+apt-get install python3-pip
+```
+
+## Acquire the LLaMa-13B model
+
+For licensing reasons, Vicuna-13B is distributed as a delta on top of the LLaMa model, so the first step is to acquire the LLaMa weights.
+The official way is to request them from Meta by filling out this [Google Form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form).
+
+You can also use leaked weights from a torrent with the following magnet link:
+
+`magnet:?xt=urn:btih:b8287ebfa04fcf3e8014352&dn=LLaMA`
+
+> NOTE
+> replace `` above with this: `879b048d4d4404108`
+
+Or, someone has made the leaked weights available on IPFS, which you can access through a helpful mirror:
+
+https://ipfs.io/ipfs/Qmb9y5GCkTG7ZzbBWMu2BXwMkzyCKcUjtEKPpgdZ7GEFKm/
+
+I couldn't figure out how to get a torrent client working on Google's VMs (perhaps a firewall issue), so I ended up using [aria2c](https://aria2.github.io/) to download the LLaMa weights from the IPFS mirror above.
+
+```
+apt-get install aria2
+
+mkdir -p $HOME/llama/13B
+cd $HOME/llama/13B
+aria2c https://ipfs.io/ipfs/QmPCfCEERStStjg4kfj3cmCUu1TP7pVQbxdFMwnhpuJtxk/consolidated.00.pth
+aria2c https://ipfs.io/ipfs/QmPCfCEERStStjg4kfj3cmCUu1TP7pVQbxdFMwnhpuJtxk/consolidated.01.pth
+aria2c https://ipfs.io/ipfs/QmPCfCEERStStjg4kfj3cmCUu1TP7pVQbxdFMwnhpuJtxk/checklist.chk
+aria2c https://ipfs.io/ipfs/QmPCfCEERStStjg4kfj3cmCUu1TP7pVQbxdFMwnhpuJtxk/params.json
+aria2c https://ipfs.io/ipfs/Qmb9y5GCkTG7ZzbBWMu2BXwMkzyCKcUjtEKPpgdZ7GEFKm/tokenizer.model
+```
+
+The `consolidated` files are the weights.
+`checklist.chk` has the MD5 checksums for the files, which you should verify after the downloads finish (see the check below).
+`params.json` holds the model's hyperparameters.
+Finally, `tokenizer.model` is needed to convert the weights to HuggingFace format.
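+
+You can verify the downloads against `checklist.chk`. A minimal sketch, assuming the file is a standard `md5sum`-style manifest with paths relative to the `13B` directory (it was for my download):
+
+```
+# run from the directory holding the downloads; any corrupted file is reported as FAILED
+cd $HOME/llama/13B
+md5sum -c checklist.chk
+```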
+
+## Convert weights to HuggingFace Format
+
+HuggingFace `transformers` ships a conversion script; I used [rev d2ffc3fc4 of the script](https://github.com/huggingface/transformers/blob/d2ffc3fc48430f629c38c36fa8f308b045d1f715/src/transformers/models/llama/convert_llama_weights_to_hf.py).
+Note that `wget` needs the raw file rather than the GitHub page that displays it:
+
+```
+apt-get install wget
+
+wget https://raw.githubusercontent.com/huggingface/transformers/d2ffc3fc48430f629c38c36fa8f308b045d1f715/src/transformers/models/llama/convert_llama_weights_to_hf.py
+
+pip install torch transformers accelerate sentencepiece protobuf==3.20
+
+python3 convert_llama_weights_to_hf.py --input_dir $HOME/llama --output_dir $HOME/llama-hf --model_size 13B
+```
+
+In the revision I used, the script expected `tokenizer.model` directly inside `--input_dir`, so if the conversion complains that it can't find the tokenizer, copy it up a level: `cp $HOME/llama/13B/tokenizer.model $HOME/llama/`.
+
+These are the package versions that worked for me (note `protobuf==3.20` in the pip install command):
+
+| package         | version |
+|-----------------|---------|
+| `torch`         | 2.0.0   |
+| `transformers`  | 4.28.1  |
+| `accelerate`    | 0.18.0  |
+| `sentencepiece` | 0.1.99  |
+| `protobuf`      | 3.20.0  |
+
+I got an error about regenerating protobuf code when I used a protobuf release newer than 3.20.
+
+## Apply the Vicuna deltas
+
+FastChat has done the work of setting up a little chat interface.
+We'll use their package to download and apply the deltas as well; the `--delta-path` below names the deltas on the HuggingFace Hub, which FastChat fetches automatically.
+
+```
+pip install fschat
+python3 -m fastchat.model.apply_delta \
+    --base-model-path $HOME/llama-hf \
+    --target-model-path $HOME/vicuna-13b \
+    --delta-path lmsys/vicuna-13b-delta-v1.1
+```
+
+I had `fschat 0.2.5`.
+
+## Start Chatting
+
+```
+python3 -m fastchat.serve.cli --device cpu --model-path $HOME/vicuna-13b
+```
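+
+This drops you into an interactive chat prompt. If your VM has a CUDA GPU instead, you can omit `--device cpu`. A hypothetical invocation is below, assuming a GPU with enough memory; `--load-8bit` existed in `fschat` 0.2.x to reduce memory use, but flags vary between versions, so check `python3 -m fastchat.serve.cli --help` against yours:
+
+```
+# hypothetical GPU run -- verify these flags against your fschat version
+# --load-8bit loads quantized 8-bit weights, roughly halving memory vs. fp16
+python3 -m fastchat.serve.cli --model-path $HOME/vicuna-13b --load-8bit
+```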