Update docs: add deployment guide, Hugging Face token instructions, getting‑started tutorial, and include a Bazel lock example.
Parent: f9280b1069
Commit: 7fb02e1888
@@ -20,12 +20,12 @@ following arguments to the command line when compiling / running a model:
 - AWS Trainium/Inferentia 2: `--@zml//runtimes:neuron=true`
 - **AVOID CPU:** `--@zml//runtimes:cpu=false`
 
-So, to run the OpenLLama model from above **on your development machine**
+So, to run the Llama model from above **on your development machine**
 housing an NVIDIA GPU, run the following:
 
 ```
 cd examples
-bazel run --config=release //llama:OpenLLaMA-3B --@zml//runtimes:cuda=true
+bazel run --config=release //llama --@zml//runtimes:cuda=true -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
 ```
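As a side note, the runtime flags above can also be chosen programmatically. The following is an illustrative sketch, not part of the committed docs: it assumes `--@zml//runtimes:cpu=true` is the valid spelling of the CPU runtime's default (the docs only show `cpu=false`), and uses `nvidia-smi` merely as a GPU presence heuristic.

```shell
# Hypothetical helper: pick a ZML runtime flag based on whether an
# NVIDIA GPU is visible on this machine (nvidia-smi on the PATH).
if command -v nvidia-smi >/dev/null 2>&1; then
  RUNTIME_FLAG='--@zml//runtimes:cuda=true'
else
  # No GPU detected: stay on the CPU runtime.
  RUNTIME_FLAG='--@zml//runtimes:cpu=true'
fi
echo "$RUNTIME_FLAG"
# The flag would then be appended to the bazel invocation shown above.
```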
@@ -38,11 +38,11 @@ architectures:
 - Linux ARM64: `--platforms=@zml//platforms:linux_arm64`
 - MacOS ARM64: `--platforms=@zml//platforms:macos_arm64`
 
-As an example, here is how you build above OpenLLama for CUDA on Linux X86_64:
+As an example, here is how you build the above Llama for CUDA on Linux X86_64:
 
 ```
 cd examples
-bazel build --config=release //llama:OpenLLaMA-3B \
+bazel build --config=release //llama \
   --@zml//runtimes:cuda=true \
   --@zml//runtimes:cpu=false \
   --platforms=@zml//platforms:linux_amd64
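The host-side platform label can be derived from `uname` rather than typed by hand. This is an illustrative sketch using only the three platform labels listed in the hunk above; the fallback-to-empty behavior for unknown hosts is an assumption, not documented ZML behavior.

```shell
# Hypothetical sketch: map the current host OS/arch to a ZML --platforms flag.
case "$(uname -s)-$(uname -m)" in
  Linux-x86_64)  PLATFORM='--platforms=@zml//platforms:linux_amd64' ;;
  Linux-aarch64) PLATFORM='--platforms=@zml//platforms:linux_arm64' ;;
  Darwin-arm64)  PLATFORM='--platforms=@zml//platforms:macos_arm64' ;;
  *)             PLATFORM='' ;;  # unknown host: fall back to bazel's default
esac
echo "platform flag: $PLATFORM"
```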
@@ -1,24 +1,21 @@
-# Running Gated Huggingface Models with Token Authentication
+# Huggingface Token Authentication
 
-Some models have restrictions and may require some sort of approval or agreement
-process, which, by consequence, **requires token-authentication with Huggingface**.
+Some models have restrictions and may require some sort of approval or
+agreement process, which, as a consequence, **requires token authentication with
+Huggingface**.
+The easiest way might be to use the `huggingface-cli login` command.
 
-Here is how you can generate a **"read-only public repositories"** access token
-to log into your account on Huggingface, directly from `bazel`, in order to
-download models.
+Alternatively, here is how you can generate a **"read-only public repositories"**
+access token to log into your account on Huggingface, directly from `bazel`, in order to download models.
 
 * log in at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
 * click on "Create new token"
-* give the token a name, eg `zml_public_repos`
-* under _Repositories_, grant the following permission: "Read access to
-  contents of all public gated repos you can access".
-* at the bottom, click on "Create token".
+* give the token a name, e.g. `zml_public_repos`,
+* under _Repositories_, grant the following permission: "Read access to contents of all public gated repos you can access".
+* at the bottom, click on "Create token".
 * copy the token by clicking `Copy`. **You won't be able to see it again.**
 * the token looks something like `hf_abCdEfGhijKlM`.
-* store the token on your machine (replace the placeholder with your actual
-  token):
+* store the token on your machine (replace the placeholder with your actual token):
 
 You can use the `HUGGINGFACE_TOKEN` environment variable to store the token or use
 its standard location:
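The environment-variable option mentioned above can be sketched as follows, using the dummy token value `hf_abCdEfGhijKlM` from the docs (substitute your real token; an `export` only lasts for the current shell session):

```shell
# Store the Hugging Face token in the environment variable the docs name.
# 'hf_abCdEfGhijKlM' is the dummy example value, not a real token.
export HUGGINGFACE_TOKEN='hf_abCdEfGhijKlM'
echo "$HUGGINGFACE_TOKEN"
```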
@@ -26,7 +23,7 @@ its standard location:
 mkdir -p $HOME/.cache/huggingface/; echo <hf_my_token> > "$HOME/.cache/huggingface/token"
 ```
 
-Now you're ready to download a gated model like `Meta-Llama-3-8b`!
+Now you're ready to download a gated model like `Meta-Llama-3.2-1b`!
 
 **Example:**
@@ -34,8 +31,7 @@ Now you're ready to download a gated model like `Meta-Llama-3-8b`!
 # requires token in $HOME/.cache/huggingface/token, as created by the
 # `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
 cd examples
-bazel run --config=release //llama:Meta-Llama-3-8b
-bazel run --config=release //llama:Meta-Llama-3-8b -- --promt="Once upon a time,"
+bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
 ```
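The token file that the comments in the hunk above refer to can be created as sketched below, again with the dummy token value from the docs (replace it with your real token before downloading gated models):

```shell
# Create the standard Hugging Face token file referenced by the docs.
mkdir -p "$HOME/.cache/huggingface"
printf '%s\n' 'hf_abCdEfGhijKlM' > "$HOME/.cache/huggingface/token"
# Read it back to confirm it was written.
cat "$HOME/.cache/huggingface/token"
```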
@@ -75,8 +75,9 @@ Once you've been granted access, you're ready to download a gated model like
 # requires token in $HOME/.cache/huggingface/token, as created by the
 # `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
 cd examples
-bazel run --config=release //llama:Llama-3.1-8B-Instruct
-bazel run --config=release //llama:Llama-3.1-8B-Instruct -- --prompt="What is the capital of France?"
+bazel run @zml//tools:hf -- download meta-llama/Llama-3.1-8B-Instruct --local-dir $HOME/Llama-3.1-8B-Instruct --exclude='*.pth'
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct --prompt="What is the capital of France?"
 ```
 
 You can also try `Llama-3.1-70B-Instruct` if you have enough memory.
@@ -88,8 +89,9 @@ Like the 8B model above, this model also requires approval. See
 ```
 cd examples
-bazel run --config=release //llama:Llama-3.2-1B-Instruct
-bazel run --config=release //llama:Llama-3.2-1B-Instruct -- --prompt="What is the capital of France?"
+bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
 ```
 
 For a larger 3.2 model, you can also try `Llama-3.2-3B-Instruct`.
File diff suppressed because one or more lines are too long