Update docs: add deployment guide, Hugging Face token instructions, getting‑started tutorial, and include a Bazel lock example.
parent f9280b1069, commit 7fb02e1888
@@ -20,12 +20,12 @@ following arguments to the command line when compiling / running a model:

- AWS Trainium/Inferentia 2: `--@zml//runtimes:neuron=true`
- **AVOID CPU:** `--@zml//runtimes:cpu=false`

So, to run the Llama model from above **on your development machine**
housing an NVIDIA GPU, run the following:

```
cd examples
bazel run --config=release //llama --@zml//runtimes:cuda=true -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
```

@@ -38,14 +38,14 @@ architectures:

- Linux ARM64: `--platforms=@zml//platforms:linux_arm64`
- MacOS ARM64: `--platforms=@zml//platforms:macos_arm64`

As an example, here is how you build the above Llama for CUDA on Linux X86_64:

```
cd examples
bazel build --config=release //llama \
    --@zml//runtimes:cuda=true \
    --@zml//runtimes:cpu=false \
    --platforms=@zml//platforms:linux_amd64
```

### Creating the TAR

@@ -1,24 +1,21 @@

# Running Gated Huggingface Models with Token Authentication

Some models have restrictions and may require some sort of approval or agreement
process, which, by consequence, **requires token-authentication with Huggingface**.

The easiest way might be to use the `huggingface-cli login` command.

Alternatively, here is how you can generate a **"read-only public repositories"**
access token to log into your account on Huggingface, directly from `bazel`, in
order to download models.

* log in at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
* click on "Create new token"
* give the token a name, e.g. `zml_public_repos`,
* under _Repositories_, grant the following permission: "Read access to
  contents of all public gated repos you can access".
* at the bottom, click on "Create token".
* copy the token by clicking `Copy`. **You won't be able to see it again.**
* the token looks something like `hf_abCdEfGhijKlM`.
* store the token on your machine (replace the placeholder with your actual
  token):

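Since a mistyped token only fails later, at download time, a quick shape check of the copied string can catch paste errors early. This is just a sketch; `hf_abCdEfGhijKlM` is the placeholder example from the steps above, not a real token:

```shell
# Placeholder token from the steps above -- substitute the string you copied.
TOKEN=hf_abCdEfGhijKlM

# Huggingface access tokens start with the "hf_" prefix.
case "$TOKEN" in
  hf_*) echo "token format looks right" ;;
  *)    echo "unexpected token format" ;;
esac
```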
You can use the `HUGGINGFACE_TOKEN` environment variable to store the token or use
its standard location:

@@ -26,7 +23,7 @@ its standard location:

```
mkdir -p $HOME/.cache/huggingface/; echo <hf_my_token> > "$HOME/.cache/huggingface/token"
```
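The `HUGGINGFACE_TOKEN` environment variable mentioned above can be set like this. A minimal sketch; `hf_abCdEfGhijKlM` is the placeholder token from the steps above:

```shell
# Placeholder token -- substitute your real token.
# "export" (rather than a plain assignment) makes the variable visible to
# child processes, such as the bazel-invoked downloader.
export HUGGINGFACE_TOKEN=hf_abCdEfGhijKlM

# verify a child process can see it
sh -c 'test -n "$HUGGINGFACE_TOKEN"' && echo "HUGGINGFACE_TOKEN is set"
```

Note that, unlike the token file, an exported variable only lasts for the current shell session.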

Now you're ready to download a gated model like `Llama-3.2-1B`!

**Example:**

@@ -34,8 +31,7 @@ Now you're ready to download a gated model like `Meta-Llama-3-8b`!

```
# requires token in $HOME/.cache/huggingface/token, as created by the
# `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
cd examples
bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
```

@@ -75,8 +75,9 @@ Once you've been granted access, you're ready to download a gated model like

```
# requires token in $HOME/.cache/huggingface/token, as created by the
# `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
cd examples
bazel run @zml//tools:hf -- download meta-llama/Llama-3.1-8B-Instruct --local-dir $HOME/Llama-3.1-8B-Instruct --exclude='*.pth'
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct --prompt="What is the capital of France?"
```

You can also try `Llama-3.1-70B-Instruct` if you have enough memory.

@@ -88,8 +89,9 @@ Like the 8B model above, this model also requires approval. See

```
cd examples
bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
```

For a larger 3.2 model, you can also try `Llama-3.2-3B-Instruct`.

File diff suppressed because one or more lines are too long