Update docs: add deployment guide, Hugging Face token instructions, getting‑started tutorial, and include a Bazel lock example.

Foke Singh 2025-06-05 13:18:14 +00:00
parent f9280b1069
commit 7fb02e1888
4 changed files with 379 additions and 30 deletions

View File

@@ -20,12 +20,12 @@ following arguments to the command line when compiling / running a model:
 - AWS Trainium/Inferentia 2: `--@zml//runtimes:neuron=true`
 - **AVOID CPU:** `--@zml//runtimes:cpu=false`
 
-So, to run the OpenLLama model from above **on your development machine**
+So, to run the Llama model from above **on your development machine**
 housing an NVIDIA GPU, run the following:
 
 ```
 cd examples
-bazel run --config=release //llama:OpenLLaMA-3B --@zml//runtimes:cuda=true
+bazel run --config=release //llama --@zml//runtimes:cuda=true -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
 ```
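
The updated invocation composes with the other runtime flags listed in this hunk. A minimal sketch for an AWS Trainium/Inferentia 2 host, assuming the same `//llama` target and a model already downloaded to `$HOME/Llama-3.2-1B-Instruct`:

```
cd examples
# Sketch: select the Neuron runtime and disable the CPU fallback,
# using the flags documented above; target and model path are assumed unchanged.
bazel run --config=release //llama \
  --@zml//runtimes:neuron=true \
  --@zml//runtimes:cpu=false \
  -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
```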
@@ -38,14 +38,14 @@ architectures:
 - Linux ARM64: `--platforms=@zml//platforms:linux_arm64`
 - MacOS ARM64: `--platforms=@zml//platforms:macos_arm64`
 
-As an example, here is how you build above OpenLLama for CUDA on Linux X86_64:
+As an example, here is how you build the above Llama for CUDA on Linux X86_64:
 
 ```
 cd examples
-bazel build --config=release //llama:OpenLLaMA-3B \
+bazel build --config=release //llama \
   --@zml//runtimes:cuda=true \
   --@zml//runtimes:cpu=false \
   --platforms=@zml//platforms:linux_amd64
 ```
 
 ### Creating the TAR
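
A note on the build command in the hunk above: the platform flags listed there swap in the same way. A minimal sketch of cross-compiling for the other listed platform, assuming `//llama` builds for macOS ARM64:

```
cd examples
# Sketch: same build as above with the macOS ARM64 platform flag from the list;
# runtime flags omitted since CUDA does not apply on this platform.
bazel build --config=release //llama \
  --platforms=@zml//platforms:macos_arm64
```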

View File

@@ -1,24 +1,21 @@
-# Huggingface Token Authentication
+# Running Gated Huggingface Models with Token Authentication
 
-Some models have restrictions and may require some sort of approval or
-agreement process, which, by consequence, **requires token-authentication with
-Huggingface**.
+Some models have restrictions and may require some sort of approval or agreement
+process, which, by consequence, **requires token-authentication with Huggingface**.
 
-Here is how you can generate a **"read-only public repositories"** access token
-to log into your account on Huggingface, directly from `bazel`, in order to
-download models.
+The easiest way might be to use the `huggingface-cli login` command.
+
+Alternatively, here is how you can generate a **"read-only public repositories"**
+access token to log into your account on Huggingface, directly from `bazel`, in order to download models.
 
 * log in at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
 * click on "Create new token"
-* give the token a name, eg `zml_public_repos`
-* under _Repositories_, grant the following permission: "Read access to
-  contents of all public gated repos you can access".
-* at the bottom, click on "Create token".
+* give the token a name, eg `zml_public_repos`,
+* under _Repositories_, grant the following permission: "Read access to contents of all public gated repos you can access".
+* at the bottom click on "Create token".
 * copy the token by clicking `Copy`. **You won't be able to see it again.**
 * the token looks something like `hf_abCdEfGhijKlM`.
-* store the token on your machine (replace the placeholder with your actual
-  token):
+* store the token on your machine (replace the placeholder with your actual token):
 
 You can use the `HUGGINGFACE_TOKEN` environment variable to store the token or use
 its standard location:
@@ -26,7 +23,7 @@ its standard location:
 mkdir -p $HOME/.cache/huggingface/; echo <hf_my_token> > "$HOME/.cache/huggingface/token"
 ```
 
-Now you're ready to download a gated model like `Meta-Llama-3-8b`!
+Now you're ready to download a gated model like `Meta-Llama-3.2-1b`!
 
 **Example:**
@@ -34,8 +31,7 @@ Now you're ready to download a gated model like `Meta-Llama-3-8b`!
 # requires token in $HOME/.cache/huggingface/token, as created by the
 # `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
 cd examples
-bazel run --config=release //llama:Meta-Llama-3-8b
-bazel run --config=release //llama:Meta-Llama-3-8b -- --promt="Once upon a time,"
+bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
 ```
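
The comments in the hunk above reference the `HUGGINGFACE_TOKEN` environment variable. A minimal sketch of setting it for the current shell, reusing the placeholder token from earlier, plus an optional check that the login works:

```
# Sketch: hf_abCdEfGhijKlM is the placeholder token from above, not a real token.
export HUGGINGFACE_TOKEN=hf_abCdEfGhijKlM
# Optional: verify the token resolves to your Huggingface account.
huggingface-cli whoami
```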

View File

@@ -75,8 +75,9 @@ Once you've been granted access, you're ready to download a gated model like
 # requires token in $HOME/.cache/huggingface/token, as created by the
 # `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
 cd examples
-bazel run --config=release //llama:Llama-3.1-8B-Instruct
-bazel run --config=release //llama:Llama-3.1-8B-Instruct -- --prompt="What is the capital of France?"
+bazel run @zml//tools:hf -- download meta-llama/Llama-3.1-8B-Instruct --local-dir $HOME/Llama-3.1-8B-Instruct --exclude='*.pth'
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct --prompt="What is the capital of France?"
 ```
 
 You can also try `Llama-3.1-70B-Instruct` if you have enough memory.
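
By analogy with the hunk above, a hedged sketch for the 70B variant, assuming the repo id `meta-llama/Llama-3.1-70B-Instruct` follows the same naming pattern:

```
cd examples
# Sketch: same download/run pattern as above, swapping in the 70B repo id.
bazel run @zml//tools:hf -- download meta-llama/Llama-3.1-70B-Instruct --local-dir $HOME/Llama-3.1-70B-Instruct --exclude='*.pth'
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-70B-Instruct
```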
@@ -88,8 +89,9 @@ Like the 8B model above, this model also requires approval. See
 ```
 cd examples
-bazel run --config=release //llama:Llama-3.2-1B-Instruct
-bazel run --config=release //llama:Llama-3.2-1B-Instruct -- --prompt="What is the capital of France?"
+bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
+bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
 ```
 
 For a larger 3.2 model, you can also try `Llama-3.2-3B-Instruct`.

File diff suppressed because one or more lines are too long