# Getting Started with ZML
In this tutorial, we will install `ZML` and run a few models locally.
## Prerequisites
First, let's check out the ZML codebase. In a terminal, run:
```
git clone https://github.com/zml/zml.git
cd zml/
```
We use `bazel` to build ZML and its dependencies. We recommend downloading it
through `bazelisk`, a version manager for `bazel`.
### Install Bazel:
**macOS:**
```
brew install bazelisk
```
**Linux:**
```
curl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-linux-amd64'
chmod +x /usr/local/bin/bazel
```
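On either platform, you can verify the installation by asking `bazelisk` (installed as `bazel`) for its version:
```
bazel --version
```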
## Run a pre-packaged model
ZML comes with a variety of model examples. See also our reference implementations in the [examples](https://github.com/zml/zml/tree/master/examples/) folder.
### MNIST
The [classic](https://en.wikipedia.org/wiki/MNIST_database) handwritten digits
recognition task. The model is tasked with recognizing a handwritten digit that
has been converted to a 28x28 pixel monochrome image. `Bazel` will download a
pre-trained model and the test dataset. The program will load the model,
compile it, and classify a randomly picked example from the test dataset.
On the command line:
```
cd examples
bazel run --config=release //mnist
```
### Llama
Llama is a family of "Large Language Models" trained to generate text based
on the beginning of a sentence/book/article. This "beginning" is generally
referred to as the "prompt".
#### Meta Llama 3.1 8B
This model has restrictions; see
[here](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). It **requires
approval from Meta on Huggingface**, which can take a few hours to get granted.
While waiting for approval, you can already
[generate your Huggingface access token](../howtos/huggingface_access_token.md).
Once you've been granted access, you're ready to download a gated model like
`Llama-3.1-8B-Instruct`!
```
# requires a token in $HOME/.cache/huggingface/token, as created by the
# `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
cd examples
bazel run @zml//tools:hf -- download meta-llama/Llama-3.1-8B-Instruct --local-dir $HOME/Llama-3.1-8B-Instruct --exclude='*.pth'
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.1-8B-Instruct --prompt="What is the capital of France?"
```
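If you haven't stored a token yet, either option mentioned in the comments above works; for example:
```
# log in once; this caches the token under $HOME/.cache/huggingface/token
huggingface-cli login
# or export it for the current shell session only
export HUGGINGFACE_TOKEN=<your-token>
```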
You can also try `Llama-3.1-70B-Instruct` if you have enough memory.
#### Meta Llama 3.2 1B
Like the 8B model above, this model also requires approval. See
[here](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) for access requirements.
```
cd examples
bazel run @zml//tools:hf -- download meta-llama/Llama-3.2-1B-Instruct --local-dir $HOME/Llama-3.2-1B-Instruct --exclude='*.pth'
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct
bazel run --config=release //llama -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
```
For a larger 3.2 model, you can also try `Llama-3.2-3B-Instruct`.
2023-01-03 10:21:07 +00:00
## Run Tests
```
bazel test //zml:test
```
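If a test fails, Bazel can print the output of the failing tests directly to the terminal via the `--test_output=errors` flag:
```
bazel test //zml:test --test_output=errors
```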
## Running Models on GPU / TPU
You can compile models for accelerator runtimes by appending one or more of the
following arguments to the command line when compiling or running a model:
- NVIDIA CUDA: `--@zml//runtimes:cuda=true`
- AMD ROCm: `--@zml//runtimes:rocm=true`
- Google TPU: `--@zml//runtimes:tpu=true`
- AWS Trainium/Inferentia 2: `--@zml//runtimes:neuron=true`
- **AVOID CPU:** `--@zml//runtimes:cpu=false`
The latter, avoiding compilation for CPU, cuts down compilation time.
So, to run the Llama model from above on a host sporting an NVIDIA GPU,
run the following:
```
cd examples
bazel run --config=release //llama \
  --@zml//runtimes:cuda=true \
  -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
```
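Since you can pass more than one runtime flag, you can also disable the CPU build (per the note above) to cut compilation time further:
```
cd examples
bazel run --config=release //llama \
  --@zml//runtimes:cuda=true \
  --@zml//runtimes:cpu=false \
  -- --hf-model-path=$HOME/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
```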
## Where to go next:
In [Deploying Models on a Server](../howtos/deploy_on_server.md), we show how you can
cross-compile and package for a specific architecture, then deploy and run your
model. Alternatively, you can also [dockerize](../howtos/dockerize_models.md) your
model.
You might also want to check out the
[examples](https://github.com/zml/zml/tree/master/examples), read through the
[documentation](../README.md), start
[writing your first model](../tutorials/write_first_model.md), or read about more
high-level [ZML concepts](../learn/concepts.md).