Removed an unnecessary batch dimension introduced by recent changes. Converted index outputs from i32 to u32 for token indices. Ensures Llama runs on CUDA and ROCm. Tested on CUDA.

| Name |
|---|
| benchmark |
| llama |
| loader |
| mnist |
| simple_layer |
| third_party |
| tools |
| bazel.sh |
| BUILD.bazel |
| build.zig |
| MODULE.bazel |
| MODULE.bazel.lock |
| platform_mappings |
| zls.build.json |