169a24307c
Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches.
2024-02-12 12:43:23 +00:00
7e6103d876
Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix.
2024-02-06 09:31:48 +00:00
b8a0aaee5a
Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.
2024-02-05 15:22:44 +00:00
b643f7bc53
Add Bazel build rule and test for Llama3 tokenizer’s byte fallback and unknown token handling.
2024-02-02 10:25:48 +00:00
5120fe00dc
Update libxev epoll patch to resolve crashes and hangs in epoll and kqueue implementations.
2024-01-29 17:15:11 +00:00
edc2ac26f8
Adjust ROCm runtime sandboxing to hook only the PJRT plugin and make hipblastlt bytecodes optional.
2024-01-26 13:02:23 +00:00
0ce36599da
Update example build config and Llama demo to support the new async epoll backend and zigcoro scheduler.
2024-01-22 12:17:01 +00:00
a7b7ae0180
Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules.
2024-01-16 14:13:45 +00:00
434cee3a6c
Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.
2024-01-15 09:41:42 +00:00
5b8e42f9a9
Vendor zigcoro and unify APIs; rework internals for stdx.meta compatibility, add Channel.try_send/try_recv methods, support dynamically sized channels with comptime capacity, and introduce PoolStackAllocator for coroutine stack allocation.
2024-01-11 15:40:15 +00:00
68dbc290e9
zml: revamp scatterSlices
...
The main issue with the current `scatter` implementation is that it uses
the batching dims of `stablehlo.scatter`.
While nice in theory, the optimizer doesn't handle them well and they
are often unrolled into a while loop.
Here I convert the batching dims into extra iota indices.
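Below is a minimal sketch of the idea. It uses plain Python instead of Zig or StableHLO, nested lists stand in for tensors, and the function names are hypothetical. It also assumes an additive update computation, so it is not the ZML implementation. A scatter with a batching dim sends `updates[b][i]` to `operand[b][indices[b][i]]`. If you prepend an iota along the batch axis to every index, the same result comes from a plain scatter over full coordinates, and no batching dim is left for the optimizer to unroll:

```python
def scatter_batched(operand, indices, updates):
    """Reference semantics of a batched scatter-add (hypothetical helper).

    updates[b][i] is added to operand[b][indices[b][i]].
    """
    out = [row[:] for row in operand]  # copy, leave the input untouched
    for b, (idx_row, upd_row) in enumerate(zip(indices, updates)):
        for j, u in zip(idx_row, upd_row):
            out[b][j] += u
    return out


def scatter_with_iota(operand, indices, updates):
    """Same result without a batching dim (hypothetical helper).

    An iota along the batch axis is prepended to every index, giving full
    (batch, position) coordinates; a single flat scatter then does the work.
    """
    num_batches = len(indices)
    # iota over the batch axis, broadcast to the shape of `indices`
    iota = [[b] * len(indices[b]) for b in range(num_batches)]
    # full coordinates and updates, both flattened in the same row-major order
    full_indices = [
        (iota[b][i], indices[b][i])
        for b in range(num_batches)
        for i in range(len(indices[b]))
    ]
    flat_updates = [u for row in updates for u in row]
    out = [row[:] for row in operand]
    for (b, j), u in zip(full_indices, flat_updates):
        out[b][j] += u
    return out
```

Both functions return the same result. In the second one, the batch index is just part of the scatter coordinates.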
2024-01-08 17:55:20 +00:00
83b5e1ec48
fix
...
Previously we were using `module.op().writeBytecode(writer)` to compute
the hash of a model, but it crashes on some inputs, notably ones with
unused variables. So I switched to hashing the textual representation of
the MLIR module.
2024-01-05 16:44:41 +00:00
acc492454f
Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.
2024-01-01 15:31:41 +00:00
223857251d
Update MNIST example to use new operator source locations and reflect recent API changes (sdpa bias removal, gatherSlices shape literals, Shape.outer, Tensor.all, and argMax dtype inference).
2023-12-26 10:45:52 +00:00
5bd7f8aae9
zml: HostBuffer.prettyPrint()
...
Add pretty printing of HostBuffer.
This will be leveraged by the debug helper `x.print()`.
It can also be used like this:
`std.log.info("my buffer: {}", .{host_buffer.pretty()})`
2023-12-25 13:01:17 +00:00
5ddd034d2c
pjrt: Fix profiler by allowing i64 resource IDs and reserving memory when creating array lists.
2023-12-20 17:18:02 +00:00
7ef87236ce
Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.
2023-12-18 13:56:45 +00:00
8a031bd4c8
Update Llama example to use the simplified transpose implementation and increase default profiler size to 1,000,000 events.
2023-12-15 12:06:42 +00:00
145e60b4dd
workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.
2023-12-13 10:10:32 +00:00
6a4a7fb9a1
zml/module.zig: Remove unnecessary optional unwrapping.
2023-12-05 12:27:08 +00:00
37725cdaa6
Update PJRT, runtime, and ZML modules to use per‑target output folders and expose profiler.dumpDataAsJson for JSON profiling output.
2023-12-04 10:38:10 +00:00
22a846de72
Update llama example to use per‑target output folders and call profiler.dumpDataAsJson for testing the new compilation layout.
2023-12-01 16:05:59 +00:00
46fbbf43a2
Update tutorial documentation in write_first_model.md with quick fixes.
2023-11-30 12:14:33 +00:00
737f7cbdee
Add example build runner scripts and config for Zig code completion.
2023-11-21 14:55:34 +00:00
ec37c8f731
Update Bazel build files and helper scripts to integrate the custom build runner for ZLS code completion.
2023-11-20 15:29:01 +00:00
6e4fef8844
zml: Introduce an arena allocator in CompilationContext. Expose the arena allocator in place of the existing allocator, so ops can allocate safely without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for the upcoming Tensor API.
2023-11-16 15:11:23 +00:00
57bf667c90
Add struct‑based client creation flags to the Zig PJRT API and update context.autoPlatform to accept a flag struct.
2023-11-13 12:45:17 +00:00
cb6fcbbb1a
Update docs and Zig examples to demonstrate the new client creation flags API.
2023-11-09 12:31:11 +00:00
9f4194ad97
Fix test layer. Add tests to detect silent breakage of testLayer and regressions in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.
2023-11-06 11:25:57 +00:00
237a877a29
zml: Add support for Llama 3.2 text-only models. Implement a transpose over embed_tokens as a replacement for the missing lm_head, and make lm_head optional for compatibility. Add repositories and executions to Bazel and update the README.
2023-11-01 10:16:48 +00:00
1c9749c25e
docs: move image in concepts.md
2023-10-31 10:21:14 +00:00
eb20548241
update instructions
...
Following the recent API changes, `prepare` no longer allocates, and
`ExeWithWeights` is now `ModuleExe`.
2023-10-26 13:56:56 +00:00
27c8309424
async: add intrusive queue
...
all code contributed by @steeve
* add intrusive queue
* change the Channel constructor to use the AsyncThread executor by default
---------
Co-authored-by: Steeve Morin <steeve@zml.ai>
2023-10-24 14:36:22 +00:00
98b512c495
Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations.
2023-10-19 17:01:55 +00:00
37de7b9613
Add Llama example showcasing the new func.call emission and function caching behavior.
2023-10-17 11:00:37 +00:00
7d36913b31
Refactor ZML API: move compile, compileFn and related types to exe.zig, update BaseExe allocation and inline caching in compileInternal, and clean up supporting modules (func.zig, meta.zig, signature.zig, cuda.zig, testing.zig, zml.zig).
2023-10-13 16:08:08 +00:00
35395c13f8
Update example programs (benchmark, llama, mnist, simple_layer) to use the new Exe API and reflect BaseExe allocation changes.
2023-10-10 11:12:34 +00:00
3bc6ad98be
Update module.zig to donate all buffers except the token_index buffer for the Llama+Neuron example.
2023-10-06 10:10:56 +00:00
474f76cd75
Enable buffer donation in the Llama example, donating all buffers except the token_index buffer.
2023-10-03 16:32:40 +00:00
5122ca0203
Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.
2023-09-27 11:45:33 +00:00
06865f5876
Update Llama example to use the new direct rope IR implementation.
2023-09-25 10:22:05 +00:00
b5c4fb7c58
zml: fix float8 <-> float32 conversions, support for Tensor.constant(.{}, .{ .f8 = 1.0})
...
Mostly:
* fix float8 <-> float32 conversions
* support for `Tensor.constant(.{}, .{ .f8 = 1.0})`
Misc:
* fix small inconsistencies between different versions of sdpa
* better error message for broadcast
* bazelrc: --config=debug
2023-09-21 11:15:50 +00:00
455bb3877f
runtimes/cuda: obtain NCCL from the pip package, matching XLA behavior.
2023-09-20 17:41:44 +00:00
0d5389ceda
Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.
2023-09-14 13:28:25 +00:00
4abdd32f0d
Update llama example BUILD to use the jax-cuda-pjrt plugin and bump CUDA (12.6.2) and cuDNN (9.5.1) versions.
2023-09-12 15:40:21 +00:00
c8c99d7d5a
zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.
2023-09-07 17:06:19 +00:00
9505992e00
workspace: log diagnostic message before returning NotFound to aid debugging.
2023-09-04 13:34:37 +00:00
937cdec324
examples/loader: add missing stdx dependency.
2023-08-30 13:03:59 +00:00
aa7fae449e
zml/pjrtx: execute bufferFromHostBuffer on the thread pool to avoid blocking and improve weight loading performance.
2023-08-29 10:28:51 +00:00
c081cb9ad6
zml/platform: increase maximum device limit to support up to 32 devices per platform.
2023-08-24 12:23:07 +00:00