474f76cd75
Enable buffer donation in the Llama example, donating all buffers except the token_index buffer.
2023-10-03 16:32:40 +00:00
5122ca0203
Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.
2023-09-27 11:45:33 +00:00
06865f5876
Update Llama example to use the new direct rope IR implementation.
2023-09-25 10:22:05 +00:00
b5c4fb7c58
zml: fix float8 <-> float32 conversions, support for Tensor.constant(.{}, .{ .f8 = 1.0})
...
Mostly:
* fix float8 <-> float32 conversions
* support for `Tensor.constant(.{}, .{ .f8 = 1.0})`
Misc:
* fix small inconsistencies between different versions of sdpa
* better error message for broadcast
* bazelrc: --config=debug
2023-09-21 11:15:50 +00:00
455bb3877f
runtimes/cuda: obtain NCCL from the pip package, matching XLA behavior.
2023-09-20 17:41:44 +00:00
0d5389ceda
Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.
2023-09-14 13:28:25 +00:00
4abdd32f0d
Update llama example BUILD to use jax-cuda-pjrt plugin and bump CUDA (12.6.2) / cuDNN (9.5.1) versions.
2023-09-12 15:40:21 +00:00
c8c99d7d5a
zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.
2023-09-07 17:06:19 +00:00
9505992e00
workspace: log diagnostic message before returning NotFound to aid debugging.
2023-09-04 13:34:37 +00:00
937cdec324
examples/loader: add missing stdx dependency.
2023-08-30 13:03:59 +00:00
aa7fae449e
zml/pjrtx: execute bufferFromHostBuffer on the thread pool to avoid blocking and improve weight loading performance.
2023-08-29 10:28:51 +00:00
c081cb9ad6
zml/platform: increase maximum device limit to support up to 32 devices per platform.
2023-08-24 12:23:07 +00:00
af0630616c
Update docs (deploy_on_server, dockerize_models, getting_started) and example Bazel files to include AWS Neuron/Trainium/Inferentia deployment guidance.
2023-08-21 09:15:48 +00:00
7d24329d0a
Add Bazel build rules and runtime implementation for AWS Neuron/Trainium/Inferentia support.
2023-08-18 17:11:27 +00:00
0709b1b32f
zml: reduce memory usage of sdpaMemEfficient by using zml.ops.while instead of zml.ops.for, avoiding concatenation of intermediate results.
2023-08-14 14:24:11 +00:00
022baf782b
Update examples/MODULE.bazel to reference the bumped LLVM, XLA, StableHLO, and PJRT plugin versions.
2023-08-11 16:57:15 +00:00
01eff33fa0
Update workspace dependencies to newer LLVM, XLA, StableHLO, and PJRT versions and expose new pjrt plugin attribute APIs and stablehlo version APIs in build and runtime configurations.
2023-08-07 12:28:36 +00:00
726a2d0691
Update docs and examples to showcase the new async runtime with coroutines and cross‑thread signaling.
2023-08-03 11:35:24 +00:00
bcde3962ce
Rework async runtime with coroutine support, rename async API (async_→asyncc, await_→awaitt), improve type inference, bump libxev (default epoll) and update related stdx and zml modules.
2023-08-01 11:35:04 +00:00
b53462b515
Fix crash in for_ by ensuring values are pushed to their block before opening a new block, adding asserts for block state, and guaranteeing first_step is used. Adjust padding syntax to improve usability.
2023-07-25 14:25:47 +00:00
0fa258cd88
Update examples to reflect recent async module changes, renaming asyncGeneric to asyncc.
2023-07-24 09:34:35 +00:00
f675a203c2
zml.ops.makeBlock now returns the inner tensor to propagate tags. The function returns both the created mlir.Block and tensors from the supplied function, allowing shape and tag propagation without exposing mlir.Values. Updated tests to run on non‑CPU platforms.
2023-07-21 09:01:01 +00:00
be8aa4fa8e
Fix several compileError calls introduced by recent changes; ensure Zig compiler catches errors at comptime.
2023-07-17 09:10:27 +00:00
0f9a92f27d
module-cache: raise max_pjrt_executable_size limit to 400 MB to accommodate large PJRT executables.
2023-07-14 17:58:22 +00:00
88c7a74ccf
third_party/modules/zig-protobuf: revert indentation changes to maintain compatibility with older branches.
2023-07-13 11:44:53 +00:00
4681ce2f24
PJRT: add conversion of profiling protobuf output to JSON format.
2023-07-05 13:34:05 +00:00
f7bac1af10
Update example programs (llama and loader) with hotfixes for issue.
2023-07-04 13:40:05 +00:00
63aca9f9c2
Hotfixes for build rule, math utilities, module system, and NN implementation.
2023-06-29 10:26:54 +00:00
7985716562
Add new Zig example programs (benchmark, llama, loader, mnist, simple_layer) and include a test for the llama example.
2023-06-27 14:23:22 +00:00
9b7eea8ac2
Add stdx utilities and rework async signature inference; tidy executable logging.
2023-06-21 14:45:14 +00:00
c30aa018dc
zml: small cleanup
...
- Add more scatterSlices test cases.
- Replace helpers.mapTensors with zml.meta.map.
- Fix shape handling when a for loop is fully unrolled.
- Allow zml.Tensor.pad to accept i64 for dimension compatibility.
- Enable arrays of tensors inside model structs.
- Split Buffer.asViewOf into asViewOfHostBuffer and asViewOfDeviceBuffer.
2023-06-19 15:29:29 +00:00
f00538667e
zml.nn: add dynamic sampling with support for top‑k, top‑p, and min‑p. The token index is computed according to the selected sampling strategy, configured through the top_k, max_top_k, top_p, and min_p options.
2023-06-16 14:34:18 +00:00
b244a18621
zml: set iota default dtype to .i32, with fallback to .i64 for axes with many elements, simplifying usage.
2023-06-15 12:45:52 +00:00
344e07fb6e
stablehlo: extend dot_general API to include DotAlgorithm support by merging precision and algorithm attributes into a union, aligning with spec requirements. Currently not exposed to users due to limited algorithm support.
2023-06-07 11:20:25 +00:00
6d720126ac
Add PJRT custom call integration with generic zmlHostBufferCallback to copy tensors to host and invoke user callbacks. Introduce Tensor.print() method to output runtime tensor values (CUDA‑specific, uses a pre‑allocated host buffer).
2023-06-05 13:42:45 +00:00
bf23eef0d9
examples: clean up inconsistencies in asynk usage across the codebase.
2023-06-01 16:11:58 +00:00
499b0d20e5
pjrtx: change behavior to return an error when OpenXLA fails to serialize the new batching_dim attribute for gather/scatter, instead of panicking.
2023-05-29 17:18:19 +00:00
52ef20f981
zml: reintroduce pjrtx to handle reactor blocking issues in async scenarios, particularly with Events.
2023-05-26 15:54:15 +00:00
c68ec4bc5c
async: implement default threaded backend using a thread pool. Backend selectable via @zml//async:impl flag (threaded or zigcoro). Provides workaround for environments where io_uring is unavailable.
2023-05-25 16:02:11 +00:00
89cf2233d3
zml/aio: enable reading metadata from index.json for sharded safetensors files, allowing metadata storage alongside model config.
2023-05-23 15:06:59 +00:00
2f54e2a5f3
zml.tensor: add triangular operator to zero out the upper‑right matrix region with configurable offset, and toDiagonal (diag_embed) to embed a vector as a diagonal matrix, correcting previous diag naming. Also add ELU activation under zml.nn.Activation.
2023-05-18 16:39:21 +00:00
05faa5021e
zml.tensor: add cumulativeSum operator and refactor maxPoolND. Introduce cumulative sum using reduceWindow. Simplify reduceWindow signature by merging padding_shape and padding_value. Update maxPool1D/2D to accept tuple arguments. Revise pad to use tagged or AOS syntax; remove SOA syntax.
2023-05-17 09:01:27 +00:00
54e7eb30b4
Introduce a thin abstraction layer between ZML and PJRT to manage plugin loading decisions, enable compile‑time detection of linked runtimes, and handle cases such as libtpu blocking metadata access.
2023-05-15 09:36:41 +00:00
74e90855ca
Configure the runfiles environment globally at context start to ensure Bazel-built binaries locate their runfiles correctly.
2023-05-12 11:40:23 +00:00
57130577e9
Add fallback for runtimes lacking PJRT_Event by using thread‑pool dispatch for buffer copies and treating operations as synchronous when events are absent.
2023-05-09 12:44:56 +00:00
672df8fa2f
Update tutorial and example code to use the new asyncc name and Generic slugs.
2023-05-08 16:58:45 +00:00
5543c8192f
Rename async_ to asyncc and add Generic async slugs in async.zig, aio.zig, and module.zig.
2023-05-04 14:44:12 +00:00
cfe38f27ca
Switch ROCm dlopen handling to patchelf's rename_dynamic_symbols for more robust dynamic symbol import.
2023-05-03 17:33:46 +00:00
fefd84b1bb
Replace silu implementation with stablehlo.logistic for higher precision, move logistic logic into sigmoid and alias logistic to sigmoid (breaking change).
2023-05-01 10:40:50 +00:00
021111d07d
Extend tests to handle all float types, preventing crashes with bfloat16 tensors.
2023-04-27 10:34:27 +00:00