95453c7242
Update XLA dependency to version 20250527.0‑cb67f2f and refresh related Bazel BUILD, MODULE, overlay and patch files.
2024-11-22 16:50:20 +00:00
3849eb10b7
Add buffer and hostbuffer utilities with precise f32→bf16 conversion, type inference for loadBuffers, store expected input shapes, enhance meta.visit and JSON TaggedUnion support, and improve logging.
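For readers unfamiliar with the f32→bf16 step, here is a minimal sketch of one common way to convert precisely, assuming "precise" means round-to-nearest-even rather than plain truncation (the commit does not say which rounding mode it uses). The function names are hypothetical and are not part of the ZML API.

```python
import struct

def f32_to_bf16_bits(x):
    """Return the 16-bit bfloat16 pattern for x, rounding to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep the sign/exponent and force a quiet-NaN mantissa bit so the
    # result cannot collapse into infinity when the low bits are dropped.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit: exact ties round toward an even
    # mantissa, everything else rounds to the nearest value.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b):
    """Widen a bfloat16 bit pattern back to a Python float (always exact)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]
```

Plain truncation (`bits >> 16`) would always round toward zero; the bias term above is what removes that systematic error.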
2024-10-28 11:21:46 +00:00
4ef81b89ea
stdx.fmt: add slice formatting support, improving on previous prettyPrinter implementation by leveraging internal fmt mechanisms.
2024-10-18 15:05:08 +00:00
aacbf2ee04
Fix Llama3 rope scaling implementation in the neural network module (zml/nn.zig)
2024-10-07 12:53:03 +00:00
2863c1f5e0
zml/tensor: fix returned value in Tensor.toMemory – ensure _output_memory_kind is set correctly in the result.
2024-09-18 13:18:08 +00:00
aec7072837
pjrt: add FFI bindings for custom calls
2024-09-10 09:14:28 +00:00
1f5ff96c10
zml/ops: add wiring for operand output alias in zml.ops.triton
2024-09-09 15:00:28 +00:00
4b7e618b43
zml/aio: add bool handling in struct population within populateStruct
2024-09-02 14:11:47 +00:00
ac63c30e12
add mini-DSL for creating MLIR common attributes and types, leveraging Zig 0.14 to simplify mlir.Type and mlir.Attribute creation
2024-08-26 14:19:00 +00:00
63ef78efcc
zml: add support for NVTX tracing
2024-08-21 14:41:40 +00:00
7df89301dc
Bump XLA version and import llvm, stablehlo, triton, and zig‑protobuf modules in workspace BUILD files.
2024-08-06 10:28:43 +00:00
3f36506f1c
zml: remove usingnamespace from floats.zig and related dependencies; note that incremental compilation does not improve overall build time due to linking overhead
2024-07-23 17:43:43 +00:00
42dee5d0e0
mlir: rework stablehlo custom call implementation and add a Triton example
2024-07-16 13:23:07 +00:00
aec1d96e6d
mlir: rework DenseElementsAttribute to correctly slice inputs and modify .as() to return a concrete value instead of an optional
2024-07-15 12:32:24 +00:00
30f6be0e2f
Update core Zig modules (async, mlir, pjrt, stdx) and third‑party Bazel definitions for the Zig 0.14.0 release.
2024-07-02 14:19:04 +00:00
18eb0e5a7b
Add async I/O, SentencePiece, NN, and tensor utilities for ModernBERT support and update Bazel build configuration.
2024-06-14 15:27:06 +00:00
221ece647d
zml/ops.zig: Added zml.ops.case operation
...
This can be used to select which branch will be run at runtime.
It wraps the `stablehlo.case` operation.
2024-05-30 14:11:08 +00:00
3aac788544
Update Bazel build configurations (zig.bzl, BUILD files) for MLIR, PJRT, Neuron, ROCm, tokenizer, and tools, fixing broken dependencies.
2024-05-20 11:28:25 +00:00
05944b5cc9
Update FnCache to copy and reuse non‑tensor fields in fixed‑size structs, preventing undefined memory in core modules.
2024-05-15 17:54:52 +00:00
a34190679b
Fix llama token handling and remove redundant prompt token reuse in core Zig modules (aio, module, nn, pjrtx, tensor)
2024-05-02 17:10:11 +00:00
13eff4e661
pjrt,zml: add memory bindings
...
This preliminary PR binds the PJRT memory endpoints and adds them to
`zml.Buffer`.
A follow-up PR will properly integrate them into `zml.Buffer`.
2024-04-11 15:43:24 +00:00
d4db5ccc6b
Integrate TinyLlama support, restore the homemade tokenizer, and align Zig API naming across stdx and zml tokenizer modules.
2024-04-05 15:07:29 +00:00
8a25b1eb74
Revert CUDA PJRT plugin version to 0.4.38 to address performance regression on XLA master.
2024-03-05 17:04:42 +00:00
959bc48c42
Add HuggingFace tokenizer bindings and SentencePiece integration; update BUILD files, async utilities, and FFI modules to support the new tokenizers.
2024-02-28 15:47:37 +00:00
c109b12e1b
Various minor fixes: rewrite tinyllama tokenizer newline token, prevent HostBuffer.isContiguous false trigger on 1‑dim axes, improve HostBuffer.slice1d error messages, simplify module.zig output to show .mlir file path, correct setFlags handling of comptime int/float, make tokenizer.zig return <oob> for out‑of‑range detokenization, and speed up Buffer.constant creation up to 2.5 GB/s on CUDA.
2024-02-19 12:34:18 +00:00
169a24307c
Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches.
2024-02-12 12:43:23 +00:00
7e6103d876
Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix.
2024-02-06 09:31:48 +00:00
b8a0aaee5a
Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.
2024-02-05 15:22:44 +00:00
a7b7ae0180
Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules.
2024-01-16 14:13:45 +00:00
434cee3a6c
Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.
2024-01-15 09:41:42 +00:00
68dbc290e9
zml: revamp scatterSlices
...
The main issue with the current `scatter` implementation is that it uses
the broadcasting dims of `stablehlo.scatter`.
While nice in theory, the optimizer doesn't handle them well and they
are often unrolled into a while loop.
Here I convert the batching dim to extra iota indices.
2024-01-08 17:55:20 +00:00
83b5e1ec48
Fix model hash computation crashing on some inputs
...
Previously we were using `module.op().writeBytecode(writer)` to compute the
hash of a model,
but it crashes on some inputs, notably for unused variables.
So I now use the text representation of the MLIR module instead.
2024-01-05 16:44:41 +00:00
acc492454f
Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.
2024-01-01 15:31:41 +00:00
5bd7f8aae9
zml: HostBuffer.prettyPrint()
...
Add pretty printing of HostBuffer.
This will be leveraged by the debug helper `x.print()`.
It can also be used like this:
`std.log.info("my buffer: {}", .{host_buffer.pretty()})`
2023-12-25 13:01:17 +00:00
7ef87236ce
Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.
2023-12-18 13:56:45 +00:00
145e60b4dd
workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.
2023-12-13 10:10:32 +00:00
6a4a7fb9a1
zml/module.zig: Remove unnecessary optional unwrapping.
2023-12-05 12:27:08 +00:00
37725cdaa6
Update PJRT, runtime, and ZML modules to use per‑target output folders and expose profiler.dumpDataAsJson for JSON profiling output.
2023-12-04 10:38:10 +00:00
6e4fef8844
zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API.
2023-11-16 15:11:23 +00:00
57bf667c90
Add struct‑based client creation flags to the Zig PJRT API and update context.autoPlatform to accept a flag struct.
2023-11-13 12:45:17 +00:00
9f4194ad97
Fix test layer. Add tests to detect silent breakage of testLayer and regression in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.
2023-11-06 11:25:57 +00:00
98b512c495
Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations.
2023-10-19 17:01:55 +00:00
7d36913b31
Refactor ZML API: move compile, compileFn and related types to exe.zig, update BaseExe allocation and inline caching in compileInternal, and clean up supporting modules (func.zig, meta.zig, signature.zig, cuda.zig, testing.zig, zml.zig).
2023-10-13 16:08:08 +00:00
3bc6ad98be
Update module.zig to donate all buffers except the token_index buffer for the Llama+Neuron example.
2023-10-06 10:10:56 +00:00
5122ca0203
Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.
2023-09-27 11:45:33 +00:00
b5c4fb7c58
zml: fix float8 <-> float32 conversions, support for Tensor.constant(.{}, .{ .f8 = 1.0})
...
Mostly:
* fix float8 <-> float32 conversions
* support for `Tensor.constant(.{}, .{ .f8 = 1.0})`
Misc:
* fix small inconsistencies between different versions of sdpa
* better error message for broadcast
* bazelrc: --config=debug
2023-09-21 11:15:50 +00:00
0d5389ceda
Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.
2023-09-14 13:28:25 +00:00
c8c99d7d5a
zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.
2023-09-07 17:06:19 +00:00
9505992e00
workspace: log diagnostic message before returning NotFound to aid debugging.
2023-09-04 13:34:37 +00:00
aa7fae449e
zml/pjrtx: execute bufferFromHostBuffer on the thread pool to avoid blocking and improve weight loading performance.
2023-08-29 10:28:51 +00:00
c081cb9ad6
zml/platform: increase maximum device limit to support up to 32 devices per platform.
2023-08-24 12:23:07 +00:00
7d24329d0a
Add Bazel build rules and runtime implementation for AWS Neuron/Trainium/Inferentia support.
2023-08-18 17:11:27 +00:00
0709b1b32f
zml: reduce memory usage of sdpaMemEfficient by using zml.ops.while instead of zml.ops.for, avoiding concatenation of intermediate results.
2023-08-14 14:24:11 +00:00
01eff33fa0
Update workspace dependencies to newer LLVM, XLA, StableHLO, and PJRT versions and expose new pjrt plugin attribute APIs and stablehlo version APIs in build and runtime configurations.
2023-08-07 12:28:36 +00:00
bcde3962ce
Rework async runtime with coroutine support, rename async API (async_→asyncc, await_→awaitt), improve type inference, bump libxev (default epoll) and update related stdx and zml modules.
2023-08-01 11:35:04 +00:00
b53462b515
Fix crash in for_ by ensuring values are pushed to their block before opening a new block, adding asserts for block state, and guaranteeing first_step is used. Adjust padding syntax to improve usability.
2023-07-25 14:25:47 +00:00
f675a203c2
zml.ops.makeBlock now returns the inner tensor to propagate tags. The function returns both the created mlir.Block and tensors from the supplied function, allowing shape and tag propagation without exposing mlir.Values. Updated tests to run on non‑CPU platforms.
2023-07-21 09:01:01 +00:00
be8aa4fa8e
Fix several compileError calls introduced by recent changes; ensure Zig compiler catches errors at comptime.
2023-07-17 09:10:27 +00:00
0f9a92f27d
module-cache: raise max_pjrt_executable_size limit to 400 MB to accommodate large PJRT executables.
2023-07-14 17:58:22 +00:00
63aca9f9c2
Hotfixes for build rule, math utilities, module system, and NN implementation
2023-06-29 10:26:54 +00:00
9b7eea8ac2
Add stdx utilities and rework async signature inference; tidy executable logging.
2023-06-21 14:45:14 +00:00
c30aa018dc
zml: small cleanup
...
- Add more scatterSlices test cases.
- Replace helpers.mapTensors with zml.meta.map.
- Fix shape handling when a for loop is fully unrolled.
- Allow zml.Tensor.pad to accept i64 for dimension compatibility.
- Enable arrays of tensors inside model structs.
- Split Buffer.asViewOf into asViewOfHostBuffer and asViewOfDeviceBuffer.
2023-06-19 15:29:29 +00:00
f00538667e
zml.nn: add dynamic sampling with support for top‑k, top‑p, and min‑p settings. Implements token index computation based on the selected sampling strategy, including options for top_k, max_top_k, top_p, and min_p.
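As a rough illustration of how the three filters named above usually combine, here is a small sketch in plain Python. It follows the common definitions of top-k, top-p (nucleus), and min-p. The order of the filters, the function name, and the parameter defaults are assumptions and may differ from what zml.nn actually does.

```python
import math

def filter_probs(logits, top_k=None, top_p=1.0, min_p=0.0):
    """Return {token_index: renormalized probability} after filtering."""
    # Numerically stable softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidates sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely tokens.
    if top_k is not None:
        order = order[:top_k]

    # top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # min-p: drop tokens less likely than min_p times the best token.
    threshold = min_p * probs[order[0]]
    kept = [i for i in kept if probs[i] >= threshold]

    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}
```

The returned distribution can then be sampled with, for example, `random.choices(list(d), weights=list(d.values()))`.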
2023-06-16 14:34:18 +00:00
b244a18621
zml: set iota default dtype to .i32, with fallback to .i64 for axes with many elements, simplifying usage.
2023-06-15 12:45:52 +00:00
344e07fb6e
stablehlo: extend dot_general API to include DotAlgorithm support by merging precision and algorithm attributes into a union, aligning with spec requirements. Currently not exposed to users due to limited algorithm support.
2023-06-07 11:20:25 +00:00
6d720126ac
Add PJRT custom call integration with generic zmlHostBufferCallback to copy tensors to host and invoke user callbacks. Introduce Tensor.print() method to output runtime tensor values (CUDA‑specific, uses a pre‑allocated host buffer).
2023-06-05 13:42:45 +00:00
499b0d20e5
pjrtx: change behavior to return an error when OpenXLA fails to serialize the new batching_dim attribute for gather/scatter, instead of panicking.
2023-05-29 17:18:19 +00:00
52ef20f981
zml: reintroduce pjrtx to handle reactor blocking issues in async scenarios, particularly with Events.
2023-05-26 15:54:15 +00:00
c68ec4bc5c
async: implement default threaded backend using a thread pool. Backend selectable via @zml//async:impl flag (threaded or zigcoro). Provides workaround for environments where io_uring is unavailable.
2023-05-25 16:02:11 +00:00
89cf2233d3
zml/aio: enable reading metadata from index.json for sharded safetensor files, allowing metadata storage alongside model config.
2023-05-23 15:06:59 +00:00
2f54e2a5f3
zml.tensor: add triangular operator to zero out the upper‑right matrix region with configurable offset, and toDiagonal (diag_embed) to embed a vector as a diagonal matrix, correcting previous diag naming. Also add ELU activation under zml.nn.Activation.
2023-05-18 16:39:21 +00:00
05faa5021e
zml.tensor: add cumulativeSum operator and refactor maxPoolND. Introduce cumulative sum using reduceWindow. Simplify reduceWindow signature by merging padding_shape and padding_value. Update maxPool1D/2D to accept tuple arguments. Revise pad to use tagged or AOS syntax; remove SOA syntax.
2023-05-17 09:01:27 +00:00
54e7eb30b4
Introduce a thin abstraction layer between ZML and PJRT to manage plugin loading decisions, enable compile‑time detection of linked runtimes, and handle cases such as libtpu blocking metadata access.
2023-05-15 09:36:41 +00:00
74e90855ca
Configure the runfiles environment globally at context start to ensure Bazel-built binaries locate their runfiles correctly.
2023-05-12 11:40:23 +00:00
57130577e9
Add fallback for runtimes lacking PJRT_Event by using thread‑pool dispatch for buffer copies and treating operations as synchronous when events are absent.
2023-05-09 12:44:56 +00:00
5543c8192f
Rename async_ to asyncc and add Generic async slugs in async.zig, aio.zig, and module.zig.
2023-05-04 14:44:12 +00:00
fefd84b1bb
Replace silu implementation with stablehlo.logistic for higher precision, move logistic logic into sigmoid and alias logistic to sigmoid (breaking change).
2023-05-01 10:40:50 +00:00
021111d07d
Extend tests to handle all float types, preventing crashes with bfloat16 tensors.
2023-04-27 10:34:27 +00:00
ed6444b775
Add Tensor.concatenate support, begin deprecating broadcastLeft, and compute transformer head scaling constant in f32 for higher precision.
2023-04-21 15:55:07 +00:00
11006ca08d
Refactor torch module: merge PickleData into Parser as torch.File, rename value file to py_object.zig, use buffered reader for pickle and zip headers, adjust intermediate result handling, simplify Python dict representation, separate kwargs from args, and add extensive tests for long integers, protocol 0, zipped pickle, and a complex PyTorch Conv2d case; also streamline BufferStore initialization.
2023-04-20 15:43:18 +00:00
8e43a45a3c
Add event waiting when invoking a module and improve multi‑device sharding handling.
2023-04-11 11:32:09 +00:00
0189b71070
Rename zml.aio.Value to zml.aio.Metadata, simplify its type variants, and update torch pickle/eval APIs accordingly.
2023-04-07 16:45:58 +00:00
e25f70d923
Rename and simplify modules in zml/aio/torch: replace redundant qualified names, remove generic utilities, inline code, reorder functions for top‑to‑bottom readability, and extract parsing logic into parseTensor and parseStorage functions.
2023-04-04 17:20:53 +00:00
66881899ca
Fix testLayer by removing unnecessary compile_options argument and updating testing logic for new sharded output, ensuring proper usage by llama.zig.
2023-03-31 14:23:45 +00:00
05d23beb23
Add Normalizer.fromHfJson to read HuggingFace tokenizer JSON and map to internal options, including a configurable magic space token and a debug flag for token merges. Adjust default handling of extra whitespace to align with HF defaults.
2023-03-29 16:10:29 +00:00
ef922e3aea
Fix empty JSON array handling in safetensor metadata loader and refactor torch loader (make ops slices const and improve readability).
2023-03-28 16:17:00 +00:00
a4f0fc96c0
Integrate user sharding hints and HLO sharding annotations across MLIR dialects and ZML core, and remove the now‑unused module options arguments.
2023-03-21 10:50:39 +00:00
8746a5ce78
Expose zml/test_runner.zig publicly to enable users to employ the async test runner. Made the dependency on zml explicit and suggest treating test_runner as a zig_library rather than a filegroup.
2023-03-16 13:22:35 +00:00
7ef67eea27
zml: Relocate tests next to the functions they verify and remove obsolete dynamicSlice1d test.
2023-03-08 14:10:11 +00:00
dfa71018a5
zml: Remove pjrtx wrapper, migrate remaining helpers to their native modules, and fix blocking issue in Event.await.
2023-03-06 17:05:56 +00:00
ecf52ad724
zml.tokenizer: Implement proper byte fallback support by converting hex byte strings (e.g., “<0x40>”) to their characters and splitting unknown UTF‑8 codepoints into bytes, fixing tokenization.
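The byte fallback described above can be sketched as follows. This is a toy per-character tokenizer written only to show the two directions (hex byte tokens back to characters, unknown codepoints split into bytes). The function names and the set-based `vocab` are hypothetical and do not mirror the zml.tokenizer API.

```python
import re

# Matches byte-fallback tokens such as "<0x40>".
BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")

def decode_tokens(tokens):
    """Detokenize, turning byte tokens back into the bytes they stand for."""
    out = bytearray()
    for tok in tokens:
        match = BYTE_TOKEN.match(tok)
        if match:
            out.append(int(match.group(1), 16))
        else:
            out.extend(tok.encode("utf-8"))
    # Bytes are joined first so multi-byte codepoints decode correctly.
    return out.decode("utf-8", errors="replace")

def encode_with_fallback(text, vocab):
    """Encode per character; codepoints missing from vocab become byte tokens."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend("<0x%02X>" % b for b in ch.encode("utf-8"))
    return tokens
```

Accumulating raw bytes before decoding matters: a codepoint split across several byte tokens only becomes valid UTF-8 once all of its bytes are back together.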
2023-02-28 14:40:25 +00:00
2f129f76c9
Add in-process sharding support across core ZML components (platform, shape, tensor, MLIR generation, buffers, and PJRT integration)
2023-02-24 17:33:14 +00:00
639f5cd994
Replace log with select for generating the attention mask to avoid NaNs on zero values.
2023-02-16 10:36:23 +00:00
24a7c98476
Implement scatterSlices functionality.
2023-02-14 13:52:49 +00:00
934acb35a8
zml: initialize Tensor.min and Tensor.max reductions with proper extreme values to ensure correct results
2023-02-10 12:28:41 +00:00
be6328813d
zml: clean up dead and commented code; note that copyslice is currently broken and pending reimplementation
2023-02-08 17:13:47 +00:00
058e1415fa
zml: deprecate buggy Tensor.chunk; introduce chunkExact and chunkAllowTrailing with clarified behavior
2023-02-07 12:42:34 +00:00
0606ea1d7c
Update Bazel workspace and runtime BUILD files to newer XLA, StableHLO, and LLVM versions, enabling batching‑dims support for the gather operator.
2023-02-01 15:58:30 +00:00
897786e440
aio: correct refAllDecls handling for yaml and nemo modules
2023-01-31 11:58:58 +00:00
7dcd8b516c
zml/nn: fix resize implementations (resizeBilinear and resizeBicubic) and expand refAllDecls usage; all tests pass
2023-01-27 14:35:11 +00:00