6aa9aa5a7b
Add preliminary implementation for custom call support.
2024-12-10 09:36:37 +00:00
f5ab2c3a55
zml: eliminate compile-time fields from Bufferized, removing the need to pass undefined to exe.call for inlined arguments. Introduce BufferizedWithArgs in zml.testing for compileAndCall utility.
2024-11-28 12:24:39 +00:00
95453c7242
Update XLA dependency to version 20250527.0‑cb67f2f and refresh related Bazel BUILD, MODULE, overlay and patch files.
2024-11-22 16:50:20 +00:00
3849eb10b7
Add buffer and hostbuffer utilities with precise f32→bf16 conversion, type inference for loadBuffers, store expected input shapes, enhance meta.visit and JSON TaggedUnion support, and improve logging.
2024-10-28 11:21:46 +00:00
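Note on the f32→bf16 conversion mentioned above: the commit does not spell out the rounding strategy. A common way to make this conversion precise is round-to-nearest-even on the 16 truncated bits instead of plain truncation. The following is a minimal Python sketch of that general technique, assuming IEEE-754 binary32 input; the function names are hypothetical and are not ZML APIs.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Convert a float32 value to bf16 bits using round-to-nearest-even.

    Hypothetical helper for illustration only, not the ZML implementation.
    """
    # Reinterpret the float32 as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]

    # NaN: exponent all ones and non-zero mantissa. Force a quiet NaN so the
    # payload cannot be rounded into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0:
        return ((bits >> 16) | 0x0040) & 0xFFFF

    # Add 0x7FFF plus the lowest kept bit: exact ties round to the even value.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b: int) -> float:
    """Widen bf16 bits back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Plain truncation (`bits >> 16`) always rounds toward zero, which biases results; the bias term above is what distinguishes the two approaches.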
4ef81b89ea
stdx.fmt: add slice formatting support, improving on previous prettyPrinter implementation by leveraging internal fmt mechanisms.
2024-10-18 15:05:08 +00:00
aacbf2ee04
Fix Llama3 rope scaling implementation in the neural network module (zml/nn.zig)
2024-10-07 12:53:03 +00:00
2863c1f5e0
zml/tensor: fix returned value in Tensor.toMemory – ensure _output_memory_kind is set correctly in the result.
2024-09-18 13:18:08 +00:00
aec7072837
pjrt: add FFI bindings for custom calls
2024-09-10 09:14:28 +00:00
1f5ff96c10
zml/ops: add wiring for operand output alias in zml.ops.triton
2024-09-09 15:00:28 +00:00
4b7e618b43
zml/aio: add bool handling in struct population within populateStruct
2024-09-02 14:11:47 +00:00
ac63c30e12
add mini-DSL for creating MLIR common attributes and types, leveraging Zig 0.14 to simplify mlir.Type and mlir.Attribute creation
2024-08-26 14:19:00 +00:00
63ef78efcc
zml: add support for NVTX tracing
2024-08-21 14:41:40 +00:00
7df89301dc
Bump XLA version and import llvm, stablehlo, triton, and zig‑protobuf modules in workspace BUILD files.
2024-08-06 10:28:43 +00:00
3f36506f1c
zml: remove usingnamespace from floats.zig and related dependencies; note that incremental compilation does not improve overall build time due to linking overhead
2024-07-23 17:43:43 +00:00
42dee5d0e0
mlir: rework stablehlo custom call implementation and add a Triton example
2024-07-16 13:23:07 +00:00
aec1d96e6d
mlir: rework DenseElementsAttribute to correctly slice inputs and modify .as() to return a concrete value instead of an optional
2024-07-15 12:32:24 +00:00
30f6be0e2f
Update core Zig modules (async, mlir, pjrt, stdx) and third‑party Bazel definitions for the Zig 0.14.0 release.
2024-07-02 14:19:04 +00:00
18eb0e5a7b
Add async I/O, SentencePiece, NN, and tensor utilities for ModernBERT support and update Bazel build configuration.
2024-06-14 15:27:06 +00:00
221ece647d
zml/ops.zig: add zml.ops.case operation
...
This can be used to select which branch will be run at runtime.
It wraps the `stablehlo.case` operation.
2024-05-30 14:11:08 +00:00
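Note on the `zml.ops.case` commit above: the StableHLO specification defines `stablehlo.case` as running exactly one branch chosen by a runtime index, with an out-of-range index falling back to the last branch. The Python sketch below only emulates that selection rule on the host; the `case` function and its signature are hypothetical and may not match the actual `zml.ops.case` API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def case(index: int, branches: Sequence[Callable[[], T]]) -> T:
    """Run exactly one branch, selected by `index`.

    Emulates the selection rule of `stablehlo.case`: an index outside
    [0, len(branches)) selects the last branch. Hypothetical helper.
    """
    if not branches:
        raise ValueError("case requires at least one branch")
    if 0 <= index < len(branches):
        return branches[index]()
    # Out-of-range index: fall back to the last branch.
    return branches[-1]()
```

For example, `case(1, [lambda: "a", lambda: "b"])` runs only the second branch, and `case(7, ...)` with the same list also runs the second branch because it is the last one.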
3aac788544
Update Bazel build configurations (zig.bzl, BUILD files) for MLIR, PJRT, Neuron, ROCm, tokenizer, and tools, fixing broken dependencies.
2024-05-20 11:28:25 +00:00
05944b5cc9
Update FnCache to copy and reuse non‑tensor fields in fixed‑size structs, preventing undefined memory in core modules.
2024-05-15 17:54:52 +00:00
a34190679b
Fix llama token handling and remove redundant prompt token reuse in core Zig modules (aio, module, nn, pjrtx, tensor)
2024-05-02 17:10:11 +00:00
13eff4e661
pjrt,zml: add memory bindings
...
This preliminary PR binds PJRT memory endpoints and adds them to
`zml.Buffer`.
A follow-up PR will properly integrate it inside `zml.Buffer`.
2024-04-11 15:43:24 +00:00
d4db5ccc6b
Integrate TinyLlama support, restore the homemade tokenizer, and align Zig API naming across stdx and zml tokenizer modules.
2024-04-05 15:07:29 +00:00
8a25b1eb74
Revert CUDA PJRT plugin version to 0.4.38 to address performance regression on XLA master.
2024-03-05 17:04:42 +00:00
959bc48c42
Add HuggingFace tokenizer bindings and SentencePiece integration; update BUILD files, async utilities, and FFI modules to support the new tokenizers.
2024-02-28 15:47:37 +00:00
c109b12e1b
Various minor fixes: rewrite tinyllama tokenizer newline token, prevent HostBuffer.isContiguous false trigger on 1‑dim axes, improve HostBuffer.slice1d error messages, simplify module.zig output to show .mlir file path, correct setFlags handling of comptime int/float, make tokenizer.zig return <oob> for out‑of‑range detokenization, and speed up Buffer.constant creation up to 2.5 GB/s on CUDA.
2024-02-19 12:34:18 +00:00
169a24307c
Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches.
2024-02-12 12:43:23 +00:00
7e6103d876
Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix.
2024-02-06 09:31:48 +00:00
b8a0aaee5a
Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.
2024-02-05 15:22:44 +00:00
a7b7ae0180
Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules.
2024-01-16 14:13:45 +00:00
434cee3a6c
Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.
2024-01-15 09:41:42 +00:00
68dbc290e9
zml: revamp scatterSlices
...
The main issue with the current `scatter` implementation is that it uses
the broadcasting dims of `stablehlo.scatter`.
While nice in theory, the optimizer doesn't handle them well, and they
are often unrolled into a while loop.
Here I convert the batching dim to extra iota indices.
2024-01-08 17:55:20 +00:00
83b5e1ec48
fix: compute the model hash from the MLIR text representation instead of bytecode
...
Previously we were using `module.op().writeBytecode(writer)` to compute the
hash of a model,
but it crashes on some inputs, notably for unused variables.
So I now use the text representation of the MLIR instead.
2024-01-05 16:44:41 +00:00
acc492454f
Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.
2024-01-01 15:31:41 +00:00
5bd7f8aae9
zml: HostBuffer.prettyPrint()
...
Add pretty printing of HostBuffer.
This will be leveraged by the debug helper `x.print()`.
It can also be used like this:
`std.log.info("my buffer: {}", .{host_buffer.pretty()})`
2023-12-25 13:01:17 +00:00
7ef87236ce
Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.
2023-12-18 13:56:45 +00:00
145e60b4dd
workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.
2023-12-13 10:10:32 +00:00
6a4a7fb9a1
zml/module.zig: Remove unnecessary optional unwrapping.
2023-12-05 12:27:08 +00:00
37725cdaa6
Update PJRT, runtime, and ZML modules to use per‑target output folders and expose profiler.dumpDataAsJson for JSON profiling output.
2023-12-04 10:38:10 +00:00
6e4fef8844
zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API.
2023-11-16 15:11:23 +00:00
57bf667c90
Add struct‑based client creation flags to the Zig PJRT API and update context.autoPlatform to accept a flag struct.
2023-11-13 12:45:17 +00:00
9f4194ad97
Fix test layer. Add tests to detect silent breakage of testLayer and regression in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.
2023-11-06 11:25:57 +00:00
98b512c495
Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations.
2023-10-19 17:01:55 +00:00
7d36913b31
Refactor ZML API: move compile, compileFn and related types to exe.zig, update BaseExe allocation and inline caching in compileInternal, and clean up supporting modules (func.zig, meta.zig, signature.zig, cuda.zig, testing.zig, zml.zig).
2023-10-13 16:08:08 +00:00
3bc6ad98be
Update module.zig to donate all buffers except the token_index buffer for the Llama+Neuron example.
2023-10-06 10:10:56 +00:00
5122ca0203
Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.
2023-09-27 11:45:33 +00:00
b5c4fb7c58
zml: fix float8 <-> float32 conversions, support for Tensor.constant(.{}, .{ .f8 = 1.0})
...
Mostly:
* fix float8 <-> float32 conversions
* support for `Tensor.constant(.{}, .{ .f8 = 1.0})`
Misc:
* fix small inconsistencies between different versions of sdpa
* better error message for broadcast
* bazelrc: --config=debug
2023-09-21 11:15:50 +00:00
0d5389ceda
Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.
2023-09-14 13:28:25 +00:00
c8c99d7d5a
zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.
2023-09-07 17:06:19 +00:00