47 Commits

Author SHA1 Message Date
e7323be10b runtimes/rocm: switch to in-process LLD, removing the need for sandboxed lld. 2025-04-23 11:43:18 +00:00
0a2ab7c8cb Remove usingnamespace from MLIR. 2025-01-28 09:35:58 +00:00
f5ab2c3a55 zml: eliminate compile-time fields from Bufferized, removing the need to pass undefined to exe.call for inlined arguments. Introduce BufferizedWithArgs in zml.testing for compileAndCall utility. 2024-11-28 12:24:39 +00:00
3849eb10b7 Add buffer and hostbuffer utilities with precise f32→bf16 conversion, type inference for loadBuffers, store expected input shapes, enhance meta.visit and JSON TaggedUnion support, and improve logging. 2024-10-28 11:21:46 +00:00
aec7072837 pjrt: add FFI bindings for custom calls 2024-09-10 09:14:28 +00:00
ac63c30e12 add mini-DSL for creating MLIR common attributes and types, leveraging Zig 0.14 to simplify mlir.Type and mlir.Attribute creation 2024-08-26 14:19:00 +00:00
aec1d96e6d mlir: rework DenseElementsAttribute to correctly slice inputs and modify .as() to return a concrete value instead of an optional 2024-07-15 12:32:24 +00:00
30f6be0e2f Update core Zig modules (async, mlir, pjrt, stdx) and third‑party Bazel definitions for the Zig 0.14.0 release. 2024-07-02 14:19:04 +00:00
05944b5cc9 Update FnCache to copy and reuse non‑tensor fields in fixed‑size structs, preventing undefined memory in core modules. 2024-05-15 17:54:52 +00:00
a34190679b Fix llama token handling and remove redundant prompt token reuse in core Zig modules (aio, module, nn, pjrtx, tensor) 2024-05-02 17:10:11 +00:00
8a25b1eb74 Revert CUDA PJRT plugin version to 0.4.38 to address performance regression on XLA master. 2024-03-05 17:04:42 +00:00
c109b12e1b Various minor fixes: rewrite tinyllama tokenizer newline token, prevent HostBuffer.isContiguous false trigger on 1‑dim axes, improve HostBuffer.slice1d error messages, simplify module.zig output to show .mlir file path, correct setFlags handling of comptime int/float, make tokenizer.zig return <oob> for out‑of‑range detokenization, and speed up Buffer.constant creation up to 2.5 GB/s on CUDA. 2024-02-19 12:34:18 +00:00
7e6103d876 Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix. 2024-02-06 09:31:48 +00:00
434cee3a6c Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0. 2024-01-15 09:41:42 +00:00
68dbc290e9 zml: revamp scatterSlices
The main issue with the current `scatter` implementation is that it uses
the broadcasting dims of `stablehlo.scatter`.
While nice in theory, the optimizer doesn't handle them well and they
are often unrolled into a while loop.
Here I convert the batching dim to extra iota indices.
2024-01-08 17:55:20 +00:00
83b5e1ec48 fix
Previously we were using `module.op().writeBytecode(writer)` to compute the
hash of a model, but it crashes on some inputs, notably for unused variables.

So I used the text representation of the MLIR instead.
2024-01-05 16:44:41 +00:00
5bd7f8aae9 zml: HostBuffer.prettyPrint()
Add pretty printing of HostBuffer.

This will be leveraged by the debug helper `x.print()`.
It can also be used like this:
`std.log.info("my buffer: {}", .{host_buffer.pretty()})`
2023-12-25 13:01:17 +00:00
7ef87236ce Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000. 2023-12-18 13:56:45 +00:00
6a4a7fb9a1 zml/module.zig: Remove unnecessary optional unwrapping. 2023-12-05 12:27:08 +00:00
37725cdaa6 Update PJRT, runtime, and ZML modules to use per‑target output folders and expose profiler.dumpDataAsJson for JSON profiling output. 2023-12-04 10:38:10 +00:00
6e4fef8844 zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API. 2023-11-16 15:11:23 +00:00
98b512c495 Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations. 2023-10-19 17:01:55 +00:00
7d36913b31 Refactor ZML API: move compile, compileFn and related types to exe.zig, update BaseExe allocation and inline caching in compileInternal, and clean up supporting modules (func.zig, meta.zig, signature.zig, cuda.zig, testing.zig, zml.zig). 2023-10-13 16:08:08 +00:00
3bc6ad98be Update module.zig to donate all buffers except the token_index buffer for the Llama+Neuron example. 2023-10-06 10:10:56 +00:00
5122ca0203 Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code. 2023-09-27 11:45:33 +00:00
7d24329d0a Add Bazel build rules and runtime implementation for AWS Neuron/Trainium/Inferentia support. 2023-08-18 17:11:27 +00:00
0709b1b32f zml: reduce memory usage of sdpaMemEfficient by using zml.ops.while instead of zml.ops.for, avoiding concatenation of intermediate results. 2023-08-14 14:24:11 +00:00
01eff33fa0 Update workspace dependencies to newer LLVM, XLA, StableHLO, and PJRT versions and expose new pjrt plugin attribute APIs and stablehlo version APIs in build and runtime configurations. 2023-08-07 12:28:36 +00:00
b53462b515 Fix crash in for_ by ensuring values are pushed to their block before opening a new block, adding asserts for block state, and guaranteeing first_step is used. Adjust padding syntax to improve usability. 2023-07-25 14:25:47 +00:00
f675a203c2 zml.ops.makeBlock now returns the inner tensor to propagate tags. The function returns both the created mlir.Block and tensors from the supplied function, allowing shape and tag propagation without exposing mlir.Values. Updated tests to run on non‑CPU platforms. 2023-07-21 09:01:01 +00:00
0f9a92f27d module-cache: raise max_pjrt_executable_size limit to 400 MB to accommodate large PJRT executables. 2023-07-14 17:58:22 +00:00
63aca9f9c2 Hotfixes for build rule, math utilities, module system, and NN implementation (fixes,) 2023-06-29 10:26:54 +00:00
9b7eea8ac2 Add stdx utilities and rework async signature inference; tidy executable logging. 2023-06-21 14:45:14 +00:00
52ef20f981 zml: reintroduce pjrtx to handle reactor blocking issues in async scenarios, particularly with Events. 2023-05-26 15:54:15 +00:00
c68ec4bc5c async: implement default threaded backend using a thread pool. Backend selectable via @zml//async:impl flag (threaded or zigcoro). Provides workaround for environments where io_uring is unavailable. 2023-05-25 16:02:11 +00:00
57130577e9 Add fallback for runtimes lacking PJRT_Event by using thread‑pool dispatch for buffer copies and treating operations as synchronous when events are absent. 2023-05-09 12:44:56 +00:00
5543c8192f Rename async_ to asyncc and add Generic async slugs in async.zig, aio.zig, and module.zig. 2023-05-04 14:44:12 +00:00
8e43a45a3c Add event waiting when invoking a module and improve multi‑device sharding handling. 2023-04-11 11:32:09 +00:00
66881899ca Fix testLayer by removing unnecessary compile_options argument and updating testing logic for new sharded output, ensuring proper usage by llama.zig. 2023-03-31 14:23:45 +00:00
a4f0fc96c0 Integrate user sharding hints and HLO sharding annotations across MLIR dialects and ZML core, and remove the now‑unused module options arguments. 2023-03-21 10:50:39 +00:00
dfa71018a5 zml: Remove pjrtx wrapper, migrate remaining helpers to their native modules, and fix blocking issue in Event.await. 2023-03-06 17:05:56 +00:00
2f129f76c9 Add in-process sharding support across core ZML components (platform, shape, tensor, MLIR generation, buffers, and PJRT integration) 2023-02-24 17:33:14 +00:00
24a7c98476 Implement scatterSlices functionality. 2023-02-14 13:52:49 +00:00
be6328813d zml: clean up dead and commented code; note that copyslice is currently broken and pending reimplementation 2023-02-08 17:13:47 +00:00
7dcd8b516c zml/nn: fix resize implementations (resizeBilinear and resizeBicubic) and expand refAllDecl usage; all tests pass 2023-01-27 14:35:11 +00:00
ebdb8db213 zml/tests: re‑enable all Zig tests, fix precision issue by switching to f32, and add refAllDecls to ensure all declarations are tested 2023-01-23 16:28:19 +00:00
266da6d4be Add initial Bazel build configuration, async runtime implementation, and core MLIR dialect definitions for ZML. 2023-01-02 14:28:25 +00:00