Radix

Author	SHA1	Message	Date
Tarry Singh	434cee3a6c	Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.	2024-01-15 09:41:42 +00:00
Tarry Singh	68dbc290e9	zml: revamp scatterSlices. The main issue with the current `scatter` implementation is that it uses the broadcasting dims of `stablehlo.scatter`. While nice in theory, the optimizer doesn't handle them well and they are often unrolled into a while loop. Here I convert the batching dim into extra iota indices.	2024-01-08 17:55:20 +00:00
Tarry Singh	83b5e1ec48	fix: Previously we were using `module.op().writeBytecode(writer)` to compute the hash of a model, but it crashes on some inputs, notably for unused variables. So I switched to the text representation of the MLIR.	2024-01-05 16:44:41 +00:00
Tarry Singh	acc492454f	Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.	2024-01-01 15:31:41 +00:00
Tarry Singh	5bd7f8aae9	zml: HostBuffer.prettyPrint(). Add pretty printing of HostBuffer. This will be leveraged by the debug helper `x.print()`. It can also be used like this: `std.log.info("my buffer: {}", .{host_buffer.pretty()})`	2023-12-25 13:01:17 +00:00
Tarry Singh	7ef87236ce	Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.	2023-12-18 13:56:45 +00:00
Tarry Singh	145e60b4dd	workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.	2023-12-13 10:10:32 +00:00
Tarry Singh	6a4a7fb9a1	zml/module.zig: Remove unnecessary optional unwrapping.	2023-12-05 12:27:08 +00:00
Tarry Singh	37725cdaa6	Update PJRT, runtime, and ZML modules to use per‑target output folders and expose `profiler.dumpDataAsJson` for JSON profiling output.	2023-12-04 10:38:10 +00:00
Tarry Singh	6e4fef8844	zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API.	2023-11-16 15:11:23 +00:00
Tarry Singh	57bf667c90	Add struct‑based client creation flags to the Zig PJRT API and update `context.autoPlatform` to accept a flag struct.	2023-11-13 12:45:17 +00:00
Tarry Singh	9f4194ad97	Fix test layer. Add tests to detect silent breakage of testLayer and regression in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.	2023-11-06 11:25:57 +00:00
Tarry Singh	98b512c495	Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations.	2023-10-19 17:01:55 +00:00
Tarry Singh	7d36913b31	Refactor ZML API: move compile, compileFn and related types to `exe.zig`, update `BaseExe` allocation and inline caching in `compileInternal`, and clean up supporting modules (`func.zig`, `meta.zig`, `signature.zig`, `cuda.zig`, `testing.zig`, `zml.zig`).	2023-10-13 16:08:08 +00:00
Tarry Singh	3bc6ad98be	Update module.zig to donate all buffers except the `token_index` buffer for the Llama+Neuron example.	2023-10-06 10:10:56 +00:00
Tarry Singh	5122ca0203	Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.	2023-09-27 11:45:33 +00:00
Tarry Singh	b5c4fb7c58	zml: fix float8 <-> float32 conversions, support for `Tensor.constant(.{}, .{ .f8 = 1.0})`. Mostly: * fix float8 <-> float32 conversions * support for `Tensor.constant(.{}, .{ .f8 = 1.0})`. Misc: * fix small inconsistencies between different versions of sdpa * better error message for broadcast * bazelrc: --config=debug	2023-09-21 11:15:50 +00:00
Tarry Singh	0d5389ceda	Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.	2023-09-14 13:28:25 +00:00
Tarry Singh	c8c99d7d5a	zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.	2023-09-07 17:06:19 +00:00
Tarry Singh	9505992e00	workspace: log diagnostic message before returning NotFound to aid debugging.	2023-09-04 13:34:37 +00:00
Tarry Singh	aa7fae449e	zml/pjrtx: execute `bufferFromHostBuffer` on the thread pool to avoid blocking and improve weight loading performance.	2023-08-29 10:28:51 +00:00
Tarry Singh	c081cb9ad6	zml/platform: increase maximum device limit to support up to 32 devices per platform.	2023-08-24 12:23:07 +00:00
Tarry Singh	7d24329d0a	Add Bazel build rules and runtime implementation for AWS Neuron/Trainium/Inferentia support.	2023-08-18 17:11:27 +00:00
Tarry Singh	0709b1b32f	zml: reduce memory usage of sdpaMemEfficient by using zml.ops.while instead of zml.ops.for, avoiding concatenation of intermediate results.	2023-08-14 14:24:11 +00:00
Tarry Singh	01eff33fa0	Update workspace dependencies to newer LLVM, XLA, StableHLO, and PJRT versions and expose new pjrt plugin attribute APIs and stablehlo version APIs in build and runtime configurations.	2023-08-07 12:28:36 +00:00
Tarry Singh	bcde3962ce	Rework async runtime with coroutine support, rename async API (async_→asyncc, await_→awaitt), improve type inference, bump libxev (default epoll) and update related stdx and zml modules.	2023-08-01 11:35:04 +00:00
Tarry Singh	b53462b515	Fix crash in for_ by ensuring values are pushed to their block before opening a new block, adding asserts for block state, and guaranteeing first_step is used. Adjust padding syntax to improve usability.	2023-07-25 14:25:47 +00:00
Tarry Singh	f675a203c2	zml.ops.makeBlock now returns the inner tensor to propagate tags. The function returns both the created mlir.Block and tensors from the supplied function, allowing shape and tag propagation without exposing mlir.Values. Updated tests to run on non‑CPU platforms.	2023-07-21 09:01:01 +00:00
Tarry Singh	be8aa4fa8e	Fix several compileError calls introduced by recent changes; ensure Zig compiler catches errors at comptime.	2023-07-17 09:10:27 +00:00
Tarry Singh	0f9a92f27d	module-cache: raise max_pjrt_executable_size limit to 400 MB to accommodate large PJRT executables.	2023-07-14 17:58:22 +00:00
Tarry Singh	63aca9f9c2	Hotfixes for build rule, math utilities, module system, and NN implementation.	2023-06-29 10:26:54 +00:00
Tarry Singh	9b7eea8ac2	Add stdx utilities and rework async signature inference; tidy executable logging.	2023-06-21 14:45:14 +00:00
Tarry Singh	c30aa018dc	zml: small cleanup - Add more scatterSlices test cases. - Replace helpers.mapTensors with zml.meta.map. - Fix shape handling when a for loop is fully unrolled. - Allow zml.Tensor.pad to accept i64 for dimension compatibility. - Enable arrays of tensors inside model structs. - Split Buffer.asViewOf into asViewOfHostBuffer and asViewOfDeviceBuffer.	2023-06-19 15:29:29 +00:00
Tarry Singh	f00538667e	zml.nn: add dynamic sampling with support for top‑k, top‑p, and min‑p settings. Implements token index computation based on the selected sampling strategy, including options for top_k, max_top_k, top_p, and min_p.	2023-06-16 14:34:18 +00:00
Tarry Singh	b244a18621	zml: set iota default dtype to .i32, with fallback to .i64 for axes with many elements, simplifying usage.	2023-06-15 12:45:52 +00:00
Tarry Singh	344e07fb6e	stablehlo: extend dot_general API to include DotAlgorithm support by merging precision and algorithm attributes into a union, aligning with spec requirements. Currently not exposed to users due to limited algorithm support.	2023-06-07 11:20:25 +00:00
Tarry Singh	6d720126ac	Add PJRT custom call integration with generic zmlHostBufferCallback to copy tensors to host and invoke user callbacks. Introduce Tensor.print() method to output runtime tensor values (CUDA‑specific, uses a pre‑allocated host buffer).	2023-06-05 13:42:45 +00:00
Tarry Singh	499b0d20e5	pjrtx: change behavior to return an error when OpenXLA fails to serialize the new batching_dim attribute for gather/scatter, instead of panicking.	2023-05-29 17:18:19 +00:00
Tarry Singh	52ef20f981	zml: reintroduce pjrtx to handle reactor blocking issues in async scenarios, particularly with Events.	2023-05-26 15:54:15 +00:00
Tarry Singh	c68ec4bc5c	async: implement default threaded backend using a thread pool. Backend selectable via @zml//async:impl flag (threaded or zigcoro). Provides workaround for environments where io_uring is unavailable.	2023-05-25 16:02:11 +00:00
Tarry Singh	89cf2233d3	zml/aio: enable reading metadata from index.json for sharded safetensor files, allowing metadata storage alongside model config.	2023-05-23 15:06:59 +00:00
Tarry Singh	2f54e2a5f3	zml.tensor: add triangular operator to zero out the upper‑right matrix region with configurable offset, and toDiagonal (diag_embed) to embed a vector as a diagonal matrix, correcting previous diag naming. Also add ELU activation under zml.nn.Activation.	2023-05-18 16:39:21 +00:00
Tarry Singh	05faa5021e	zml.tensor: add cumulativeSum operator and refactor maxPoolND. Introduce cumulative sum using reduceWindow. Simplify reduceWindow signature by merging padding_shape and padding_value. Update maxPool1D/2D to accept tuple arguments. Revise pad to use tagged or AOS syntax; remove SOA syntax.	2023-05-17 09:01:27 +00:00
Tarry Singh	54e7eb30b4	Introduce a thin abstraction layer between ZML and PJRT to manage plugin loading decisions, enable compile‑time detection of linked runtimes, and handle cases such as libtpu blocking metadata access.	2023-05-15 09:36:41 +00:00
Tarry Singh	74e90855ca	Configure the runfiles environment globally at context start to ensure Bazel-built binaries locate their runfiles correctly.	2023-05-12 11:40:23 +00:00
Tarry Singh	57130577e9	Add fallback for runtimes lacking PJRT_Event by using thread‑pool dispatch for buffer copies and treating operations as synchronous when events are absent.	2023-05-09 12:44:56 +00:00
Tarry Singh	5543c8192f	Rename async_ to asyncc and add Generic async slugs in async.zig, aio.zig, and module.zig.	2023-05-04 14:44:12 +00:00
Tarry Singh	fefd84b1bb	Replace silu implementation with stablehlo.logistic for higher precision, move logistic logic into sigmoid and alias logistic to sigmoid (breaking change).	2023-05-01 10:40:50 +00:00
Tarry Singh	021111d07d	Extend tests to handle all float types, preventing crashes with bfloat16 tensors.	2023-04-27 10:34:27 +00:00
Tarry Singh	ed6444b775	Add Tensor.concatenate support, begin deprecating broadcastLeft, and compute transformer head scaling constant in f32 for higher precision.	2023-04-21 15:55:07 +00:00
Tarry Singh	11006ca08d	Refactor torch module: merge PickleData into Parser as torch.File, rename value file to py_object.zig, use buffered reader for pickle and zip headers, adjust intermediate result handling, simplify Python dict representation, separate kwargs from args, and add extensive tests for long integers, protocol 0, zipped pickle, and a complex PyTorch Conv2d case; also streamline BufferStore initialization.	2023-04-20 15:43:18 +00:00
Tarry Singh	8e43a45a3c	Add event waiting when invoking a module and improve multi‑device sharding handling.	2023-04-11 11:32:09 +00:00
Tarry Singh	0189b71070	Rename `zml.aio.Value` to `zml.aio.Metadata`, simplify its type variants, and update torch pickle/eval APIs accordingly.	2023-04-07 16:45:58 +00:00
Tarry Singh	e25f70d923	Rename and simplify modules in `zml/aio/torch`: replace redundant qualified names, remove generic utilities, inline code, reorder functions for top‑to‑bottom readability, and extract parsing logic into `parseTensor` and `parseStorage` functions.	2023-04-04 17:20:53 +00:00
Tarry Singh	66881899ca	Fix `testLayer` by removing unnecessary `compile_options` argument and updating testing logic for new sharded output, ensuring proper usage by `llama.zig`.	2023-03-31 14:23:45 +00:00
Tarry Singh	05d23beb23	Add `Normalizer.fromHfJson` to read HuggingFace tokenizer JSON and map to internal options, including a configurable magic space token and a debug flag for token merges. Adjust default handling of extra whitespaces to align with HF defaults.	2023-03-29 16:10:29 +00:00
Tarry Singh	ef922e3aea	Fix empty JSON array handling in safetensor metadata loader and refactor torch loader (make ops slices const and improve readability).	2023-03-28 16:17:00 +00:00
Tarry Singh	a4f0fc96c0	Integrate user sharding hints and HLO sharding annotations across MLIR dialects and ZML core, and remove the now‑unused module options arguments.	2023-03-21 10:50:39 +00:00
Tarry Singh	8746a5ce78	Expose `zml/test_runner.zig` publicly to enable users to employ the async test runner. Made the dependency on `zml` explicit and suggest treating `test_runner` as a `zig_library` rather than a filegroup.	2023-03-16 13:22:35 +00:00
Tarry Singh	7ef67eea27	zml: Relocate tests next to the functions they verify and remove obsolete dynamicSlice1d test.	2023-03-08 14:10:11 +00:00
Tarry Singh	dfa71018a5	zml: Remove pjrtx wrapper, migrate remaining helpers to their native modules, and fix blocking issue in Event.await.	2023-03-06 17:05:56 +00:00
Tarry Singh	ecf52ad724	zml.tokenizer: Implement proper byte fallback support by converting hex byte strings (e.g., “<0x40>”) to their characters and splitting unknown UTF‑8 codepoints into bytes, fixing tokenization.	2023-02-28 14:40:25 +00:00
Tarry Singh	2f129f76c9	Add in-process sharding support across core ZML components (platform, shape, tensor, MLIR generation, buffers, and PJRT integration)	2023-02-24 17:33:14 +00:00
Tarry Singh	639f5cd994	Replace `log` with `select` for generating the attention mask to avoid NaNs on zero values.	2023-02-16 10:36:23 +00:00
Tarry Singh	24a7c98476	Implement scatterSlices functionality.	2023-02-14 13:52:49 +00:00
Tarry Singh	934acb35a8	zml: initialize Tensor.min and Tensor.max reductions with proper extreme values to ensure correct results	2023-02-10 12:28:41 +00:00
Tarry Singh	be6328813d	zml: clean up dead and commented code; note that copyslice is currently broken and pending reimplementation	2023-02-08 17:13:47 +00:00
Tarry Singh	058e1415fa	zml: deprecate buggy Tensor.chunk; introduce chunkExact and chunkAllowTrailing with clarified behavior	2023-02-07 12:42:34 +00:00
Tarry Singh	0606ea1d7c	Update Bazel workspace and runtime BUILD files to newer XLA, StableHLO, and LLVM versions, enabling batching‑dims support for the gather operator.	2023-02-01 15:58:30 +00:00
Tarry Singh	897786e440	aio: correct refAllDecls handling for yaml and nemo modules	2023-01-31 11:58:58 +00:00
Tarry Singh	7dcd8b516c	zml/nn: fix resize implementations (resizeBilinear and resizeBicubic) and expand refAllDecls usage; all tests pass	2023-01-27 14:35:11 +00:00
Tarry Singh	5e1688cbfd	aio: refactor PyTorch model parsing for better readability and optimize slice handling	2023-01-25 12:16:27 +00:00
Tarry Singh	ebdb8db213	zml/tests: re‑enable all Zig tests, fix precision issue by switching to f32, and add refAllDecls to ensure all declarations are tested	2023-01-23 16:28:19 +00:00
Tarry Singh	f39b16e13d	zml/test_runner: add optional filtering of test functions via command‑line argument, allowing selective execution of tests (e.g., `bazel run //zml:test -- sdpa`)	2023-01-20 13:50:36 +00:00
Tarry Singh	b961856e5f	zml/tensor: correct typo in uniform comment ('substract' → 'subtract')	2023-01-19 12:20:40 +00:00
Tarry Singh	ccdf218961	Add multi‑axis, batched `gatherValues` support to tensor, shape, nn, quantization, and torch modules.	2023-01-18 12:03:48 +00:00
Tarry Singh	266da6d4be	Add initial Bazel build configuration, async runtime implementation, and core MLIR dialect definitions for ZML.	2023-01-02 14:28:25 +00:00

127 Commits