Radix

Author	SHA1	Message	Date
Tarry Singh	3849eb10b7	Add buffer and hostbuffer utilities with precise f32→bf16 conversion, type inference for loadBuffers, store expected input shapes, enhance meta.visit and JSON TaggedUnion support, and improve logging.	2024-10-28 11:21:46 +00:00
Tarry Singh	4ef81b89ea	stdx.fmt: add slice formatting support, improving on previous prettyPrinter implementation by leveraging internal fmt mechanisms.	2024-10-18 15:05:08 +00:00
Tarry Singh	aacbf2ee04	Fix Llama3 rope scaling implementation in the neural network module (zml/nn.zig)	2024-10-07 12:53:03 +00:00
Tarry Singh	2863c1f5e0	zml/tensor: fix returned value in Tensor.toMemory – ensure `_output_memory_kind` is set correctly in the result.	2024-09-18 13:18:08 +00:00
Tarry Singh	aec7072837	pjrt: add FFI bindings for custom calls	2024-09-10 09:14:28 +00:00
Tarry Singh	1f5ff96c10	zml/ops: add wiring for operand output alias in zml.ops.triton	2024-09-09 15:00:28 +00:00
Tarry Singh	4b7e618b43	zml/aio: add bool handling in struct population within populateStruct	2024-09-02 14:11:47 +00:00
Tarry Singh	ac63c30e12	add mini-DSL for creating MLIR common attributes and types, leveraging Zig 0.14 to simplify mlir.Type and mlir.Attribute creation	2024-08-26 14:19:00 +00:00
Tarry Singh	63ef78efcc	zml: add support for NVTX tracing	2024-08-21 14:41:40 +00:00
Tarry Singh	7df89301dc	Bump XLA version and import llvm, stablehlo, triton, and zig‑protobuf modules in workspace BUILD files.	2024-08-06 10:28:43 +00:00
Tarry Singh	3f36506f1c	zml: remove `usingnamespace` from floats.zig and related dependencies; note that incremental compilation does not improve overall build time due to linking overhead	2024-07-23 17:43:43 +00:00
Tarry Singh	42dee5d0e0	mlir: rework stablehlo custom call implementation and add a Triton example	2024-07-16 13:23:07 +00:00
Tarry Singh	aec1d96e6d	mlir: rework DenseElementsAttribute to correctly slice inputs and modify `.as()` to return a concrete value instead of an optional	2024-07-15 12:32:24 +00:00
Tarry Singh	30f6be0e2f	Update core Zig modules (async, mlir, pjrt, stdx) and third‑party Bazel definitions for the Zig 0.14.0 release.	2024-07-02 14:19:04 +00:00
Tarry Singh	18eb0e5a7b	Add async I/O, SentencePiece, NN, and tensor utilities for ModernBERT support and update Bazel build configuration.	2024-06-14 15:27:06 +00:00
Tarry Singh	221ece647d	zml/ops.zig: Added `zml.ops.case` operation. This can be used to select which branch will run at runtime. It wraps the `stablehlo.case` operation.	2024-05-30 14:11:08 +00:00
Tarry Singh	3aac788544	Update Bazel build configurations (zig.bzl, BUILD files) for MLIR, PJRT, Neuron, ROCm, tokenizer, and tools, fixing broken dependencies.	2024-05-20 11:28:25 +00:00
Tarry Singh	05944b5cc9	Update FnCache to copy and reuse non‑tensor fields in fixed‑size structs, preventing undefined memory in core modules.	2024-05-15 17:54:52 +00:00
Tarry Singh	a34190679b	Fix llama token handling and remove redundant prompt token reuse in core Zig modules (aio, module, nn, pjrtx, tensor)	2024-05-02 17:10:11 +00:00
Tarry Singh	13eff4e661	pjrt,zml: add memory bindings. This preliminary PR binds PJRT memory endpoints and adds them to `zml.Buffer`. A follow-up PR will properly integrate them inside `zml.Buffer`.	2024-04-11 15:43:24 +00:00
Tarry Singh	d4db5ccc6b	Integrate TinyLlama support, restore the homemade tokenizer, and align Zig API naming across stdx and zml tokenizer modules.	2024-04-05 15:07:29 +00:00
Tarry Singh	8a25b1eb74	Revert CUDA PJRT plugin version to 0.4.38 to address performance regression on XLA master.	2024-03-05 17:04:42 +00:00
Tarry Singh	959bc48c42	Add HuggingFace tokenizer bindings and SentencePiece integration; update BUILD files, async utilities, and FFI modules to support the new tokenizers.	2024-02-28 15:47:37 +00:00
Tarry Singh	c109b12e1b	Various minor fixes: rewrite tinyllama tokenizer newline token, prevent HostBuffer.isContiguous false trigger on 1‑dim axes, improve HostBuffer.slice1d error messages, simplify module.zig output to show .mlir file path, correct setFlags handling of comptime int/float, make tokenizer.zig return <oob> for out‑of‑range detokenization, and speed up Buffer.constant creation up to 2.5 GB/s on CUDA.	2024-02-19 12:34:18 +00:00
Tarry Singh	169a24307c	Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches.	2024-02-12 12:43:23 +00:00
Tarry Singh	7e6103d876	Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix.	2024-02-06 09:31:48 +00:00
Tarry Singh	b8a0aaee5a	Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.	2024-02-05 15:22:44 +00:00
Tarry Singh	a7b7ae0180	Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules.	2024-01-16 14:13:45 +00:00
Tarry Singh	434cee3a6c	Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.	2024-01-15 09:41:42 +00:00
Tarry Singh	68dbc290e9	zml: revamp scatterSlices. The main issue with the current `scatter` implementation is that it uses the broadcasting dims of `stablehlo.scatter`. While nice in theory, the optimizer doesn't handle them well and they are often unrolled into a while loop. Here I convert the batching dim to extra iota indices.	2024-01-08 17:55:20 +00:00
Tarry Singh	83b5e1ec48	fix: Previously we were using `module.op().writeBytecode(writer)` to compute the hash of a model, but it crashes on some inputs, notably for unused variables. So I used the text representation of the MLIR instead.	2024-01-05 16:44:41 +00:00
Tarry Singh	acc492454f	Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.	2024-01-01 15:31:41 +00:00
Tarry Singh	5bd7f8aae9	zml: HostBuffer.prettyPrint(). Add pretty printing of HostBuffer. This will be leveraged by the debug helper `x.print()`. It can also be used like this: `std.log.info("my buffer: {}", .{host_buffer.pretty()})`	2023-12-25 13:01:17 +00:00
Tarry Singh	7ef87236ce	Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.	2023-12-18 13:56:45 +00:00
Tarry Singh	145e60b4dd	workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.	2023-12-13 10:10:32 +00:00
Tarry Singh	6a4a7fb9a1	zml/module.zig: Remove unnecessary optional unwrapping.	2023-12-05 12:27:08 +00:00
Tarry Singh	37725cdaa6	Update PJRT, runtime, and ZML modules to use per‑target output folders and expose `profiler.dumpDataAsJson` for JSON profiling output.	2023-12-04 10:38:10 +00:00
Tarry Singh	6e4fef8844	zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API.	2023-11-16 15:11:23 +00:00
Tarry Singh	57bf667c90	Add struct‑based client creation flags to the Zig PJRT API and update `context.autoPlatform` to accept a flag struct.	2023-11-13 12:45:17 +00:00
Tarry Singh	9f4194ad97	Fix test layer. Add tests to detect silent breakage of testLayer and regression in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.	2023-11-06 11:25:57 +00:00
Tarry Singh	98b512c495	Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations.	2023-10-19 17:01:55 +00:00
Tarry Singh	7d36913b31	Refactor ZML API: move compile, compileFn and related types to `exe.zig`, update `BaseExe` allocation and inline caching in `compileInternal`, and clean up supporting modules (`func.zig`, `meta.zig`, `signature.zig`, `cuda.zig`, `testing.zig`, `zml.zig`).	2023-10-13 16:08:08 +00:00
Tarry Singh	3bc6ad98be	Update module.zig to donate all buffers except the `token_index` buffer for the Llama+Neuron example.	2023-10-06 10:10:56 +00:00
Tarry Singh	5122ca0203	Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.	2023-09-27 11:45:33 +00:00
Tarry Singh	b5c4fb7c58	zml: fix float8 <-> float32 conversions, support for `Tensor.constant(.{}, .{ .f8 = 1.0})`. Mostly: fix float8 <-> float32 conversions; support for `Tensor.constant(.{}, .{ .f8 = 1.0})`. Misc: fix small inconsistencies between different versions of sdpa; better error message for broadcast; bazelrc: --config=debug.	2023-09-21 11:15:50 +00:00
Tarry Singh	0d5389ceda	Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.	2023-09-14 13:28:25 +00:00
Tarry Singh	c8c99d7d5a	zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.	2023-09-07 17:06:19 +00:00
Tarry Singh	9505992e00	workspace: log diagnostic message before returning NotFound to aid debugging.	2023-09-04 13:34:37 +00:00
Tarry Singh	aa7fae449e	zml/pjrtx: execute `bufferFromHostBuffer` on the thread pool to avoid blocking and improve weight loading performance.	2023-08-29 10:28:51 +00:00
Tarry Singh	c081cb9ad6	zml/platform: increase maximum device limit to support up to 32 devices per platform.	2023-08-24 12:23:07 +00:00

105 Commits