Radix

Author	SHA1	Message	Date
Tarry Singh	169a24307c	Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches.	2024-02-12 12:43:23 +00:00
Tarry Singh	7e6103d876	Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix.	2024-02-06 09:31:48 +00:00
Tarry Singh	b8a0aaee5a	Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.	2024-02-05 15:22:44 +00:00
Foke Singh	b643f7bc53	Add Bazel build rule and test for Llama3 tokenizer’s byte fallback and unknown token handling.	2024-02-02 10:25:48 +00:00
Tarry Singh	5120fe00dc	Update libxev epoll patch to resolve crashes and hangs in epoll and kqueue implementations.	2024-01-29 17:15:11 +00:00
Tarry Singh	edc2ac26f8	Adjust ROCm runtime sandboxing to hook only the PJRT plugin and make hipblastlt bytecodes optional.	2024-01-26 13:02:23 +00:00
Foke Singh	0ce36599da	Update example build config and Llama demo to support the new async epoll backend and zigcoro scheduler.	2024-01-22 12:17:01 +00:00
Tarry Singh	a7b7ae0180	Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules.	2024-01-16 14:13:45 +00:00
Tarry Singh	434cee3a6c	Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.	2024-01-15 09:41:42 +00:00
Tarry Singh	5b8e42f9a9	Vendor zigcoro and unify APIs; rework internals for stdx.meta compatibility, add Channel.try_send/try_recv methods, support dynamically sized channels with comptime capacity, and introduce PoolStackAllocator for coroutine stack allocation.	2024-01-11 15:40:15 +00:00
Tarry Singh	68dbc290e9	zml: revamp scatterSlices. The main issue with the current `scatter` implementation is that it uses the broadcasting dims of `stablehlo.scatter`. While nice in theory, the optimizer doesn't handle them well and they are often unrolled into a while loop. Here I convert the batching dim to extra iota indices.	2024-01-08 17:55:20 +00:00
Tarry Singh	83b5e1ec48	fix. Before, we were using `module.op().writeBytecode(writer)` to compute the hash of a model, but it crashes on some inputs, notably for unused variables. So I used the text representation of the MLIR instead.	2024-01-05 16:44:41 +00:00
Tarry Singh	acc492454f	Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.	2024-01-01 15:31:41 +00:00
Foke Singh	223857251d	Update MNIST example to use new operator source locations and reflect recent API changes (sdpa bias removal, gatherSlices shape literals, Shape.outer, Tensor.all, and argMax dtype inference)	2023-12-26 10:45:52 +00:00
Tarry Singh	5bd7f8aae9	zml: HostBuffer.prettyPrint(). Add pretty printing of HostBuffer. This will be leveraged by the debug helper `x.print()`. It can also be used like this: `std.log.info("my buffer: {}", .{host_buffer.pretty()})`	2023-12-25 13:01:17 +00:00
Tarry Singh	5ddd034d2c	pjrt: Fix profiler by allowing i64 resource IDs and reserving memory when creating array lists.	2023-12-20 17:18:02 +00:00
Tarry Singh	7ef87236ce	Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.	2023-12-18 13:56:45 +00:00
Foke Singh	8a031bd4c8	Update Llama example to use the simplified transpose implementation and increase default profiler size to 1,000,000 events.	2023-12-15 12:06:42 +00:00
Tarry Singh	145e60b4dd	workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.	2023-12-13 10:10:32 +00:00
Tarry Singh	6a4a7fb9a1	zml/module.zig: Remove unnecessary optional unwrapping.	2023-12-05 12:27:08 +00:00
Tarry Singh	37725cdaa6	Update PJRT, runtime, and ZML modules to use per‑target output folders and expose `profiler.dumpDataAsJson` for JSON profiling output.	2023-12-04 10:38:10 +00:00
Foke Singh	22a846de72	Update llama example to use per‑target output folders and call profiler.dumpDataAsJson for testing the new compilation layout.	2023-12-01 16:05:59 +00:00
Foke Singh	46fbbf43a2	Update tutorial documentation in write_first_model.md with quick fixes.	2023-11-30 12:14:33 +00:00
Foke Singh	737f7cbdee	Add example build runner scripts and config for Zig code completion.	2023-11-21 14:55:34 +00:00
Tarry Singh	ec37c8f731	Update Bazel build files and helper scripts to integrate the custom build runner for ZLS code completion.	2023-11-20 15:29:01 +00:00
Tarry Singh	6e4fef8844	zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API.	2023-11-16 15:11:23 +00:00
Tarry Singh	57bf667c90	Add struct‑based client creation flags to the Zig PJRT API and update `context.autoPlatform` to accept a flag struct.	2023-11-13 12:45:17 +00:00
Foke Singh	cb6fcbbb1a	Update docs and Zig examples to demonstrate the new client creation flags API.	2023-11-09 12:31:11 +00:00
Tarry Singh	9f4194ad97	Fix test layer. Add tests to detect silent breakage of testLayer and regression in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.	2023-11-06 11:25:57 +00:00
Foke Singh	237a877a29	zml: Add support for Llama 3.2 text-only models. Implement transpose over embed_tokens as a replacement for missing lm_head and make lm_head optional for compatibility. Add repositories and executions to Bazel and update README.	2023-11-01 10:16:48 +00:00
Foke Singh	1c9749c25e	docs: move image in concepts.md	2023-10-31 10:21:14 +00:00
Foke Singh	eb20548241	Update instructions following API changes: `prepare` doesn't allocate anymore, and `ExeWithWeights` is now `ModuleExe`.	2023-10-26 13:56:56 +00:00
Tarry Singh	27c8309424	async: add intrusive queue. All code contributed by @steeve: add intrusive queue; change the constructor of Channel with default AsyncThread executor. Co-authored-by: Steeve Morin <steeve@zml.ai>	2023-10-24 14:36:22 +00:00
Tarry Singh	98b512c495	Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations.	2023-10-19 17:01:55 +00:00
Foke Singh	37de7b9613	Add Llama example showcasing the new `func.call` emission and function caching behavior.	2023-10-17 11:00:37 +00:00
Tarry Singh	7d36913b31	Refactor ZML API: move compile, compileFn and related types to `exe.zig`, update `BaseExe` allocation and inline caching in `compileInternal`, and clean up supporting modules (`func.zig`, `meta.zig`, `signature.zig`, `cuda.zig`, `testing.zig`, `zml.zig`).	2023-10-13 16:08:08 +00:00
Foke Singh	35395c13f8	Update example programs (benchmark, llama, mnist, simple_layer) to use the new Exe API and reflect BaseExe allocation changes.	2023-10-10 11:12:34 +00:00
Tarry Singh	3bc6ad98be	Update module.zig to donate all buffers except the `token_index` buffer for the Llama+Neuron example.	2023-10-06 10:10:56 +00:00
Foke Singh	474f76cd75	Enable buffer donation in the Llama example, donating all buffers except the token_index buffer.	2023-10-03 16:32:40 +00:00
Tarry Singh	5122ca0203	Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code.	2023-09-27 11:45:33 +00:00
Foke Singh	06865f5876	Update Llama example to use the new direct rope IR implementation.	2023-09-25 10:22:05 +00:00
Tarry Singh	b5c4fb7c58	zml: fix float8 <-> float32 conversions, support for `Tensor.constant(.{}, .{ .f8 = 1.0})`. Misc: fix small inconsistencies between different versions of sdpa; better error message for broadcast; bazelrc: --config=debug.	2023-09-21 11:15:50 +00:00
Tarry Singh	455bb3877f	runtimes/cuda: obtain NCCL from the pip package, matching XLA behavior.	2023-09-20 17:41:44 +00:00
Tarry Singh	0d5389ceda	Update CUDA runtime sandboxing and dynamic symbol renaming, switch to pre‑built jax‑cuda‑pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1.	2023-09-14 13:28:25 +00:00
Foke Singh	4abdd32f0d	Update Llama example BUILD to use jax-cuda-pjrt plugin and bump CUDA (12.6.2) / cuDNN (9.5.1) versions.	2023-09-12 15:40:21 +00:00
Tarry Singh	c8c99d7d5a	zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version.	2023-09-07 17:06:19 +00:00
Tarry Singh	9505992e00	workspace: log diagnostic message before returning NotFound to aid debugging.	2023-09-04 13:34:37 +00:00
Foke Singh	937cdec324	examples/loader: add missing stdx dependency.	2023-08-30 13:03:59 +00:00
Tarry Singh	aa7fae449e	zml/pjrtx: execute `bufferFromHostBuffer` on the thread pool to avoid blocking and improve weight loading performance.	2023-08-29 10:28:51 +00:00
Tarry Singh	c081cb9ad6	zml/platform: increase maximum device limit to support up to 32 devices per platform.	2023-08-24 12:23:07 +00:00


188 Commits