13eff4e661
pjrt,zml: add memory bindings
...
This preliminary PR binds PJRT memory endpoints and adds them to
`zml.Buffer`.
A follow-up PR will properly integrate them into `zml.Buffer`.
2024-04-11 15:43:24 +00:00
190c6978d2
llama: simplify llama3 prompt template encoding by removing redundant newline re-encoding and ensuring a trailing newline.
2024-04-10 09:36:28 +00:00
d4db5ccc6b
Integrate TinyLlama support, restore the homemade tokenizer, and align Zig API naming across stdx and zml tokenizer modules.
2024-04-05 15:07:29 +00:00
b67685b941
Add example Bazel build files and tokenizer test for tinyllama, including tigerbeetle integration and flags.
2024-04-01 17:40:18 +00:00
567210d1d7
bazel: depend on prebuilt protoc binaries to eliminate ~1300 build steps. Note: integration is currently blocked due to version constraints in rules_proto and toolchains_protoc.
2024-03-29 09:54:57 +00:00
e0c8eecb79
bazel: use OID as sha256 for Git LFS files to prevent unnecessary HuggingFace redownloads.
2024-03-28 17:52:52 +00:00
a811b2e1e3
llama: fix dimensions and data types
...
Removed unnecessary batching dimension introduced by recent changes. Converted index outputs from i32 to u32 for token indices. Ensures Llama runs on CUDA and ROCm. Tested on CUDA.
2024-03-20 13:37:19 +00:00
602757e7a9
Update examples to use the corrected logFn API.
2024-03-18 13:11:14 +00:00
754656f2f0
Replace real mutex with async Mutex for logFn, add fallback logger support outside coroutines, and fix ResetCondition handling.
2024-03-14 11:43:33 +00:00
980f1b17fb
Ensure all runtime plugins have correct SONAME values, fixing issues with prebuilt PJRT plugins.
2024-03-11 10:15:22 +00:00
8a25b1eb74
Revert CUDA PJRT plugin version to 0.4.38 to address performance regression on XLA master.
2024-03-05 17:04:42 +00:00
76e314db9b
Update Llama example docs and Bazel build files, and add tests for the new HuggingFace tokenizer integration.
2024-03-04 12:11:13 +00:00
959bc48c42
Add HuggingFace tokenizer bindings and SentencePiece integration; update BUILD files, async utilities, and FFI modules to support the new tokenizers.
2024-02-28 15:47:37 +00:00
5048e7dc89
Update example lock file for rules_distroless 0.4.2 upgrade and verify MNIST image build works.
2024-02-26 15:30:13 +00:00
b4b2490690
Upgrade rules_distroless to 0.4.2 in MODULE.bazel and refresh MODULE.bazel.lock accordingly.
2024-02-21 17:48:10 +00:00
c109b12e1b
Various minor fixes: rewrite tinyllama tokenizer newline token, prevent HostBuffer.isContiguous false trigger on 1‑dim axes, improve HostBuffer.slice1d error messages, simplify module.zig output to show .mlir file path, correct setFlags handling of comptime int/float, make tokenizer.zig return <oob> for out‑of‑range detokenization, and speed up Buffer.constant creation up to 2.5 GB/s on CUDA.
2024-02-19 12:34:18 +00:00
3970df5b48
Update getting_started tutorial and example Bazel files for Bazel 8 migration.
2024-02-14 10:44:47 +00:00
169a24307c
Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches.
2024-02-12 12:43:23 +00:00
7e6103d876
Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix.
2024-02-06 09:31:48 +00:00
b8a0aaee5a
Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.
2024-02-05 15:22:44 +00:00
b643f7bc53
Add Bazel build rule and test for Llama3 tokenizer’s byte fallback and unknown token handling.
2024-02-02 10:25:48 +00:00
5120fe00dc
Update libxev epoll patch to resolve crashes and hangs in epoll and kqueue implementations.
2024-01-29 17:15:11 +00:00
edc2ac26f8
Adjust ROCm runtime sandboxing to hook only the PJRT plugin and make hipblaslt bytecodes optional.
2024-01-26 13:02:23 +00:00
0ce36599da
Update example build config and Llama demo to support the new async epoll backend and zigcoro scheduler.
2024-01-22 12:17:01 +00:00
a7b7ae0180
Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules.
2024-01-16 14:13:45 +00:00
434cee3a6c
Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0.
2024-01-15 09:41:42 +00:00
5b8e42f9a9
Vendor zigcoro and unify APIs; rework internals for stdx.meta compatibility, add Channel.try_send/try_recv methods, support dynamically sized channels with comptime capacity, and introduce PoolStackAllocator for coroutine stack allocation.
2024-01-11 15:40:15 +00:00
68dbc290e9
zml: revamp scatterSlices
...
The main issue with the current `scatter` implementation is that it uses
the broadcasting dims of `stablehlo.scatter`.
While nice in theory, the optimizer doesn't handle them well and they
are often unrolled into a while loop.
Here I convert the batching dims into extra iota indices.
2024-01-08 17:55:20 +00:00
83b5e1ec48
fix: hash models from the MLIR text representation instead of bytecode
...
Previously we were using `module.op().writeBytecode(writer)` to compute the
hash of a model,
but it crashes on some inputs, notably those with unused variables.
So I now use the text representation of the MLIR instead.
2024-01-05 16:44:41 +00:00
acc492454f
Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype.
2024-01-01 15:31:41 +00:00
223857251d
Update MNIST example to use new operator source locations and reflect recent API changes (sdpa bias removal, gatherSlices shape literals, Shape.outer, Tensor.all, and argMax dtype inference)
2023-12-26 10:45:52 +00:00
5bd7f8aae9
zml: HostBuffer.prettyPrint()
...
Add pretty printing of HostBuffer.
This will be leveraged by the debug helper `x.print()`.
It can also be used like this: `std.log.info("my buffer: {}",
.{host_buffer.pretty()})`
2023-12-25 13:01:17 +00:00
5ddd034d2c
pjrt: Fix profiler by allowing i64 resource IDs and reserving memory when creating array lists.
2023-12-20 17:18:02 +00:00
7ef87236ce
Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000.
2023-12-18 13:56:45 +00:00
8a031bd4c8
Update Llama example to use the simplified transpose implementation and increase default profiler size to 1,000,000 events.
2023-12-15 12:06:42 +00:00
145e60b4dd
workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions.
2023-12-13 10:10:32 +00:00
6a4a7fb9a1
zml/module.zig: Remove unnecessary optional unwrapping.
2023-12-05 12:27:08 +00:00
37725cdaa6
Update PJRT, runtime, and ZML modules to use per‑target output folders and expose profiler.dumpDataAsJson for JSON profiling output.
2023-12-04 10:38:10 +00:00
22a846de72
Update llama example to use per‑target output folders and call profiler.dumpDataAsJson for testing the new compilation layout.
2023-12-01 16:05:59 +00:00
46fbbf43a2
Update tutorial documentation in write_first_model.md with quick fixes.
2023-11-30 12:14:33 +00:00
737f7cbdee
Add example build runner scripts and config for Zig code completion.
2023-11-21 14:55:34 +00:00
ec37c8f731
Update Bazel build files and helper scripts to integrate the custom build runner for ZLS code completion.
2023-11-20 15:29:01 +00:00
6e4fef8844
zml: Introduce arena allocator in CompilationContext. Expose arena allocator to replace existing allocator, enabling safe allocation for ops without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Upgrade axis_ types to anytype for tag handling and add TODOs for upcoming Tensor API.
2023-11-16 15:11:23 +00:00
57bf667c90
Add struct‑based client creation flags to the Zig PJRT API and update context.autoPlatform to accept a flag struct.
2023-11-13 12:45:17 +00:00
cb6fcbbb1a
Update docs and Zig examples to demonstrate the new client creation flags API.
2023-11-09 12:31:11 +00:00
9f4194ad97
Fix test layer. Add tests to detect silent breakage of testLayer and regression in mapAlloc with zero-size struct fields. Add Python venv directory to .gitignore.
2023-11-06 11:25:57 +00:00
237a877a29
zml: Add support for Llama 3.2 text-only models. Implement transpose over embed_tokens as a replacement for missing lm_head and make lm_head optional for compatibility. Add repositories and executions to Bazel and update README.
2023-11-01 10:16:48 +00:00
1c9749c25e
docs: move image in concepts.md
2023-10-31 10:21:14 +00:00
eb20548241
update instructions
...
Following recent changes: `prepare` no longer allocates, and `ExeWithWeights` is now
`ModuleExe`.
2023-10-26 13:56:56 +00:00
27c8309424
async: add intrusive queue
...
all code contributed by @steeve
* add intrusive queue
* change the constructor of Channel with default AsyncThread executor
---------
Co-authored-by: Steeve Morin <steeve@zml.ai>
2023-10-24 14:36:22 +00:00