Commit Graph

  • b643f7bc53 Add Bazel build rule and test for Llama3 tokenizer’s byte fallback and unknown token handling. Foke Singh 2024-02-02 10:25:48 +0000
  • 5120fe00dc Update libxev epoll patch to resolve crashes and hangs in epoll and kqueue implementations. Tarry Singh 2024-01-29 17:15:11 +0000
  • edc2ac26f8 Adjust ROCm runtime sandboxing to hook only the PJRT plugin and make hipblastlt bytecodes optional. Tarry Singh 2024-01-26 13:02:23 +0000
  • 0ce36599da Update example build config and Llama demo to support the new async epoll backend and zigcoro scheduler. Foke Singh 2024-01-22 12:17:01 +0000
  • a7b7ae0180 Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules. Tarry Singh 2024-01-16 14:13:45 +0000
  • 434cee3a6c Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0. Tarry Singh 2024-01-15 09:41:42 +0000
  • 5b8e42f9a9 Vendor zigcoro and unify APIs; rework internals for stdx.meta compatibility, add Channel.try_send/try_recv methods, support dynamically sized channels with comptime capacity, and introduce PoolStackAllocator for coroutine stack allocation. Tarry Singh 2024-01-11 15:40:15 +0000
  • 68dbc290e9 zml: revamp scatterSlices Tarry Singh 2024-01-08 17:55:20 +0000
  • 83b5e1ec48 fix Tarry Singh 2024-01-05 16:44:41 +0000
  • acc492454f Add operator name to source locations and introduce QoL enhancements: remove bias from sdpa, support shape literals in gatherSlices, add Shape.outer, Tensor.all, and infer argMax dtype. Tarry Singh 2024-01-01 15:31:41 +0000
  • 223857251d Update MNIST example to use new operator source locations and reflect recent API changes (sdpa bias removal, gatherSlices shape literals, Shape.outer, Tensor.all, and argMax dtype inference) Foke Singh 2023-12-26 10:45:52 +0000
  • 5bd7f8aae9 zml: HostBuffer.prettyPrint() Tarry Singh 2023-12-25 13:01:17 +0000
  • 5ddd034d2c pjrt: Fix profiler by allowing i64 resource IDs and reserving memory when creating array lists. Tarry Singh 2023-12-20 17:18:02 +0000
  • 7ef87236ce Rewrite simple transpose as reshape in core ZML modules and raise default profiler event limit to 1,000,000. Tarry Singh 2023-12-18 13:56:45 +0000
  • 8a031bd4c8 Update Llama example to use the simplified transpose implementation and increase default profiler size to 1,000,000 events. Foke Singh 2023-12-15 12:06:42 +0000
  • 145e60b4dd workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions. Tarry Singh 2023-12-13 10:10:32 +0000
  • 6a4a7fb9a1 zml/module.zig: Remove unnecessary optional unwrapping. Tarry Singh 2023-12-05 12:27:08 +0000
  • 37725cdaa6 Update PJRT, runtime, and ZML modules to use per‑target output folders and expose profiler.dumpDataAsJson for JSON profiling output. Tarry Singh 2023-12-04 10:38:10 +0000
  • 22a846de72 Update llama example to use per‑target output folders and call profiler.dumpDataAsJson for testing the new compilation layout. Foke Singh 2023-12-01 16:05:59 +0000
  • 46fbbf43a2 Update tutorial documentation in write_first_model.md with quick fixes. Foke Singh 2023-11-30 12:14:33 +0000
  • 737f7cbdee Add example build runner scripts and config for Zig code completion. Foke Singh 2023-11-21 14:55:34 +0000
  • ec37c8f731 Update Bazel build files and helper scripts to integrate the custom build runner for ZLS code completion. Tarry Singh 2023-11-20 15:29:01 +0000
  • 6e4fef8844 zml: Introduce an arena allocator in CompilationContext and expose it in place of the existing allocator, so ops can allocate safely without misusing std.BoundedArray. Includes breaking changes to chunkAllowTrailing and split. Change axis_ types to anytype for tag handling and add TODOs for the upcoming Tensor API. Tarry Singh 2023-11-16 15:11:23 +0000
  • 57bf667c90 Add struct‑based client creation flags to the Zig PJRT API and update context.autoPlatform to accept a flag struct. Tarry Singh 2023-11-13 12:45:17 +0000
  • cb6fcbbb1a Update docs and Zig examples to demonstrate the new client creation flags API. Foke Singh 2023-11-09 12:31:11 +0000
  • 9f4194ad97 Fix testLayer. Add tests to detect silent breakage of testLayer and regressions in mapAlloc with zero-size struct fields. Add the Python venv directory to .gitignore. Tarry Singh 2023-11-06 11:25:57 +0000
  • 237a877a29 zml: Add support for Llama 3.2 text-only models. Implement transpose over embed_tokens as a replacement for missing lm_head and make lm_head optional for compatibility. Add repositories and executions to Bazel and update README. Foke Singh 2023-11-01 10:16:48 +0000
  • 1c9749c25e docs: move image in concepts.md Foke Singh 2023-10-31 10:21:14 +0000
  • eb20548241 update instructions Foke Singh 2023-10-26 13:56:56 +0000
  • 27c8309424 async: add intrusive queue Tarry Singh 2023-10-24 14:36:22 +0000
  • 98b512c495 Implement func.call emission and function caching across MLIR dialects and ZML module/ops, propagating tags and donations. Tarry Singh 2023-10-19 17:01:55 +0000
  • 37de7b9613 Add Llama example showcasing the new func.call emission and function caching behavior. Foke Singh 2023-10-17 11:00:37 +0000
  • 7d36913b31 Refactor ZML API: move compile, compileFn and related types to exe.zig, update BaseExe allocation and inline caching in compileInternal, and clean up supporting modules (func.zig, meta.zig, signature.zig, cuda.zig, testing.zig, zml.zig). Tarry Singh 2023-10-13 16:08:08 +0000
  • 35395c13f8 Update example programs (benchmark, llama, mnist, simple_layer) to use the new Exe API and reflect BaseExe allocation changes. Foke Singh 2023-10-10 11:12:34 +0000
  • 3bc6ad98be Update module.zig to donate all buffers except the token_index buffer for the Llama+Neuron example. Tarry Singh 2023-10-06 10:10:56 +0000
  • 474f76cd75 Enable buffer donation in the Llama example, donating all buffers except the token_index buffer. Foke Singh 2023-10-03 16:32:40 +0000
  • 5122ca0203 Refactor rope implementation to compute only required offsets, eliminating full cos/sin matrix generation in module, nn, and tensor code. Tarry Singh 2023-09-27 11:45:33 +0000
  • 06865f5876 Update Llama example to use the new direct rope IR implementation. Foke Singh 2023-09-25 10:22:05 +0000
  • b5c4fb7c58 zml: fix float8 <-> float32 conversions and add support for Tensor.constant(.{}, .{ .f8 = 1.0 }) Tarry Singh 2023-09-21 11:15:50 +0000
  • 455bb3877f runtimes/cuda: obtain NCCL from the pip package, matching XLA behavior. Tarry Singh 2023-09-20 17:41:44 +0000
  • 0d5389ceda Update CUDA runtime sandboxing and dynamic symbol renaming, switch to the pre‑built jax-cuda-pjrt plugin, and bump CUDA to 12.6.2 and cuDNN to 9.5.1. Tarry Singh 2023-09-14 13:28:25 +0000
  • 4abdd32f0d Update llama example BUILD to use the jax-cuda-pjrt plugin and bump CUDA (12.6.2) / cuDNN (9.5.1) versions. Foke Singh 2023-09-12 15:40:21 +0000
  • c8c99d7d5a zml/pjrtx: prefer the built‑in stablehlo version when a plugin reports a newer version, ensuring artifact serialization uses the correct stablehlo version. Tarry Singh 2023-09-07 17:06:19 +0000
  • 9505992e00 workspace: log diagnostic message before returning NotFound to aid debugging. Tarry Singh 2023-09-04 13:34:37 +0000
  • 937cdec324 examples/loader: add missing stdx dependency. Foke Singh 2023-08-30 13:03:59 +0000
  • aa7fae449e zml/pjrtx: execute bufferFromHostBuffer on the thread pool to avoid blocking and improve weight loading performance. Tarry Singh 2023-08-29 10:28:51 +0000
  • c081cb9ad6 zml/platform: increase maximum device limit to support up to 32 devices per platform. Tarry Singh 2023-08-24 12:23:07 +0000
  • af0630616c Update docs (deploy_on_server, dockerize_models, getting_started) and example Bazel files to include AWS Neuron/Trainium/Inferentia deployment guidance. Foke Singh 2023-08-21 09:15:48 +0000
  • 7d24329d0a Add Bazel build rules and runtime implementation for AWS Neuron/Trainium/Inferentia support. Tarry Singh 2023-08-18 17:11:27 +0000
  • 0709b1b32f zml: reduce memory usage of sdpaMemEfficient by using zml.ops.while instead of zml.ops.for, avoiding concatenation of intermediate results. Tarry Singh 2023-08-14 14:24:11 +0000
  • 022baf782b Update examples/MODULE.bazel to reference the bumped LLVM, XLA, StableHLO, and PJRT plugin versions. Foke Singh 2023-08-11 16:57:15 +0000
  • 01eff33fa0 Update workspace dependencies to newer LLVM, XLA, StableHLO, and PJRT versions and expose new pjrt plugin attribute APIs and stablehlo version APIs in build and runtime configurations. Tarry Singh 2023-08-07 12:28:36 +0000
  • 726a2d0691 Update docs and examples to showcase the new async runtime with coroutines and cross‑thread signaling. Foke Singh 2023-08-03 11:35:24 +0000
  • bcde3962ce Rework async runtime with coroutine support, rename async API (async_→asyncc, await_→awaitt), improve type inference, bump libxev (default epoll) and update related stdx and zml modules. Tarry Singh 2023-08-01 11:35:04 +0000
  • b53462b515 Fix crash in for_ by ensuring values are pushed to their block before opening a new block, adding asserts for block state, and guaranteeing first_step is used. Adjust padding syntax to improve usability. Tarry Singh 2023-07-25 14:25:47 +0000
  • 0fa258cd88 Update examples to reflect recent async module changes, renaming asyncGeneric to asyncc. Foke Singh 2023-07-24 09:34:35 +0000
  • f675a203c2 zml.ops.makeBlock now returns the inner tensor to propagate tags. The function returns both the created mlir.Block and tensors from the supplied function, allowing shape and tag propagation without exposing mlir.Values. Updated tests to run on non‑CPU platforms. Tarry Singh 2023-07-21 09:01:01 +0000
  • be8aa4fa8e Fix several compileError calls introduced by recent changes; ensure Zig compiler catches errors at comptime. Tarry Singh 2023-07-17 09:10:27 +0000
  • 0f9a92f27d module-cache: raise max_pjrt_executable_size limit to 400 MB to accommodate large PJRT executables. Tarry Singh 2023-07-14 17:58:22 +0000
  • 88c7a74ccf third_party/modules/zig-protobuf: revert indentation changes to maintain compatibility with older branches. Tarry Singh 2023-07-13 11:44:53 +0000
  • 4681ce2f24 PJRT: add conversion of profiling protobuf output to JSON format. Tarry Singh 2023-07-05 13:34:05 +0000
  • f7bac1af10 Update example programs (llama and loader) with hotfixes for issue. Foke Singh 2023-07-04 13:40:05 +0000
  • 63aca9f9c2 Hotfixes for build rule, math utilities, module system, and NN implementation (fixes,) Tarry Singh 2023-06-29 10:26:54 +0000
  • 7985716562 Add new Zig example programs (benchmark, llama, loader, mnist, simple_layer) and include a test for the llama example. Foke Singh 2023-06-27 14:23:22 +0000
  • 9b7eea8ac2 Add stdx utilities and rework async signature inference; tidy executable logging. Tarry Singh 2023-06-21 14:45:14 +0000
  • c30aa018dc zml: small cleanup: add more scatterSlices test cases; replace helpers.mapTensors with zml.meta.map; fix shape handling when a for loop is fully unrolled; allow zml.Tensor.pad to accept i64 for dimension compatibility; enable arrays of tensors inside model structs; split Buffer.asViewOf into asViewOfHostBuffer and asViewOfDeviceBuffer. Tarry Singh 2023-06-19 15:29:29 +0000
  • f00538667e zml.nn: add dynamic sampling with support for top‑k, top‑p, and min‑p settings. Implements token index computation based on the selected sampling strategy, including options for top_k, max_top_k, top_p, and min_p. Tarry Singh 2023-06-16 14:34:18 +0000
  • b244a18621 zml: set iota default dtype to .i32, with fallback to .i64 for axes with many elements, simplifying usage. Tarry Singh 2023-06-15 12:45:52 +0000
  • 344e07fb6e stablehlo: extend dot_general API to include DotAlgorithm support by merging precision and algorithm attributes into a union, aligning with spec requirements. Currently not exposed to users due to limited algorithm support. Tarry Singh 2023-06-07 11:20:25 +0000
  • 6d720126ac Add PJRT custom call integration with generic zmlHostBufferCallback to copy tensors to host and invoke user callbacks. Introduce Tensor.print() method to output runtime tensor values (CUDA‑specific, uses a pre‑allocated host buffer). Tarry Singh 2023-06-05 13:42:45 +0000
  • bf23eef0d9 examples: clean up inconsistencies in asynk usage across the codebase. Foke Singh 2023-06-01 16:11:58 +0000
  • 499b0d20e5 pjrtx: change behavior to return an error when OpenXLA fails to serialize the new batching_dim attribute for gather/scatter, instead of panicking. Tarry Singh 2023-05-29 17:18:19 +0000
  • 52ef20f981 zml: reintroduce pjrtx to handle reactor blocking issues in async scenarios, particularly with Events. Tarry Singh 2023-05-26 15:54:15 +0000
  • c68ec4bc5c async: implement default threaded backend using a thread pool. Backend selectable via @zml//async:impl flag (threaded or zigcoro). Provides workaround for environments where io_uring is unavailable. Tarry Singh 2023-05-25 16:02:11 +0000
  • 89cf2233d3 zml/aio: enable reading metadata from index.json for sharded safetensor files, allowing metadata storage alongside model config. Tarry Singh 2023-05-23 15:06:59 +0000
  • 2f54e2a5f3 zml.tensor: add triangular operator to zero out the upper‑right matrix region with configurable offset, and toDiagonal (diag_embed) to embed a vector as a diagonal matrix, correcting previous diag naming. Also add ELU activation under zml.nn.Activation. Tarry Singh 2023-05-18 16:39:21 +0000
  • 05faa5021e zml.tensor: add cumulativeSum operator and refactor maxPoolND. Introduce cumulative sum using reduceWindow. Simplify reduceWindow signature by merging padding_shape and padding_value. Update maxPool1D/2D to accept tuple arguments. Revise pad to use tagged or AOS syntax; remove SOA syntax. Tarry Singh 2023-05-17 09:01:27 +0000
  • 54e7eb30b4 Introduce a thin abstraction layer between ZML and PJRT to manage plugin loading decisions, enable compile‑time detection of linked runtimes, and handle cases such as libtpu blocking metadata access. Tarry Singh 2023-05-15 09:36:41 +0000
  • 74e90855ca Configure the runfiles environment globally at context start to ensure Bazel-built binaries locate their runfiles correctly. Tarry Singh 2023-05-12 11:40:23 +0000
  • 57130577e9 Add fallback for runtimes lacking PJRT_Event by using thread‑pool dispatch for buffer copies and treating operations as synchronous when events are absent. Tarry Singh 2023-05-09 12:44:56 +0000
  • 672df8fa2f Update tutorial and example code to use the new asyncc name and Generic slugs. Foke Singh 2023-05-08 16:58:45 +0000
  • 5543c8192f Rename async_ to asyncc and add Generic async slugs in async.zig, aio.zig, and module.zig. Tarry Singh 2023-05-04 14:44:12 +0000
  • cfe38f27ca Switch ROCm dlopen handling to patchelf's rename_dynamic_symbols for more robust dynamic symbol import. Tarry Singh 2023-05-03 17:33:46 +0000
  • fefd84b1bb Reimplement silu with stablehlo.logistic for higher precision; move the logistic logic into sigmoid and make logistic an alias of sigmoid (breaking change). Tarry Singh 2023-05-01 10:40:50 +0000
  • 021111d07d Extend tests to handle all float types, preventing crashes with bfloat16 tensors. Tarry Singh 2023-04-27 10:34:27 +0000
  • e0fd7f8d97 Fix typographical errors in the documentation. Foke Singh 2023-04-25 16:04:09 +0000
  • 477e13afd0 Add missing zig_cc_binary import to the simple layer example in the documentation. Foke Singh 2023-04-24 10:04:50 +0000
  • ed6444b775 Add Tensor.concatenate support, begin deprecating broadcastLeft, and compute transformer head scaling constant in f32 for higher precision. Tarry Singh 2023-04-21 15:55:07 +0000
  • 11006ca08d Refactor torch module: merge PickleData into Parser as torch.File, rename value file to py_object.zig, use buffered reader for pickle and zip headers, adjust intermediate result handling, simplify Python dict representation, separate kwargs from args, and add extensive tests for long integers, protocol 0, zipped pickle, and a complex PyTorch Conv2d case; also streamline BufferStore initialization. Tarry Singh 2023-04-20 15:43:18 +0000
  • 837f8fb111 Add support for the Llama 3.1 70B Instruct model to facilitate testing on high‑performance accelerators. Foke Singh 2023-04-19 10:23:44 +0000
  • fdb7da5c9b Introduce sharding attributes to Llama weights to enable Tensor Parallelism. Foke Singh 2023-04-13 12:35:27 +0000
  • 833ff5f28d Upgrade PJRT CUDA Plugin to version 0.2.3, adding NCCL support for correct sharding. Tarry Singh 2023-04-12 15:47:06 +0000
  • 8e43a45a3c Add event waiting when invoking a module and improve multi‑device sharding handling. Tarry Singh 2023-04-11 11:32:09 +0000
  • 0189b71070 Rename zml.aio.Value to zml.aio.Metadata, simplify its type variants, and update torch pickle/eval APIs accordingly. Tarry Singh 2023-04-07 16:45:58 +0000
  • aea23c720e Update Llama example to use renamed zml.aio.Metadata (formerly Value) and reflect torch loader changes. Foke Singh 2023-04-05 14:09:59 +0000
  • e25f70d923 Rename and simplify modules in zml/aio/torch: replace redundant qualified names, remove generic utilities, inline code, reorder functions for top‑to‑bottom readability, and extract parsing logic into parseTensor and parseStorage functions. Tarry Singh 2023-04-04 17:20:53 +0000
  • 66881899ca Fix testLayer by removing unnecessary compile_options argument and updating testing logic for new sharded output, ensuring proper usage by llama.zig. Tarry Singh 2023-03-31 14:23:45 +0000
  • 05d23beb23 Add Normalizer.fromHfJson to read HuggingFace tokenizer JSON and map to internal options, including a configurable magic space token and a debug flag for token merges. Adjust default handling of extra whitespaces to align with HF defaults. Tarry Singh 2023-03-29 16:10:29 +0000
  • ef922e3aea Fix empty JSON array handling in safetensor metadata loader and refactor torch loader (make ops slices const and improve readability). Tarry Singh 2023-03-28 16:17:00 +0000
  • aae37738a5 Update loader example to demonstrate handling of empty JSON arrays and improved torch loader readability Foke Singh 2023-03-22 14:52:33 +0000