63 Commits

Author SHA1 Message Date
b58f7ced3d Fix $ORIGIN handling in runtimes/neuron by escaping $ since zigopts does not expand Make variables. 2025-11-24 12:04:56 +00:00
3d3a0ea463 Enable -fllvm flag for CUDA runtime, upb, and ZML Bazel targets. 2025-11-11 13:02:40 +00:00
cd1b66f615 chore: remove unused target 2025-11-04 12:15:20 +00:00
91f1c3b7aa pjrt/cpu: fix compilation when CPU is absent by ensuring CcCompilationContext is non-empty for Zig modules accessing the c module 2025-10-30 09:01:36 +00:00
bcd43314a4 Migrate BUILD and MODULE files to upstreamed rules_zig (rename copts to zigopts) and adjust ZLS integration accordingly. 2025-10-13 15:26:42 +00:00
7d7c124ada runtimes/cpu: update Darwin arm64 PJRT tarball checksum to match v13.0.0, enabling Bazel fetch on macOS arm64 2025-10-07 10:18:40 +00:00
3ed9bca5ad Remove deprecated writer interface APIs from core ZML modules (async, MLIR, PJRT, runtime, fmt, aio, buffer, exe, hostbuffer, meta, mlirx). 2025-09-04 14:03:09 +00:00
6e15123fb3 Remove obsolete async symbols (asynk, asyncc, awaitt, await_) from core, runtime, and aio modules. 2025-08-29 11:03:59 +00:00
cc969bd532 Add experimental zml.callback API (renamed from custom_call) and fix tensor.print(); update PJRT bindings, host buffer utilities, and related core ZML modules. 2025-08-20 10:27:54 +00:00
01da2184fe xla: bump to commit b3fbfee, temporarily disable libnvptxcompiler due to missing support in PjRT CUDA plugin v13.0, add nvshmem to sandbox for PjRT CUDA plugin 2025-08-12 13:32:18 +00:00
9e3cd6d616 bump runtimes/* code to Zig 0.15.1, restore PyTorch loader using std.fs.File, update CI zig fmt, remove stdx.io, note remaining issues with Neuron and CUDA debug builds 2025-08-07 15:09:27 +00:00
1cf26756a1 workspace: run buildifier, drop rules_uv, refactor tools/hf dependencies 2025-07-16 10:01:41 +00:00
1427286716 runtimes/neuron: fix neuron runtime
This PR fixes the neuron runtime with the following:

Proxy the PJRT Api method to enforce the client struct sizes since the
neuron PJRT plugin doesn't use `>=` but `==` to assert them, breaking
PJRT compatibility guarantees.
Fixes https://github.com/aws-neuron/aws-neuron-sdk/issues/1095

Reimplement `libneuronxla` in Zig to control neuronx-cc sandboxing and
invocation.

Implement a Python bootstrapper in Zig to create a full-blown
`neuronx-cc` executable, avoiding the infamous chicken-and-egg problem
of Python executables bootstrapping when sandboxed (due to fixed-path
shebangs).

---------

Co-authored-by: Corentin Kerisit <corentin.kerisit@gmail.com>
2025-07-15 15:26:03 +00:00
e1ee340306 runtimes/cuda: implement zmlxcuda in Zig 2025-07-08 09:25:25 +00:00
c488b634fc runtimes/rocm: implement zmlxrocm in Zig
Also, sandbox `amdgpu.ids` and restore safetensors json parsing.
2025-07-07 16:48:07 +00:00
cf00506dbb Switch workspace build rules from zig_cc_binary to zig_binary, removing the hack and using the C linker directly. 2025-07-03 15:10:36 +00:00
e789e26008 Remove examples workspace and clean up related Bazel BUILD/MODULE files and Zig build scripts. 2025-06-19 09:30:29 +00:00
1a2b862ec2 Add sandbox neuron dependencies: define a trampoline PJRT, create an empty repository for distroless deps, and update Bazel build files and Zig/C sources accordingly. 2025-05-19 17:35:33 +00:00
55c5b540f8 Add XLA 20250718.0‑6319f0d with ROCm 6.4.1 support, update Bazel module files and runtime configs, and apply migration, FFI‑handler and header‑cleanup patches. 2025-05-12 12:10:27 +00:00
ed5ae31338 runtimes/rocm: fetch libdrm from amdgpu repository and add amdgpu.ids layer 2025-04-30 15:53:51 +00:00
e7323be10b runtimes/rocm: switch to in-process LLD, removing the need for sandboxed lld. 2025-04-23 11:43:18 +00:00
7d9fdf94e7 runtimes/rocm: sandbox ROCm dependencies and ensure they load on the main thread due to TLS usage in static C++ destructors. 2025-04-14 16:38:15 +00:00
eba0e72532 runtimes/tpu: sandbox TPU PJRT plugin; no external dependencies. 2025-04-10 14:47:16 +00:00
78d7b672e7 runtimes/cpu: sandbox CPU PJRT plugin, simplifying as there are no additional NEEDED dependencies. 2025-04-03 11:57:46 +00:00
2d321d232d runtimes/cuda: sandbox CUDA dependencies by removing them from the leaf binary, sandboxing the dependency graph, marking dlopen direct dependencies as NEEDED, setting RPATH to the sandbox, loading the PJRT plugin from the sandbox, and enabling weak CUDA symbols without direct linking. 2025-03-26 11:18:29 +00:00
f27a524f31 Update rules_zig: add zig_srcs target, fix source handling bug, clean up BUILD files, adjust async/coro.zig tests, and disable nemo and yaml model loaders. 2025-03-13 12:27:21 +00:00
9488672d4b workspace: bump xla to version 20250710.0-22ea002
Also:
- Bump XLA deps: `com_github_grpc_grpc` and `com_google_protobuf`
- Inject `rules_ml_toolchain`
- Fix `zig_proto_library` rule
2025-03-04 17:12:34 +00:00
fa0ed045ef runtimes/cuda: downgrade cuda and cudnn
This commit reverts part of https://github.com/zml/zml/pull/238/files
This is required because XLA has a strong dependency on CUDA 12.8 and
upgrading to 12.9 is impossible due to
https://github.com/NVIDIA/cccl/issues/4967
2025-02-28 17:36:12 +00:00
1cafcc3c60 Workspace: bump XLA to newer version. 2025-02-05 17:35:27 +00:00
9ef838be25 Update neuron runtime BUILD.bazel to use Bazel manual tag and S3 cache integration. 2025-02-03 14:03:33 +00:00
95453c7242 Update XLA dependency to version 20250527.0‑cb67f2f and refresh related Bazel BUILD, MODULE, overlay and patch files. 2024-11-22 16:50:20 +00:00
d8a83830e8 runtimes: switch to Cloudflare Debian snapshots for more reliable dependency pinning. 2024-11-15 09:40:58 +00:00
ea3ce685a9 runtimes/neuron: bump runtime version and expose nrt.h header to Zig. 2024-11-14 13:37:47 +00:00
47a4eda5f6 runtimes/cuda: expose cuda.h in the C namespace for CUDA runtimes, enabling custom calls to CUDA functions. 2024-11-01 13:27:24 +00:00
4a0b1cce50 Update Bazel workspace and XLA overlay (MODULE.bazel, BUILD files, patches) to prevent dual LLVM builds and apply migration/bump patches. 2024-09-27 14:00:44 +00:00
63ef78efcc zml: add support for NVTX tracing 2024-08-21 14:41:40 +00:00
ca4e061ad5 Add Bazel build configurations for macOS x86_64 CPU runtime and ZLS third‑party integration. 2024-07-25 15:58:14 +00:00
efcf955a4e workspace, third_party/rules_zig: adjust ZLS to require --version as the first parameter and add missing keys to the BuildConfig object for code completion 2024-07-10 15:20:12 +00:00
967eeb928f Update Bazel workspace and runtime configs: rework sandboxing, bump PJRT to 7.0.0, and upgrade CUDA (12.8), cuDNN (9.8), and ROCm (6.3.4). 2024-06-25 11:00:29 +00:00
3aac788544 Update Bazel build configurations (zig.bzl, BUILD files) for MLIR, PJRT, Neuron, ROCm, tokenizer, and tools, fixing broken dependencies. 2024-05-20 11:28:25 +00:00
f5ab6ff2c6 Update XLA to version 20250204.0-6789523 and adjust Bazel module and runtime files for Bazel 8 compatibility. 2024-05-03 15:57:56 +00:00
5a2171793d workspace: MODULE.bazel cleanup
Title says it all!
2024-04-22 09:27:44 +00:00
980f1b17fb Ensure all runtime plugins have correct SONAME values, fixing issues with prebuilt PJRT plugins. 2024-03-11 10:15:22 +00:00
8a25b1eb74 Revert CUDA PJRT plugin version to 0.4.38 to address performance regression on XLA master. 2024-03-05 17:04:42 +00:00
169a24307c Migrate workspace and XLA module definitions to Bazel 8, updating MODULE.bazel files, BUILD rules, and related migration patches. 2024-02-12 12:43:23 +00:00
7e6103d876 Upgrade XLA to version 20250122.0-cc075be, switch to nvptx compiler and nvlink with nvjitlink support, add warning for CUDA path in LD_LIBRARY_PATH, and revert the previous CUDA sandbox fix. 2024-02-06 09:31:48 +00:00
edc2ac26f8 Adjust ROCm runtime sandboxing to hook only the PJRT plugin and make hipblastlt bytecodes optional. 2024-01-26 13:02:23 +00:00
a7b7ae0180 Fix async hangs by reworking the libxev epoll backend and using callBlocking for PJRT plugin loading, improving performance across async and runtime modules. 2024-01-16 14:13:45 +00:00
434cee3a6c Fix CUDA and ROCm sandbox discovery, update epoll libxev patch to prevent high CPU usage, enable XLA GPU latency‑hiding scheduler, and upgrade cuDNN to 9.6.0. 2024-01-15 09:41:42 +00:00
145e60b4dd workspace: Update LLVM, XLA, StableHLO, and PJRT plugins to latest versions. 2023-12-13 10:10:32 +00:00