|
|
c109b12e1b
|
Various minor fixes: rewrite the tinyllama tokenizer newline token, prevent HostBuffer.isContiguous from falsely triggering on 1‑dim axes, improve HostBuffer.slice1d error messages, simplify module.zig output to show the .mlir file path, correct setFlags handling of comptime int/float, make tokenizer.zig return <oob> for out‑of‑range detokenization, and speed up Buffer.constant creation to as much as 2.5 GB/s on CUDA.
|
2024-02-19 12:34:18 +00:00 |
|
|
|
b8a0aaee5a
|
Update tokenizer to handle byte_fallback for Llama3 GPT2 vocab and add a Llama3‑specific normalizer; adjust tinyllama.zig and hostbuffer.zig to use the new tokenization logic.
|
2024-02-05 15:22:44 +00:00 |
|
|
|
9b7eea8ac2
|
Add stdx utilities and rework async signature inference; tidy executable logging.
|
2023-06-21 14:45:14 +00:00 |
|
|
|
05d23beb23
|
Add Normalizer.fromHfJson to read HuggingFace tokenizer JSON and map to internal options, including a configurable magic space token and a debug flag for token merges. Adjust default handling of extra whitespaces to align with HF defaults.
|
2023-03-29 16:10:29 +00:00 |
|
|
|
ecf52ad724
|
zml.tokenizer: Implement proper byte fallback support by converting hex byte strings (e.g., “<0x40>”) to their characters and splitting unknown UTF‑8 codepoints into bytes, fixing tokenization.
|
2023-02-28 14:40:25 +00:00 |
|
|
|
ebdb8db213
|
zml/tests: re‑enable all Zig tests, fix a precision issue by switching to f32, and add refAllDecls to ensure all declarations are tested.
|
2023-01-23 16:28:19 +00:00 |
|
|
|
266da6d4be
|
Add initial Bazel build configuration, async runtime implementation, and core MLIR dialect definitions for ZML.
|
2023-01-02 14:28:25 +00:00 |
|